An Introduction to SAS/INSIGHT II: Advanced Concepts

Manipulating Data

In this section, you will learn several features of SAS/INSIGHT for data manipulation.

Arranging Variables or Observations

You can easily change the order in which the variables appear in the data window. For example, you can move the variable SALARY from its position at the far right of the baseball data set to the leftmost position. There are two ways to do this:

  1. Choose Move to First from the data window's pop-up menu. As long as no variable is already selected, this brings up a dialog box containing the names of all variables. Scroll down the list in this box, click on "SALARY" and then on "OK".

  2. You may first select the variable SALARY in the data window and then choose Move to First from the data window's pop-up menu. In this case no dialog box will appear. This also works with any pre-selected observation.

Choosing Move to Last from the data window's pop-up menu sends the selection to the last position instead, which reverses this operation. These methods also work with several variables or observations: just select the desired variables or observations first.
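For readers who think in code, the effect of Move to First and Move to Last on the variable order can be sketched in a few lines of Python. This is only an illustration, not how SAS/INSIGHT is implemented: the function names and the shortened variable list are made up, and the sketch assumes that several selected variables keep their relative order when moved.

```python
def move_to_first(order, selected):
    """Return a new ordering with the selected names moved to the front.

    Assumption: selected and unselected names each keep their
    original relative order.
    """
    chosen = [name for name in order if name in selected]
    rest = [name for name in order if name not in selected]
    return chosen + rest


def move_to_last(order, selected):
    """Return a new ordering with the selected names moved to the end."""
    chosen = [name for name in order if name in selected]
    rest = [name for name in order if name not in selected]
    return rest + chosen


# Hypothetical, shortened variable list for illustration only.
variables = ["NAME", "TEAM", "NO_ATBAT", "NO_HITS", "SALARY"]
moved = move_to_first(variables, {"SALARY"})
print(moved)
print(move_to_last(moved, {"SALARY"}))
```

The same two functions apply to observations if you pass a list of row numbers instead of variable names.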

Sorting Observations

Sorting observations by values of a variable is easy in SAS/INSIGHT. As an example, suppose you want to sort the data according to players' salaries. To do this, scroll to SALARY using the horizontal scroll bar at the bottom of the data window. Select the variable "SALARY". Now choose Sort from the data window's pop-up menu. The data are now in order of ascending salary. Note that the "."s in the data set stand for missing data. (You could also have done this without selecting "SALARY" first. Then a dialog box would appear and you would select "SALARY" from it.)
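The one subtlety in an ascending sort is where the missing values go. The Python sketch below shows the idea, with None standing in for the "." in the data window. The function name and the sample rows are made up, and placing missing values first is an assumption (it matches the usual SAS ordering, but check the sorted data window to see what SAS/INSIGHT actually does).

```python
def sort_by(rows, key, missing_first=True):
    """Sort a list of dict rows in ascending order of row[key].

    None stands in for a missing value. Where missing values land is
    an assumption controlled by missing_first.
    """
    def sort_key(row):
        value = row[key]
        is_missing = value is None
        if missing_first:
            group = 0 if is_missing else 1
        else:
            group = 1 if is_missing else 0
        # The group keeps missing values together; present values are
        # then ordered numerically within their group.
        return (group, 0 if is_missing else value)

    return sorted(rows, key=sort_key)


# Made-up rows for illustration; these are not values from the data set.
players = [
    {"NAME": "A", "SALARY": 475.0},
    {"NAME": "B", "SALARY": None},
    {"NAME": "C", "SALARY": 90.0},
]
for row in sort_by(players, "SALARY"):
    print(row["NAME"], row["SALARY"])
```

Because Python's sorted is stable, observations with equal salaries stay in their original relative order.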

Finding Observations

Sometimes you want to find observations that share some characteristic. For example, I know you all want to find all the Red Sox players in this data set. To do this, click on Edit:Observations:Find. A dialog box will appear. Select the variable TEAM from the left box, "=" from the center box, and "Bos." from the right box, then click on "OK". Now all the Red Sox players are highlighted.

You can do a bit more. If you choose Find Next from the data window's pop-up menu, the Red Sox player closest to the top will be put at the top of the data window, and the order of observations will be maintained. If you instead choose Move to First from that menu, all the Red Sox players will be moved to the top of the data set, but of course the order of the observations will be changed.
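In code terms, Find is a filter on one variable and Find Next is a search for the first selected row at or below a given position. Here is a rough Python sketch under those assumptions. The function names and sample rows are hypothetical, and only the "=" operator is confirmed by the dialog described above; the other operators are included purely for illustration.

```python
import operator

# Only "=" appears in the text above; "<" and ">" are illustrative assumptions.
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def find(rows, variable, op, value):
    """Return the indices of rows whose variable satisfies `op value`, in data order."""
    test = OPERATORS[op]
    return [
        i for i, row in enumerate(rows)
        if row[variable] is not None and test(row[variable], value)
    ]


def find_next(selected, top_row=0):
    """Return the first selected index at or below top_row, or None if there is none."""
    for i in selected:
        if i >= top_row:
            return i
    return None


# Made-up rows; "Bos." is the only team code taken from the text.
rows = [{"TEAM": "Other1"}, {"TEAM": "Bos."}, {"TEAM": "Other2"}, {"TEAM": "Bos."}]
red_sox = find(rows, "TEAM", "=", "Bos.")
print(red_sox)
print(find_next(red_sox))
```

Note that neither function reorders the data; only Move to First changes the order of observations.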

Transforming Data

You can transform variables to create new variables in SAS/INSIGHT. For example, though there is no batting average variable in the BASEBALL data set, you can easily create one as follows (for you non-fans, batting average is the number of hits divided by the number of at bats):

  1. Choose Edit:Variables:Other. A dialog box will appear.

  2. In the box with the variables list click on NO_HITS to select it, then click on the "Y" button. "NO_HITS" should appear in the box below it.

  3. Next click on NO_ATBAT to select it, then click on the "X" button. "NO_ATBAT" should appear in the box below it.

  4. Click on the "Y/X" under "Transformation:".

  5. You can use the name SAS assigns the new variable, or you can replace it with a more meaningful one, such as BA (which is what we'll call it in the rest of this document).

  6. Now click on "OK". The variable for batting average will appear last in the data window.
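The Y/X transformation in the steps above is just a row-by-row division. A minimal Python sketch follows, assuming each observation is a dict and that a missing input or zero at bats should give a missing result (None). That treatment of missing values and zeros is an assumption, and the function names and sample numbers are made up.

```python
def ratio(y, x):
    """Return y / x, or None (missing) when an input is missing or x is zero."""
    if y is None or x is None or x == 0:
        return None
    return y / x


def add_ratio_variable(rows, y_name, x_name, new_name):
    """Add new_name = y_name / x_name to every row, in place, and return rows."""
    for row in rows:
        row[new_name] = ratio(row[y_name], row[x_name])
    return rows


# Made-up numbers for illustration; these are not values from the data set.
rows = [
    {"NO_ATBAT": 500, "NO_HITS": 150},
    {"NO_ATBAT": 0, "NO_HITS": 0},
]
add_ratio_variable(rows, "NO_HITS", "NO_ATBAT", "BA")
for row in rows:
    print(row)
```

Since Python dicts keep insertion order, BA ends up as the last key of each row, much as the new variable appears last in the data window.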

Despite its appearance, SAS/INSIGHT is not a spreadsheet. However, it does have modest editing capabilities. For instance, you can easily change individual data values. Suppose we don't like Mike Schmidt's .0500 batting average and want to change it to .3500. You can do so by clicking on the cell containing his average and typing in .3500.

Examining Data

SAS/INSIGHT allows you to examine data that you see in graphs. As an example, create a scatterplot of SALARY versus BA for the baseball data. Choose an unusual observation and double click on it. A window will appear with the values of all variables for this observation. You can do the same for groups of observations. You can obtain the same results by single clicking on the observation(s) and choosing Edit:Observations:Examine.

Edit:Observations:Examine is also useful in examining data for observations chosen by Edit:Observations:Find. For example, you can look at the records of all Red Sox players by choosing Edit:Observations:Find, selecting the variable TEAM from the left box, "=" from the center box, and "Bos." from the right box, then clicking on "OK". Now choose Edit:Observations:Examine to get the data on all the Red Sox.

Slicing

Slicing is a dynamic technique for viewing subsets of data based on a range of values for one variable. For example, to see how SALARY is related to BA and NO_RBI, create two scatterplots by selecting SALARY as the Y variable and BA and NO_RBI as the X variables.

Create a rectangular brush by clicking in the middle of the point cloud on the SALARY by BA scatter plot, holding the left mouse button down, and moving the mouse to create a rectangle. When you release the mouse button, all points in the brush are selected and will become highlighted on both graphs. Now move the brush by clicking in it and dragging. As the brush moves, different observations are selected in both graphs. Now to see how the relation between SALARY and NO_RBI changes for changing BA values, make the brush long (in the SALARY direction) and thin (in the BA direction) and move it left to right or right to left on the SALARY by BA scatter plot.

To make the effect more dramatic, choose Observations from the scatter plot's pop-up menu and then drag the brush. Now only the selected observations will appear. One final feature, which is also kind of fun: if you release the mouse button while still dragging the brush, it will continue to move on its own.
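Dragging a long, thin brush across the SALARY by BA plot amounts to selecting the observations whose BA falls in a narrow window and then sliding that window along the BA axis. The Python sketch below imitates this under simplifying assumptions: the brush is treated as a half-open interval on one variable only, and the function names, window width, step, and sample rows are all made up.

```python
def slice_rows(rows, variable, low, high):
    """Return the rows whose variable lies in the half-open interval [low, high)."""
    return [
        row for row in rows
        if row[variable] is not None and low <= row[variable] < high
    ]


def sliding_slices(rows, variable, width, step):
    """Yield (low, high, selected_rows) as a window of the given width moves left to right."""
    values = [row[variable] for row in rows if row[variable] is not None]
    if not values:
        return
    low, top = min(values), max(values)
    while low <= top:
        yield low, low + width, slice_rows(rows, variable, low, low + width)
        low += step


# Made-up rows for illustration; these are not values from the data set.
rows = [
    {"BA": 0.21, "NO_RBI": 30, "SALARY": 100.0},
    {"BA": 0.26, "NO_RBI": 55, "SALARY": 300.0},
    {"BA": 0.31, "NO_RBI": 90, "SALARY": 900.0},
]
for low, high, selected in sliding_slices(rows, "BA", width=0.05, step=0.05):
    print(round(low, 2), round(high, 2), [(r["NO_RBI"], r["SALARY"]) for r in selected])
```

Each printed line corresponds to one brush position: the (NO_RBI, SALARY) pairs are the points that would be highlighted in the other scatter plot.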

Marking Observations

You can assign markers to use for displaying observations in scatter plots, boxplots (which you'll learn about later) and rotating 3-D plots (for which you're on your own). The markers appear with each observation in the data window. You can assign markers for observations you select, and you can let SAS/INSIGHT assign markers automatically based on the value of a variable. You can control the size of the markers in any plot.

Marking Individual Observations

To see how to mark individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. If the SAS:Tools window is not already open, choose Edit:Windows:Tools (if you choose Edit:Windows and see a highlighted square to the left of Tools, the SAS:Tools window is already open). The SAS:Tools window will appear. Click on the shape of the marker you want to denote the chosen observation. The marker will change to the shape you choose in all graphs and in the data window.

Marking by Nominal Variable

A nominal variable is a variable whose values stand for names of categories. LEAGUE, DIVISION, TEAM, and POSITION are all nominal variables. SAS/INSIGHT can assign markers based on the value of a nominal variable. Let's mark the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple marker button at the bottom of the SAS:Tools window.

Marking by Interval Variable

You can also assign markers based on the value of an interval variable (i.e., a variable whose values stand for numerical quantities, such as BA and NO_HITS). Let's assign markers in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple marker button at the bottom of the SAS:Tools window. A different marker will be assigned to the players in the upper, middle and lower third of SALARY values.
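To make "upper, middle and lower third" concrete, here is one way such a split could be computed in Python. It is a sketch under an assumption: the thirds are taken by rank (equal numbers of players in each group). The text does not say whether SAS/INSIGHT splits by rank or by the range of values, and the function name, labels, and sample salaries are made up.

```python
def assign_by_thirds(values, labels=("lower", "middle", "upper")):
    """Label each value by the rank-based third it falls in.

    Returns a list parallel to values; missing values (None) get None.
    Assumption: thirds are formed by rank, so tied values may be split
    across two groups.
    """
    order = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
    )
    n = len(order)
    result = [None] * len(values)
    for rank, i in enumerate(order):
        # rank * 3 // n is 0, 1 or 2 for the lower, middle and upper third.
        result[i] = labels[rank * 3 // n]
    return result


# Made-up salaries for illustration; None marks a missing value.
salaries = [100, None, 300, 200, 600, 500, 400]
for salary, group in zip(salaries, assign_by_thirds(salaries)):
    print(salary, group)
```

Passing marker shapes or color names as the labels argument gives the corresponding assignment for marking or coloring.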

Adjusting Marker Size

You can adjust the marker size on the plot by choosing Marker Sizes from the plot's pop-up menu. Try a few sizes to find one you like.

Coloring Observations

If you are using a color monitor or printer, giving the markers different colors may be a more effective strategy than changing marker shapes (although with the black and white printers in the stat lab, different shapes of markers show up better).

Basically, coloring observations proceeds in the same way as marking observations. The same SAS:Tools window used in marking is also used in coloring, so make sure it is open.

Coloring Individual Observations

To see how to color individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. From the SAS:Tools window click on the color you want to denote the chosen observation. The color will change to the shade you choose in all graphs and in the data window.

Coloring by Nominal Variable

Let's color the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple color button (the rectangular colored button) at the bottom of the colors in the SAS:Tools window.

Coloring by Interval Variable

Let's assign colors in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple color button. A different color will be assigned to the players in the upper, middle and lower third of SALARY values.

Hiding Observations

You can adjust the range of data displayed and show subsets of the data by hiding observations. To illustrate the procedure, display the scatter plot of SALARY versus BA. We would like to investigate this relationship for each league on the same scatter plot (note that we could generate two separate scatter plots by using the variable LEAGUE as a group variable). We need to select the players from the National and American Leagues separately. A clever way to do this is to generate a bar chart of the variable LEAGUE. (Bar charts are the analogue of frequency histograms for nominal data.) By clicking on the bar for the American League, all American League players are selected. Do this now.

To look at the scatterplot of SALARY versus BA for just National League players, choose Edit:Observations:Hide in Graphs.

Now look at the data window. De-select the selected observations by clicking on the upper left data cell of the data array. Notice that the previously selected observations now have no markers at all in the far left column. This means that these observations are hidden in all graphs (notice that the bar chart of LEAGUE has only the National League bar).

To make the observations visible in the graphs again, first choose Edit:Observations:Invert Selection, which de-selects all selected observations and selects all de-selected observations. Since all observations were de-selected just prior to this, all observations are now selected. If you now choose Edit:Observations:Show in Graphs, all observations will appear in the graphs.
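The hide, de-select, invert and show sequence above is easier to follow if you picture two sets of observation numbers: the selected ones and the hidden ones. The small Python class below models just that bookkeeping. It is an illustration under that assumption, not a description of SAS/INSIGHT internals, and the class and method names are made up.

```python
class ObservationState:
    """Track which observation indices are selected and which are hidden in graphs."""

    def __init__(self, n):
        self.n = n
        self.selected = set()
        self.hidden = set()

    def select(self, indices):
        """Replace the current selection with the given indices."""
        self.selected = set(indices)

    def deselect_all(self):
        """Clear the selection without changing what is hidden."""
        self.selected = set()

    def invert_selection(self):
        """Select every de-selected observation and de-select every selected one."""
        self.selected = set(range(self.n)) - self.selected

    def hide_in_graphs(self):
        """Hide the selected observations in all graphs."""
        self.hidden |= self.selected

    def show_in_graphs(self):
        """Make the selected observations visible in all graphs again."""
        self.hidden -= self.selected

    def visible(self):
        """Return the sorted indices of observations currently shown in graphs."""
        return sorted(set(range(self.n)) - self.hidden)


# Six made-up observations; suppose rows 0, 2 and 3 are American League players.
state = ObservationState(6)
state.select([0, 2, 3])
state.hide_in_graphs()
print(state.visible())   # only the remaining (National League) rows
state.deselect_all()
state.invert_selection()
state.show_in_graphs()
print(state.visible())   # every row is visible again
```

The sketch also shows why the de-select step matters: inverting an empty selection selects everything, so Show in Graphs then restores every observation.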

Toggling the Display of Observations

You can show subsets of the data by toggling the display of observations. This causes observations to be displayed only when they are selected. To illustrate this, create two scatter plots: one of SALARY versus BA, and the other of SALARY versus NO_RBI, by choosing Analyze:Scatter Plot ( Y X ), and assigning SALARY the Y role and BA and NO_RBI the X role.

You will now create a toggle on the value of LEAGUE as follows:

  1. From the pop-up menu at the lower left of one of the scatterplots, choose Observations. All the observation markers will disappear from the two scatter plots.

  2. Choose Edit:Observations:Find from the data window.

  3. In the dialog box, select LEAGUE from the variables list. Select the value you wish to display first: American or National League. Click on "OK".

Both scatterplots will now display the data for the league you selected. To toggle between the two leagues, choose Edit:Observations:Invert Selection. Each time you do this the data displayed will change to the other league. By doing this quickly, you can detect differences between the leagues.

To undo the toggling, choose Observations from the scatterplot's pop-up menu again. Click on an empty area of the graph window to de-select.

Where Next?

Intro page.

Unix SAS Quickstart.

Introduction to SAS/EIS, which you'll use to run SAS macros (programs) for labs and specialized applications.

Introduction to SAS/INSIGHT I: Elementary Concepts. This is the minimal tutorial you should do to be familiar with the basics of SAS/INSIGHT, a graphically-oriented data analysis system.

Getting Started in the Statistics Multimedia Computer Classroom (for new users).


Joe Petruccelli <jdp@wpi.edu>
Last modified: Fri Jul 27 15:09:12 EDT 2001