In this section, you will learn several features of SAS/INSIGHT for data manipulation. If you do not already have SASDATA.BASEBALL open in SAS/INSIGHT, open it now.
You can easily change the order in which the variables appear in the data window. For example, you can move the variable SALARY from its position at the far right of the baseball data set to the leftmost position. There are two ways to do this:
Sorting observations by the values of a variable is easy in SAS/INSIGHT. As an example, suppose you want to sort the data by player salary. To do this, scroll to SALARY using the horizontal scroll bar at the bottom of the data window. Select the variable SALARY. Now click on :Sort. The data are now in order of ascending salary. Note that the "."s in the data set stand for missing data. (You could also have done this without selecting SALARY first. A dialog box would then appear, and you would select SALARY from it.)
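For readers who want to see the idea outside the point-and-click interface, here is a minimal Python sketch of an ascending sort that tolerates missing values. It is not SAS/INSIGHT code. The player names and salaries are made up, `None` stands in for the "." missing value, and placing missing values first is an assumption based on how SAS generally orders numeric missing values.

```python
# Toy records; names and salaries are invented for illustration only.
players = [
    {"name": "A", "salary": 750.0},
    {"name": "B", "salary": None},  # None plays the role of "." (missing)
    {"name": "C", "salary": 90.0},
    {"name": "D", "salary": 475.0},
]


def sort_ascending(rows, key):
    """Return rows in ascending order of `key`, with missing values first."""
    def sort_key(row):
        value = row[key]
        # False sorts before True, so missing values come ahead of all numbers.
        return (value is not None, value if value is not None else 0.0)

    return sorted(rows, key=sort_key)


by_salary = sort_ascending(players, "salary")
print([(r["name"], r["salary"]) for r in by_salary])
```

The original list is left untouched; `sorted` returns a new list, much as sorting in the data window reorders the display without changing any data values.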
Sometimes you want to find observations that share some characteristic. For example, I know you all want to find all the Red Sox players in this data set. To do this, click on Edit:Observations:Find. A dialog box will appear. Select the variable TEAM from the left box, "=" from the center box, and "Bos." from the right box, then click on "OK". Now all the Red Sox players are highlighted.
You can do a bit more. Selecting :Find Next puts the Red Sox player closest to the top at the top of the data set while maintaining the order of observations. Selecting :Move to First moves all the Red Sox players to the top of the data set, which of course changes the order of the observations.
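The Find and Move to First operations can be pictured with a short Python sketch. This is an illustration only, not how SAS/INSIGHT is implemented. Apart from "Bos.", the team abbreviations are hypothetical, and keeping the relative order within the moved and unmoved groups is an assumption.

```python
# Hypothetical TEAM column; only "Bos." comes from the tutorial text.
teams = ["NY", "Bos.", "Chi.", "Bos.", "Det."]


def find(values, target):
    """Indices of observations whose value equals `target` (the Find step)."""
    return [i for i, v in enumerate(values) if v == target]


def move_to_first(n_rows, selected_rows):
    """Row order with the selected rows first.

    Assumption: rows keep their relative order within each group.
    """
    chosen = set(selected_rows)
    first = [i for i in range(n_rows) if i in chosen]
    rest = [i for i in range(n_rows) if i not in chosen]
    return first + rest


selected = find(teams, "Bos.")
order = move_to_first(len(teams), selected)
print(selected)
print([teams[i] for i in order])
```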
You can transform variables to create new variables in SAS/INSIGHT. For example, though there is no batting average variable in the BASEBALL data set, you can easily create one as follows (for you non-fans, batting average is the number of hits divided by the number of at bats):
Despite its appearance, SAS/INSIGHT is not a spreadsheet. However, it does have modest editing capabilities. For instance, you can easily change individual data values. Suppose we don't like Mike Schmidt's .0500 batting average and want to change it to .3500. You can do so by clicking on the cell containing his average and typing in .3500.
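The arithmetic behind the batting-average transformation, and the effect of editing a single cell, can be sketched in Python. The hit and at-bat counts below are invented (they are not taken from the BASEBALL data set), and the names `hits` and `at_bats` are hypothetical stand-ins for the corresponding variables.

```python
# Invented counts for three players; None marks a missing value.
hits = [150, 20, None]
at_bats = [500, 400, 300]


def batting_average(n_hits, n_at_bats):
    """Hits divided by at bats; missing if either input is missing or at bats is 0."""
    if n_hits is None or n_at_bats is None or n_at_bats == 0:
        return None
    return n_hits / n_at_bats


# Transformation: build the new variable from the two existing ones.
ba = [batting_average(h, ab) for h, ab in zip(hits, at_bats)]

# Editing: overwrite one value by hand, leaving the source variables alone.
edited = list(ba)
edited[1] = 0.35

print(ba)
print(edited)
```

Note that the edit changes only the stored average. The hits and at-bats values it was computed from are unchanged, so the edited row is no longer consistent with them.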
SAS/INSIGHT allows you to examine data that you see in graphs. As an example, create a scatter plot of SALARY versus BA for the baseball data. Choose an unusual observation and double-click on it. A window will appear with the values of all variables for this observation. You can do the same for groups of observations. You can obtain the same results by single-clicking on the observation(s) and choosing Edit:Observations:Examine.
Edit:Observations:Examine is also useful in examining data for observations chosen by Edit:Observations:Find. For example, you can look at the records of all Red Sox players by choosing Edit:Observations:Find, selecting the variable TEAM from the left box, "=" from the center box, and "Bos." from the right box, then clicking on "OK". Now choose Edit:Observations:Examine to get the data on all the Red Sox.
Slicing is a dynamic technique for viewing subsets of data based on a range of values for one variable. For example, to see how SALARY is related to BA and NO_RBI, create two scatterplots by selecting SALARY as the Y variable and BA and NO_RBI as the X variables.
Create a rectangular brush by clicking in the middle of the point cloud on the SALARY by BA scatter plot, holding the left mouse button down, and moving the mouse to create a rectangle. When you release the mouse button, all points in the brush are selected and become highlighted on both graphs. Now move the brush by clicking in it and dragging. As the brush moves, different observations are selected in both graphs. To see how the relation between SALARY and NO_RBI changes as BA changes, make the brush long (in the SALARY direction) and thin (in the BA direction) and move it left to right or right to left on the SALARY by BA scatter plot.
To make the effect more dramatic, choose :Observations and then drag the brush. Now only the selected observations will appear. One final feature, which is also kind of fun: if you release the mouse button while still dragging the brush, the brush will continue to move on its own.
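Conceptually, a long, thin brush selects every observation whose BA lies in a narrow range, and the second plot then shows SALARY against NO_RBI for just those observations. The Python sketch below mimics three brush positions. All data values are invented, and the function name `brush` is hypothetical.

```python
# (BA, NO_RBI, SALARY) triples; all values are invented for illustration.
observations = [
    (0.22, 30, 200.0),
    (0.25, 45, 350.0),
    (0.27, 60, 500.0),
    (0.30, 80, 900.0),
    (0.31, 95, 1200.0),
]


def brush(rows, ba_low, ba_high):
    """Indices of observations whose BA falls inside the brush's BA range."""
    return [i for i, (ba, _rbi, _salary) in enumerate(rows)
            if ba_low <= ba <= ba_high]


# Three successive brush positions, moving left to right along the BA axis.
positions = [(0.21, 0.26), (0.24, 0.29), (0.28, 0.33)]
slices = [brush(observations, low, high) for low, high in positions]

for (low, high), selected in zip(positions, slices):
    # The (NO_RBI, SALARY) pairs that would be highlighted in the second plot.
    pairs = [(observations[i][1], observations[i][2]) for i in selected]
    print(f"BA in [{low}, {high}]: {pairs}")
```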
You can assign markers to use for displaying observations in scatter plots, boxplots (which you'll learn about later) and rotating 3-D plots (for which you're on your own). The markers appear with each observation in the data window. You can assign markers for observations you select, and you can let SAS/INSIGHT assign markers automatically based on the value of a variable. You can control the size of the markers in any plot.
To see how to mark individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. If the SAS:Tools window is not already open, choose Edit:Windows:Tools (if you choose Edit:Windows and see a highlighted square to the left of Tools, the SAS:Tools window is already open). The SAS:Tools window will appear. Click on the shape of the marker you want to denote the chosen observation. The marker will change to the shape you choose in all graphs and in the data window.
A nominal variable is a variable whose values stand for names of categories. LEAGUE, DIVISION, TEAM, and POSITION are all nominal variables. SAS/INSIGHT can assign markers based on the value of a nominal variable. Let's mark the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple marker button at the bottom of the SAS:Tools window.
You can also assign markers based on the value of an interval variable (i.e., a variable whose values stand for numerical quantities, such as BA and NO_HITS). Let's assign markers in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple marker button at the bottom of the markers window. A different marker will be assigned to the players in the upper, middle, and lower thirds of SALARY values.
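To make the idea of thirds concrete, here is a small Python sketch that labels each salary as lower, middle, or upper and maps the label to a marker. It assumes the thirds are formed by ranking the observations, which may not be exactly how SAS/INSIGHT divides the values. The salaries and marker characters are invented.

```python
def tertile_groups(values):
    """Label each value 'lower', 'middle', or 'upper' by its rank-based third.

    Assumption: thirds are thirds of the ranked observations, not of the range.
    """
    n = len(values)
    ranked = sorted(range(n), key=lambda i: values[i])
    names = ("lower", "middle", "upper")
    labels = [None] * n
    for rank, i in enumerate(ranked):
        labels[i] = names[min(3 * rank // n, 2)]
    return labels


# Invented salaries for six players.
salaries = [100, 900, 300, 1200, 250, 600]
labels = tertile_groups(salaries)

# Hypothetical marker shapes, one per group.
marker_for = {"lower": "o", "middle": "+", "upper": "x"}
markers = [marker_for[label] for label in labels]

print(list(zip(salaries, labels, markers)))
```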
You can adjust the marker size on the plot by choosing :Marker Sizes. Try a few sizes to find one you like.
If you are using a color monitor or printer, giving the markers different colors may be a more effective strategy than changing marker shapes (although with the black and white printers in the stat lab, different marker shapes show up better).
Basically, coloring observations proceeds in the same way as marking observations. The same SAS:Tools window used in marking is also used in coloring, so make sure it is open.
To see how to color individual observations, create a scatter plot of NO_RBI versus NO_HITS. Select an observation that interests you by clicking on it. From the SAS:Tools window click on the color you want to denote the chosen observation. The color will change to the shade you choose in all graphs and in the data window.
Let's color the National and American League players separately in the NO_RBI versus NO_HITS plot. To do this, select LEAGUE in the data window and click on the multiple color button (the rectangular colored button) at the bottom of the colors in the SAS:Tools window.
Let's assign colors in the NO_RBI versus NO_HITS plot based on SALARY. To do this, select SALARY in the data window and click on the multiple color button. A different color will be assigned to the players in the upper, middle, and lower thirds of SALARY values.
You can adjust the range of data displayed and show subsets of the data by hiding observations. To illustrate the procedure, display the scatter plot of SALARY versus BA. We would like to investigate this relationship for each league on the same scatter plot (note that we could generate two separate scatter plots by using the variable LEAGUE as a group variable). We need to select the players from the National and American Leagues separately. A clever way to do this is to generate a bar chart of the variable LEAGUE. (Bar charts are the analogue of frequency histograms for nominal data.) Clicking on the bar for the American League selects all American League players. Do this now.
To look at the scatterplot of SALARY versus BA for just National League players, choose Edit:Observations:Hide in Graphs.
Now look at the data window. De-select the selected observations by clicking on the upper left data cell of the data array. Notice that the previously selected observations now have no markers at all in the far left column. This indicates that these observations are hidden in all graphs (notice that the bar chart of LEAGUE has only the National League bar).
To make the observations visible in the graphs again, first choose Edit:Observations:Invert Selection, which de-selects all selected observations and selects all de-selected observations. Since all observations were de-selected just prior to this, all observations are now selected. If you now choose Edit:Observations:Show in Graphs, all observations will appear in the graphs.
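The hide, de-select, invert, and show sequence above amounts to a few set operations. The following Python sketch tracks the selected and hidden observations as sets of row indices. The league values are invented, and the variable names are hypothetical rather than anything SAS/INSIGHT exposes.

```python
# Invented LEAGUE column for four players.
leagues = ["American", "National", "American", "National"]
all_rows = set(range(len(leagues)))


def visible_rows(hidden_rows):
    """Rows that would still be drawn in the graphs."""
    return sorted(all_rows - hidden_rows)


# Click the American League bar: those rows become selected.
selected = {i for i, league in enumerate(leagues) if league == "American"}

# Hide in Graphs: selected rows are added to the hidden set.
hidden = set(selected)
after_hide = visible_rows(hidden)

# De-select everything, then Invert Selection: every row is now selected.
selected = set()
selected = all_rows - selected

# Show in Graphs: selected rows are removed from the hidden set.
hidden -= selected
after_show = visible_rows(hidden)

print(after_hide)
print(after_show)
```

Inverting an empty selection is just a quick way to select everything, which is why the tutorial's sequence restores all observations.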
You can show subsets of the data by toggling the display of observations. This causes observations to be displayed only when they are selected. To illustrate this, create two scatter plots: one of SALARY versus BA, and the other of SALARY versus NO_RBI, by choosing Analyze:Scatter Plot ( Y X ), and assigning SALARY the Y role and BA and NO_RBI the X role.
You will now create a toggle on the value of LEAGUE as follows:
Both scatterplots will now display the data for the league you selected. To toggle between the two leagues, choose Edit:Observations:Invert Selection. Each time you do this the data displayed will change to the other league. By doing this quickly, you can detect differences between the leagues.
To undo the toggling, choose :Observations again. Click on an empty area of the graph window to de-select.