Tutorial #4 - Analyzing Data

This tutorial illustrates how to run a statistical procedure to analyze data contained in a data file.

  1. In the last tutorial, a file was created with data describing the population and land area of 20 countries. If you did not work through that tutorial, download and save the file countries.sf6 to your computer.
     
  2. If the data from tutorial #3 is not currently loaded in the STATGRAPHICS Databook, select File - Open - Open Data Source from the main menu. On the first dialog box, indicate that you wish to open a STATGRAPHICS data file:

On the second dialog box, select the file countries.sf6:

Then press Open to load it into the datasheet, as shown below:

  1. Let's begin by analyzing the data in the column named Density. First, locate the One Variable Analysis procedure. If you are using the classic menu, this procedure is located under Describe - Numeric Data. If you are using the Six Sigma menu, it is located under Analyze - Variable Data. Select the procedure from the menu to display the following dialog box:

  1. Most of the STATGRAPHICS statistical procedures begin with a data input dialog box similar to the one displayed above. Each data input dialog box contains one or more data fields in which the user specifies the data to be entered. In such fields, you may enter:

Data input dialog boxes also contain a Select field that allows users to analyze a subset of the data. Typical entries in the Select field include:

A detailed discussion of valid entries in these fields is contained in STATGRAPHICS Operators.pdf.

Enter Density in the data field and leave the Select field blank, which will analyze all of the rows in the datasheet. Press OK.

  1. STATGRAPHICS will now create an analysis window:

The window contains 4 “panes”, divided by movable splitter bars. The two panes on the left display tabular output, while the two panes on the right display graphical output. 

  1. Double-click in the bottom left pane. This will maximize the table of summary statistics:

Several interesting statistics are given in the table. Of the n = 20 countries, population density ranges between 2.75 and 395.14 people per square kilometer. The average density is 138.19 people per square kilometer.

Beneath the table is the output of the StatAdvisor, which gives a short interpretation of the results. In this case, the StatAdvisor concentrates on the two statistics displayed in red, which measure the skewness and kurtosis in the data. As explained by the StatAdvisor, data that come from a normal or Gaussian distribution should yield standardized skewness and standardized kurtosis values between –2 and +2. In this case, both statistics are within that range, indicating that a bell-shaped normal curve is a reasonable model for the observations.
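To make the ±2 rule concrete, the sketch below computes standardized skewness and kurtosis using the common large-sample approximation (skewness divided by sqrt(6/n), excess kurtosis divided by sqrt(24/n)). That STATGRAPHICS uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math

def standardized_shape(values):
    """Return (standardized skewness, standardized kurtosis).

    Assumption: moment-based skewness and excess kurtosis, each divided
    by its approximate large-sample standard error. STATGRAPHICS may
    apply small-sample corrections that are not reproduced here.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    z_skew = skewness / math.sqrt(6.0 / n)
    z_kurt = excess_kurtosis / math.sqrt(24.0 / n)
    return z_skew, z_kurt

def looks_normal(values, limit=2.0):
    """True when both standardized statistics fall within +/- limit."""
    z_skew, z_kurt = standardized_shape(values)
    return abs(z_skew) <= limit and abs(z_kurt) <= limit
```

Applied to the Density column, a check of this kind would be expected to pass, since the tutorial reports both statistics inside the –2 to +2 range.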

  1. Double-click on the summary statistics table again to restore the original split display. Then double-click on the bottom right pane to maximize the box-and-whisker plot:

The box-and-whisker plot, invented by John Tukey, provides a 5-number summary of a data sample. The central box covers the middle half of the data, extending from the lower quartile to the upper quartile. The lines extending from either end of the box (the whiskers) show the location of the smallest and largest data values. The median of the data is indicated by the vertical line within the box, while the plus sign (+) shows the location of the sample mean. The upper whisker is somewhat longer than the lower, and the mean is somewhat greater than the median. Both are signs of possible skewness in the data, although with such a small sample the standardized skewness value was not significant.
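The five numbers behind the plot can be sketched in a few lines. This is a minimal illustration using Python's standard library; the function name is hypothetical, and the "inclusive" quartile method is an assumption, since quartile conventions differ between packages and the one STATGRAPHICS uses is not stated here.

```python
import statistics

def five_number_summary(values):
    """Return the quantities drawn on a box-and-whisker plot.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other software may give slightly different box edges.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "minimum": data[0],               # end of the lower whisker
        "lower_quartile": q1,             # lower edge of the box
        "median": median,                 # line inside the box
        "upper_quartile": q3,             # upper edge of the box
        "maximum": data[-1],              # end of the upper whisker
        "mean": statistics.mean(data),    # plus sign on the plot
    }
```

Comparing `mean` with `median` in the result gives the same informal skewness check described above: a mean noticeably greater than the median suggests a longer upper tail.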

  1. Along the top of the analysis window is the analysis toolbar:

The buttons on the analysis toolbar are very important. The actions of the 7 leftmost buttons are summarized below:

Name              Function
----------------  ------------------------------------------------------------
Input dialog      Displays the data input dialog box so that the selected data column(s) may be changed.
Tables            Displays a list of other tables that may be created.
Graphs            Displays a list of other graphs that may be created.
Save results      Allows calculated statistics to be saved to columns of a datasheet.
Analysis options  Selects options that apply to all tables and graphs in the current analysis.
Pane options      Selects options that apply only to the currently maximized table or graph.
Graphics options  Allows you to change the titles, scaling, and other features of the currently maximized graph.

For example, press the leftmost button to display the original data input dialog box and change the data field to Population as shown below:

  1. Press OK to switch the analysis window from analyzing Density to analyzing Population:

The resulting box-and-whisker plot is even more skewed than before. In this case, two "far outside" points are plotted as separate point symbols, with the upper whisker extending out to the greatest of the other data values.
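The way points are separated from the whiskers can be sketched with Tukey's usual fences: points more than 1.5 interquartile ranges beyond the box are "outside", and points more than 3 interquartile ranges beyond it are "far outside". That STATGRAPHICS uses exactly these multipliers and this quartile method is an assumption, and the function name and sample values are hypothetical.

```python
import statistics

def classify_points(values, inner=1.5, outer=3.0):
    """Split a sample into whisker range, outside, and far outside points.

    Assumption: Tukey-style fences at inner*IQR and outer*IQR beyond
    the quartiles, with quartiles from the 'inclusive' method.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - inner * iqr, q3 + inner * iqr
    outer_low, outer_high = q1 - outer * iqr, q3 + outer * iqr

    far_outside = [x for x in data if x < outer_low or x > outer_high]
    outside = [
        x for x in data
        if (x < inner_low or x > inner_high) and outer_low <= x <= outer_high
    ]
    # Whiskers stop at the most extreme values that are not outside points.
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {
        "whiskers": (inside[0], inside[-1]),
        "outside": outside,
        "far_outside": far_outside,
    }

# Hypothetical values, not the tutorial's population data:
result = classify_points([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100])
# result["whiskers"] == (1, 9), result["outside"] == [20],
# result["far_outside"] == [100]
```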

  1. To identify the unusual points, double-click on the box-and-whisker plot if necessary so that it is maximized. Then click on the Identify button on the analysis toolbar, select Country on the Point Identification dialog box, and press OK:

  1. Now click on each outside point in turn to display the country name in the Locate field:

Note that India has the largest population in the sample, followed by the United States.

  1. Additional tables may be created by pressing the Tables button and checking additional boxes:

Press OK to add the new tables to the analysis window.

  1. Additional graphs may be created by pressing the Graphs button and checking additional boxes:

Press OK to add the new graphs to the analysis window, which should now appear as shown below:

  1. Many of the tables and graphs have options associated with them, which can be accessed by maximizing a pane and pressing the Pane Options button. For example, double-click on the Summary Statistics table to maximize it and then press Pane Options to display a list of other statistics that can be calculated:

Click on several additional checkboxes and press OK to modify the Summary Statistics table. In some procedures, the Analysis Options button will also be available. The options on that button apply to all tables and graphs in the analysis window, not just to the current pane.

  1. To save the statistics you have calculated back to the datasheet that contains the data, press the Save Results button on the analysis toolbar and complete the dialog box as shown below:
     

Place checkmarks next to Summary Statistics and Statistic Labels. Under Datasheet, select the radio button labeled A.

  1. Press OK to have the calculated statistics saved to new columns of the current datasheet, which should then look like that shown below:

  1. To save the entire analysis on disk, select File - Save - Save StatFolio from the main menu. Select a name for the file in which the analysis will be saved:

Note that the analysis is saved in a separate file apart from the data. Files that save desired analyses are called StatFolios and have the file extension .sgp. These files contain a definition of the analyses to be performed and a pointer to the file containing the data. More than one StatFolio may analyze data in the same data file. When asked whether you wish to save the data in datasheet A, reply No unless you want to save the calculated statistics.

When you run STATGRAPHICS at a later date, you can select File - Open - Open StatFolio and reload the StatFolio that you saved. When you do, the StatFolio will automatically load the data file that it points to and re-execute the saved statistical analyses on that data. If the data file has been modified since the StatFolio was last saved, the statistics will be different. StatFolios are the primary method for saving analyses that you wish to repeat often, since they automatically recalculate the selected statistics and graphs when they are reloaded.
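The idea of storing an analysis definition plus a pointer to the data, rather than the results themselves, can be illustrated with a small sketch. Everything here is hypothetical: the JSON layout, the CSV data file, and the function names are illustration only and are not the .sgp or .sf6 formats.

```python
import csv
import json
import os
import statistics
import tempfile

# Hypothetical catalogue of analyses that a saved definition may name.
ANALYSES = {"mean": statistics.mean, "median": statistics.median}

def save_folio(folio_path, data_path, column, stats):
    """Store only the analysis definition and a pointer to the data file."""
    definition = {"data_file": data_path, "column": column, "statistics": stats}
    with open(folio_path, "w") as f:
        json.dump(definition, f)

def load_folio(folio_path):
    """Reload the definition, reread the data, and recompute the statistics."""
    with open(folio_path) as f:
        folio = json.load(f)
    with open(folio["data_file"], newline="") as f:
        values = [float(row[folio["column"]]) for row in csv.DictReader(f)]
    return {name: ANALYSES[name](values) for name in folio["statistics"]}

def write_column(data_path, column, values):
    """Write a one-column CSV file (stand-in for a real data file)."""
    with open(data_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        writer.writerows([v] for v in values)

def demo():
    """Show that results change when the data file changes between loads."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "data.csv")
        folio_path = os.path.join(tmp, "analysis.json")
        write_column(data_path, "Density", [10, 20, 30])
        save_folio(folio_path, data_path, "Density", ["mean", "median"])
        first = load_folio(folio_path)
        write_column(data_path, "Density", [10, 20, 60])  # data modified later
        second = load_folio(folio_path)
        return first, second
```

Because only the definition is stored, the second load in `demo()` reports a different mean without the saved definition being touched, which mirrors the recalculation behavior described above.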
