Guides: SPSS Statistical Software: Descriptive Statistics

Descriptive/Summary Statistics Overview

Descriptive statistics refers to a set of techniques used to summarize and organize data in a meaningful way. It involves calculating measures such as mean, median, mode, standard deviation, and variance, as well as creating visual representations like graphs and charts. The primary goal is to provide a clear and concise overview of the data's main characteristics without making inferences or predictions about a larger population (summarized from the International Encyclopedia of Education (Fourth Edition), Elsevier, 2023).

Descriptive statistics are usually the first statistics you'll run on a dataset to get a general overview of the data and its distribution (e.g., how spread out is the data, are there any outliers, is there any missing data). They are also sometimes referred to as summary statistics, as they allow you to summarize the data.

Let's walk through some descriptive statistics to give you an idea of what you can do with them!

Frequencies

Frequencies are a descriptive/summary statistic that counts up how often a particular response/value occurs for whichever variables you select. In more basic terms, Frequencies measure how often something happens. Frequencies are often displayed in frequency tables and used to create histograms or bar charts for better visualization of data distributions. In SPSS, Frequencies are suited to be run on Nominal and Ordinal data (i.e., categorical data or text data) and not Scale variables (due to the nature of Scale variable values not being constrained to a finite number of categories or response options).

Let's Practice Running Frequencies

Click on Analyze on the menubar, then select Descriptive Statistics, then select Frequencies.
Let's use the Gender variable for this example. In the popup window, select Gender and move it over to the Variable(s) box. (You can select more than one variable, but we will just do one for now). We don't need to adjust any of the settings, so just click OK to run the Frequency analysis.
Now you will see the Frequencies analysis appear in your Output window. You should see the header Frequencies and 2 tables below it. It should look like this:
How to interpret Frequency tables: The table labelled Statistics shows you how many individuals provided data for the variable you selected (in this example, all 50 people in the sample answered the question asking for their gender, and there is no missing data). The table labelled Gender is the actual frequency table for the Gender variable. (The label will match the name of the variable you selected). This table shows us how many individuals in our sample selected Male and how many selected Female, as well as the percentages of our sample that selected Male and Female.
Let's break down each column:
1. Frequency: the number of individuals who selected/chose the response listed in the respective row. (e.g., 24 individuals selected Male for their gender, and 26 selected Female). If there are any missing data, this column would also show the number of individuals that we are missing data from for this variable.
2. Percent: the percentage of our sample who selected/chose the response listed in the respective row, including showing the percentage for any missing data for the variable. (e.g., 48% of our sample selected Male, and 52% selected Female).
3. Valid Percent: excludes Missing Data from the calculation of the percentages. In our example we have no missing data, but if we did, the Valid Percent column would calculate the percentages based only on those who provided data for the variable. For demonstration purposes, let's say we were missing data for the Gender variable from 4 participants. Here's how the output tables would now look:

  We can now see the difference between the Percent column and the Valid Percent column in how the percentages are calculated/displayed.
4. Cumulative Percent: calculates a running-total of the percentages of each response/category and adds them up. We see 100% listed in the Female row because the 52% (for the percentage of individuals who selected Female) was added to the 48% of the individuals who selected Male. Cumulative Percent only takes into account non-missing data.
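To make the four columns concrete, here is a minimal Python sketch of the arithmetic behind a frequency table. It is not SPSS code, and the function name frequency_table and the sample with 4 missing responses are made up for illustration (the 22/24 split is an assumption, not the guide's output).

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (label, frequency, percent, valid_percent, cumulative_percent).

    None marks a missing response. Percent uses all cases as the base;
    Valid Percent and Cumulative Percent use only non-missing cases.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)  # keeps categories in order of first appearance
    rows = []
    running = 0.0
    for label, freq in counts.items():
        valid_pct = 100 * freq / len(valid)
        running += valid_pct  # running total of the valid percentages
        rows.append((label, freq, round(100 * freq / total, 1),
                     round(valid_pct, 1), round(running, 1)))
    n_missing = total - len(valid)
    if n_missing:
        # Missing cases get a Percent but no Valid or Cumulative Percent.
        rows.append(("Missing", n_missing, round(100 * n_missing / total, 1), None, None))
    return rows

# Hypothetical sample: 50 cases, 4 of them missing (made-up split).
gender = ["Male"] * 22 + ["Female"] * 24 + [None] * 4
for row in frequency_table(gender):
    print(row)
```

With no missing data (24 Male and 26 Female, as in our example), the same function gives 48.0 and 52.0 in both the Percent and Valid Percent columns, and the Cumulative Percent ends at 100.0.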

Running Frequencies on Multiple Variables

You can add more than one variable when you run Frequencies. SPSS will display separate tables for each variable you select. Let's run Frequencies on Ethnicity, Work Field, and all 5 of the Music Genres (Rock, Pop, Country, Hip Hop, Jazz). (Follow the steps above, but first click Reset in the little popup window to clear out Gender and any previous variables you had run Frequencies on. Then just select all of the new variables we want and move them over to the Variable(s) box).

Here's how the output should look:

Descriptives

The next summary statistic we’ll go over is Descriptives. Descriptives give you a numeric breakdown and summary of your Scale (numeric) data. For example, Descriptives can show the mean, the minimum and maximum values, and the standard deviation (which is a measure of how the data is spread out around the mean). (The median and mode are not part of the Descriptives procedure; you can request them through the Statistics button in Frequencies).

You can run Descriptives on Scale variables (numeric data). You shouldn't run Descriptives on your Nominal (categorical) variables because it won’t give you a proper analysis for that type of data. (e.g., it doesn't make sense to calculate the average/mean Ethnicity and median Ethnicity, but it does make sense to calculate the average/mean Age and median Age of our sample).

Let’s run Descriptives on: Age, Years Employed, Salary, Time Task 1, Time Task 2, and all of the Anxiety, Confidence, and Depression variables.

Let's Practice Running Descriptives

Click on Analyze on the menubar, then select Descriptive Statistics, then select Descriptives.
In the popup window, select Age, Years Employed, Salary, Time Task 1, Time Task 2, and all of the Anxiety, Confidence, and Depression variables and move them over to the Variable(s) box. (Remember, if you click on one variable, hold the Shift key, and then click on another variable, it will select those variables and every variable in-between).
If you click on the Options button, you can adjust what statistics the output will display for your Descriptives analysis. The default options provide great information, but if you would like to add any additional statistics, just check the boxes for them. Click Continue when you're done.
Now back in the main popup, just click OK to run the Descriptives analysis. The results will appear in your Output window. You should see the header Descriptives and a table labelled Descriptive Statistics. It should look like this:
How to interpret the Descriptive Statistics table: Each row will correspond to each variable you selected for the analysis.
Let's break down each column:
1. N: the number of individuals who provided data for the corresponding variable (e.g., 50 individuals provided data for Age, 50 provided data for Years Employed).
2. Minimum: the minimum value provided in the data for the corresponding variable. (e.g., the minimum age of anyone in our sample is 19 years old. The minimum salary of anyone in our sample is $30,000).
3. Maximum: the maximum value provided in the data for the corresponding variable. (e.g., the maximum age of anyone in our sample is 67 years old. The maximum salary of anyone in our sample is $150,000).
4. Mean: the average value for the corresponding variable. (e.g., the average age of our sample participants is 38.32. The average time for completion of Time Task 1 was 88.20 minutes. The average score for Anxiety at Time 1 was 21.90 points).
5. Std. Deviation: the standard deviation for the corresponding variable.
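If it helps to see the calculations behind these columns, here is a small Python sketch. It is only an illustration: the function name describe and the ages list are hypothetical, and it assumes the Std. Deviation column is the sample standard deviation (n - 1 in the denominator).

```python
import statistics

def describe(values):
    """Summarize one Scale variable with the columns of the default table."""
    valid = [v for v in values if v is not None]  # skip missing responses
    return {
        "N": len(valid),
        "Minimum": min(valid),
        "Maximum": max(valid),
        "Mean": round(statistics.mean(valid), 2),
        # statistics.stdev is the sample standard deviation (n - 1 denominator)
        "Std. Deviation": round(statistics.stdev(valid), 2),
    }

# Hypothetical ages, not taken from the example dataset; None is a missing value.
ages = [19, 25, 31, 38, 44, 52, 67, None]
print(describe(ages))
```

Notice that N counts only the 7 people who provided an age, which is why N can differ from one row of the table to the next when some variables have missing data.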

The main difference you'll notice between Frequencies and Descriptives is that Descriptives all appear in one table, while Frequencies are split into separate tables for each variable. This is because of the differing natures of Nominal data vs Scale data. You can perform the same arithmetic calculations on all Scale (numeric) variables because they use numbers in the traditional, mathematical sense, so all of these calculations can be displayed in one table. Nominal data doesn't have a way to uniformly compare different variables because the "numeric values" for each variable just represent categories; for example, a value of 6 for Ethnicity represents White, but a value of 6 doesn't mean "more/higher" ethnicity than a value of 2 for Ethnicity. As a result, we can't run arithmetic/mathematical calculations on these kinds of variables; all we can do is count up the responses.

Ordinal data can sometimes function as Scale data, so whether or not you can run Descriptives on Ordinal data depends on the specific variable. For example, our Likert Numbers variable is listed as Scale, but could also be considered Ordinal due to the hierarchy/ranking of its response options. Frequencies can always be run on Ordinal data regardless (as Ordinal variables still just consist of categories, like Nominal data). However, with our Likert Numbers variable, a value of 5 does indicate more or a higher level of "agreement" than a value of 3, so arithmetic calculations do have meaning with this type of data.

Crosstabs (Crosstabulation)

Crosstabs, or Crosstabulation (also known as a contingency table), is a table showing the relationship between 2 categorical (Nominal or Ordinal) variables. While a frequency table can describe a single categorical variable, a crosstabs table can describe 2 categorical variables in one table. Crosstabs shows you the overlap of the number of individuals from your sample that fell within 2 different categories. For example, we can look at a breakdown of how many men and women fall under each Work Field.

Let's Practice Running Crosstabs

Let’s run Crosstabs on Gender and Work Field.

Click on Analyze on the menubar, then select Descriptive Statistics, then select Crosstabs.
In the popup window, select Work Field and move it over to the Row(s) box. Then select Gender and move it over to the Column(s) box.
(Note: it doesn't matter which one you put in the Row(s) box and which in the Column(s) box, it just changes the orientation of the table. I suggest putting the variable with more categories in the Row(s) box and the variable with fewer categories in the Column(s) box so you get a tall, narrow table instead of a short, wide table that you may need to scroll left/right to fully see).
The buttons on the right allow you to specify specific tests and options for your analysis, but you don't need to use any of those to run a basic Crosstabs. Click OK to run the Crosstabs analysis. You should now see the following output:
How to interpret the Crosstabs tables: The top table (labelled Case Processing Summary) just shows you the number of individuals in your sample that provided data for both of your specified variables and if there is any missing data. The bottom table (labelled with the names of the 2 variables you selected - in our example, Work Field * Gender Crosstabulation) is the actual crosstabs results table. Each row and column will correspond to the variables you selected for the analysis.
Let's break down how to interpret the Crosstabulation table:
1. We see Work Field on the left side of the table with each specific Work Field category listed in the rows. Gender is listed at the top of the table with the columns below representing each category of the Gender variable.
2. The cells in the intersections of each column and row represent the number of individuals in our sample who fell under both of the corresponding categories. For example, the cell that intersects BUS and Male has a value of 6 in it, indicating that 6 individuals in our sample responded that they are Male and that they work in the Business field. If we look at Female and HEALTH, we see that 8 women in our sample work in the Health field.
3. The Total column lists the total number of individuals in our sample who work in each Work Field category, regardless of their Gender. Each total value is the sum of the cells to the left of it. We see we have 10 individuals who work in the Business field, 9 who work in the Education field, 9 who work in the Government field, and so on. (This column should show us the same numbers as if we ran Frequencies on just the Work Field variable)
4. The Total row lists the total number of individuals in our sample who fall under each Gender category, regardless of their Work Field. Each total value is the sum of the cells above it. We see we have 24 Males in our sample and 26 Females. (This row should show us the same numbers as if we ran Frequencies on just the Gender variable).
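The counting behind a crosstabs table can be sketched in a few lines of Python. This is an illustration only: the function name crosstab and the six-person sample are hypothetical and are not the example dataset.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count cases for every (row category, column category) pair, plus totals."""
    # Keep only cases that have data for both variables.
    pairs = [(r, c) for r, c in zip(rows, cols) if r is not None and c is not None]
    cells = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)  # the Total column
    col_totals = Counter(c for _, c in pairs)  # the Total row
    return cells, row_totals, col_totals

# Hypothetical six-person sample (not the guide's data).
work_field = ["BUS", "BUS", "HEALTH", "HEALTH", "HEALTH", "BUS"]
gender = ["Male", "Female", "Female", "Female", "Male", "Male"]

cells, row_totals, col_totals = crosstab(work_field, gender)
print(cells[("BUS", "Male")])   # cell where BUS and Male intersect
print(row_totals["BUS"])        # everyone in BUS, regardless of Gender
print(col_totals["Female"])     # all Females, regardless of Work Field
```

Each row total is the sum of the cells in that row, and each column total is the sum of the cells in that column, which is the same relationship described for the Total column and Total row.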

Multiple Response Analysis (for "Select all that apply" Questions)

Remember how the Music Genre columns in our example data are supposed to represent the responses to a single "select all that apply" question? For a visual aid, here's what that would look like in a survey:

And remember how when we ran Frequencies on them, we got individual tables for each music genre because they are listed in the dataset as separate variables? What if we wanted a single table to show us all the results from this single question? Well, to do that, we need to create a Multiple Response Set.

Multiple Response Variable Sets in SPSS can be used when you have data from questions that allow for the selection of multiple responses (like "select all that apply" questions), and you want to analyze the responses collectively.

Data from "select all that apply" questions should display in SPSS similar to what we see in our example dataset:

Separate columns for each response option in the question
A numeric code indicating whether an individual selected a specific response (in our example dataset, 1's are used to indicate that an individual selected that music genre, and blanks indicate that they did not select it)

Let's Practice Creating a Multiple Response Set

Click on Analyze in the menubar, then select Multiple Response, then select Define Variable Sets...
In the popup window, select all 5 music genres (Rock, Pop, Country, Hip Hop, and Jazz) and move them over to the Variables in Set box. (Note: if you do not see the music genres listed, close out of the popup window, go to Variable View of your data and change the Type of all 5 music genres to Numeric. They should be Numeric, not String, otherwise they will not show up when you try to make a Multiple Response Set).
Where it says Variables Are Coded As, select Dichotomies and type in 1 for the Counted Value. (Note: use Dichotomies and not Categories because the values were a dichotomy of either 1 or blank to indicate if a person liked a specific type of music or not).
For the Name, type in Music_Pref. For the Label, type in Music Preferences. Your window should now look like this:
Now click Add and your Multiple Response Set will be created. You will now see it listed in the box labelled Multiple Response Sets as $Music_Pref. (If you click on the $Music_Pref Set, it will show you what variables are in it). Your screen should look like this:
Click Close to close the popup window and save your Set. You have now created a Multiple Response Set that links all 5 of the music genre variables together for analyses. You won't notice any differences in your data or output window. But now we can run a couple analyses on our newly created Multiple Response Set.
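Conceptually, a Set coded as Dichotomies just counts, for each variable in the Set, how many cases hold the Counted Value. Here is a minimal Python sketch of that idea. The function names and the four-person excerpt are hypothetical, and None stands in for a blank cell.

```python
def count_dichotomies(columns, counted_value=1):
    """Count, for each variable in the set, the cases equal to the counted value."""
    return {name: sum(1 for v in values if v == counted_value)
            for name, values in columns.items()}

def cases_with_any(columns, counted_value=1):
    """Number of cases that selected at least one option in the set."""
    return sum(1 for case in zip(*columns.values()) if counted_value in case)

# Hypothetical four-person excerpt: 1 = selected, None = blank.
music = {
    "Rock":    [1, None, 1, None],
    "Pop":     [1, 1, None, None],
    "Country": [None, None, 1, None],
}
print(count_dichotomies(music))  # selections per genre
print(cases_with_any(music))     # people who picked at least one genre
```

In this made-up excerpt there are 5 selections but only 3 people who selected anything, which is why a Multiple Response Set keeps track of responses and cases separately.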

Running Frequencies on a Multiple Response Set

Click on Analyze in the menubar, then select Multiple Response. You should now see the Frequencies and Crosstabs options in color (not grayed-out anymore). Since we defined a variable set, we can now run analyses on the Set. To start, let's select Frequencies.
In the popup window, select our Music Preferences set and move it over to the Table(s) for box.
Click OK and you will now see the frequency table for our Music Preferences Set (i.e., for our "select all that apply" music genre question).
How to interpret the Multiple Response Frequencies output:
1. The top table, Case Summary, just shows you how many individuals in our sample provided data for this Multiple Response Set.
2. The bottom table is the actual Frequency table, and it's labelled after the name of our Multiple Response Set (in this case, $Music_Pref Frequencies). It shows a breakdown of how many individuals in our sample chose each music genre as one of their favorites along with the associated percentages.
3. Let's break down each column:
  1. N: the number of individuals who provided data for the corresponding variable (i.e., the number of individuals who selected each music genre as one of their favorites. For example, 25 individuals selected Rock as one of their favorite music genres, 16 selected Hip Hop, 14 selected Jazz, and so on). The Total at the bottom shows the total number of responses to this question. You'll notice it is greater than our sample size of 50 because many individuals selected more than one music genre.
  2. Percent: the percentage of responses in which the corresponding variable was selected (i.e., the percentage of responses that included the corresponding music genre as one of their favorites). Put another way, the Percent column shows how many times a music genre was chosen out of the total number of responses to this question. It’s the percentage out of the 90 total responses. (That is, out of our 90 responses, 27.8% of the responses included Rock as one of their favorite music genres. 17.8% of the responses included Hip Hop as one of their favorites, and so on). The Total will add up to 100% because it's based on the number of responses (90), not the number of individuals in our sample (50).
  3. Percent of Cases: the percentage of individuals in our sample who selected a music genre as one of their favorites; it’s the percentage out of the 50 participants in our sample. (That is, out of our 50 participants, 50.0% of them (25 individuals) selected Rock as one of their favorite music genres. 32.0% of our sample (16 individuals) selected Hip Hop as one of their favorites, and so on). The Total will add up to more than 100% because it's based on the number of individuals in our sample (50), and many of these individuals selected more than one answer. (If everyone had only selected one response, then this Total would add up to exactly 100%).
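The difference between the two percentage columns comes down to which number you divide by. The short Python sketch below redoes that arithmetic using the counts quoted in this guide (25 Rock, 20 Pop, 15 Country, 16 Hip Hop, 14 Jazz, and 50 participants). The variable names are made up for illustration, and it assumes all 50 participants count as valid cases.

```python
# Counts quoted in this guide's output: selections per genre, 50 participants.
counts = {"Rock": 25, "Pop": 20, "Country": 15, "Hip Hop": 16, "Jazz": 14}
n_cases = 50

total_responses = sum(counts.values())  # 25 + 20 + 15 + 16 + 14 = 90

for genre, n in counts.items():
    pct_responses = 100 * n / total_responses  # Percent: base is all responses
    pct_cases = 100 * n / n_cases              # Percent of Cases: base is people
    print(f"{genre:8s} N={n:2d}  Percent={pct_responses:5.1f}  "
          f"Percent of Cases={pct_cases:5.1f}")

# Total Percent of Cases: all responses divided by the number of people.
total_pct_cases = 100 * total_responses / n_cases
print(total_pct_cases)
```

Dividing the 90 responses by the 50 participants gives a Total Percent of Cases of 180%, which is another way of saying that participants selected 1.8 genres each on average.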

Multiple Response Set Crosstabs

Click on Analyze in the menubar, then select Multiple Response, then select Crosstabs.
In the popup window, select our Music Preferences Set and move it over to the Row(s) box.
Select a Nominal variable and move it over to the Column(s) box. (For this example, select the Ethnicity variable and move it over).
When you move Ethnicity (or any Nominal variable) over to either the Column(s) or Row(s) box, you'll notice two question marks. This is because we need to define the range of values possible for the Nominal variable we moved over. Click the Define Ranges button to do this.
In the Define Ranges popup window, type in the Minimum and Maximum numeric values for your Nominal variable. (In our example, the minimum and maximum values for the Ethnicity variable are 1 and 6. Type those into their respective boxes). Click Continue on this little popup window. (If you had, for example, chosen Gender, the minimum and maximum values would be 1 and 2).
Now back in the main popup window, you should see the question marks replaced with the minimum and maximum values you typed in. Now click OK to run the Multiple Response Set Crosstabs analysis.
You should see the following output:
Interpreting this Multiple Response Set Crosstabs table is exactly the same as interpreting a regular Crosstabs table. (The top table just shows the summary of how many individuals in our sample provided data for both questions; the actual Crosstabs analysis is in the bottom table).
1. Because of how we set this analysis up, the rows display the variables in our Multiple Response Set (i.e., the 5 Music Genres). The columns display our chosen Nominal variable and all categories within the range you defined above (in this case, Ethnicity and all 6 of the Ethnicity categories).
2. The cells in the intersections of each column and row represent the number of individuals in our sample who fell under both of the corresponding categories. For example, the cell that intersects Rock and Asian has a value of 5 in it, indicating that 5 individuals in our sample responded that they are Asian and that they like Rock music. If we look at Hispanic or Latino and Hip Hop, we see that 4 Hispanic or Latino individuals in our sample like Hip Hop music.
3. The Total column lists the total number of individuals in our sample who selected each Music Genre as one of their favorite music genres, regardless of their Ethnicity. Each total value is the sum of the cells to the left of it. We see we have 25 individuals who like Rock music, 20 who like Pop music, 15 who like Country music, and so on. (This column should show us the same numbers as if we ran Frequencies on each of the Music Genre variables individually).
4. The Total row lists the total number of individuals in our sample who fall under each Ethnicity category, regardless of their music preferences. Each total value is the sum of the cells above it. We see we have 5 American Indian or Alaska Native individuals in our sample, we have 10 Asian individuals in our sample, and so on. (This row should show us the same numbers as if we ran Frequencies on just the Ethnicity variable).

The advantage of creating a Multiple Response Set is that you can create single output tables to analyze "select all that apply" questions or other questions that allowed individuals to select more than one response. You can run the special Multiple Response Set Frequencies and Crosstabs analyses which allow for a more complete analysis of each separate response option of a "select all that apply" (or similar question) in relation to each other. These analyses allow for a more succinct analysis of this kind of data rather than running separate analyses on each individual response option variable within your SPSS dataset and then trying to compare all of these separate output tables.

Compare Means

Compare Means is useful for seeing how groups compare on a Scale variable. For example, maybe you want to see salary breakdowns for gender while also taking into account work field. Let's go over how to run a Compare Means analysis.

Let's Practice Running a Compare Means Analysis

Click on Analyze on the menubar, then select Compare Means, then select Means.
In the popup window, move Salary over to the Dependent List box, and move Gender over to the Independent List box. Now, you could just stop here and click OK to run the Compare Means analysis on just Salary and Gender (to see the mean salaries broken down by Gender), but we want to add another Layer to this analysis (i.e., another independent variable to compare our dependent variable (Salary) across). So click the Next button to add another Layer (add another independent variable).
In Layer 2, move over Work Field to the Independent List. (You could add more Layers, but we will stop here for this example). Click OK to run this Compare Means analysis.
(Optional: If you click the Options button, you can adjust which metrics will be calculated in the analysis. By default, this analysis will calculate Mean, Number of Cases, and Standard Deviation, but you can add or remove different metrics for the analysis. Click Continue when you're done in this Options popup window).
After clicking OK and running your Compare Means analysis, you will see the following output:
Interpreting a Compare Means analysis with 2 Layers:
1. The top table, Case Summary, just shows you how many individuals in our sample provided data for all 3 of the variables we selected.
2. The bottom table (labelled Report) is the actual Compare Means results table. It shows mean salaries broken down by each of the independent variables we selected (in this case, Gender and Work Field).
3. Let's break down each column:
  1. The far left column (in gray, labelled Gender) is our first Layer (i.e., our first independent variable that we selected). We see Gender in this first column, broken down by each of our Gender categories.
  2. The next column (in gray, labelled Work Field) is our second Layer (i.e., the second independent variable that we selected). We see Work Field in this column, broken down by each Work Field category.
  3. The next column (in white, labelled Mean) displays the mean values of our dependent variable (Salary) broken down by the independent variables we selected for this analysis. The exact value shown in each cell is the mean salary for individuals in our sample who fell under the categories listed in each respective row. For example, the first cell under Mean (Salary) is 69166.67, which is the mean salary for Males who work in the Business field. In other words - within our sample, men who work in the Business field earn an average salary of $69,166.67. Another example - Females who work in the Government field earn an average salary of $88,750.00.
  4. The next column (labelled N) is the number of individuals in our sample who fell under each of the categories listed in the respective row. For example, 6 Males work in the Business field; 7 Females work in the Technology field.
  5. The final column (labelled Std. Deviation) is the standard deviation for each respective mean salary.
  6. The Total rows:
    1. You will notice in the Gender column (our Layer 1 column), a large row labelled Total. This row, which includes the 5 work fields to the right of it, shows the mean salaries for each Work Field regardless of Gender. In other words, these are the mean salaries across all individuals in our sample based solely on Work Field. For example, the mean salary for individuals in our sample who work in the Education field is $84,333.33. The mean salary for individuals who work in Business is $66,700.00. If you look at the N values to the right of each of these mean salary values, you will also see these are the total number of individuals in our sample who work in each Work Field.
    2. In the Work Field column (our Layer 2 column), you will also see rows labelled Total. These rows display the mean salary for each Gender regardless of Work Field. For example, the average salary for all Males in our sample, across all work fields, is $73,541.67. The average salary for all Females in our sample, across all work fields, is $78,769.23. You'll also notice the N value next to each of these mean salary values represents the total number of Males (24) and Females (26) in our sample. So if you're unsure what a Total value means, look to the other values in that row for clues as to what Total value you are looking at.
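One way to convince yourself of what a Total row represents is to rebuild it from the subgroup rows: a Total mean is the subgroup means weighted by their N values. The Python sketch below does this with the two Gender totals quoted above. The function name pooled_mean is hypothetical, and the overall mean it produces is not quoted in this guide, so treat it as a check you can compare against your own output.

```python
def pooled_mean(groups):
    """Combine subgroup means into one overall mean, weighting each by its N."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# (N, mean salary) pairs quoted in this guide for the two Gender Total rows.
by_gender = [(24, 73541.67), (26, 78769.23)]

overall = pooled_mean(by_gender)
print(round(overall, 2))
```

Because the quoted means are rounded to two decimals, the result may differ from the value in your Report table by a cent or so. Note that simply averaging the two means (without weighting by N) would give a slightly different and incorrect answer, because there are more Females (26) than Males (24) in the sample.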