Descriptive statistics refers to a set of techniques used to summarize and organize data in a meaningful way. It involves calculating measures such as the mean, median, mode, standard deviation, and variance, as well as creating visual representations like graphs and charts. The primary goal is to provide a clear and concise overview of the data's main characteristics without making inferences or predictions about a larger population (summarized from the International Encyclopedia of Education, Fourth Edition, Elsevier, 2023).
Descriptive statistics are usually the first statistics you'll run on a dataset to get a general overview of the data and its distribution (e.g., how spread out the data is, whether there are any outliers, and whether any data is missing). They are also sometimes referred to as summary statistics, as they allow you to summarize the data.
Let's walk through some descriptive statistics to give you an idea of what you can do with them!
Frequencies are a descriptive/summary statistic that counts how often a particular response/value occurs for whichever variables you select. In more basic terms, Frequencies measure how often something happens. Frequencies are often displayed in frequency tables and used to create histograms or bar charts for better visualization of data distributions. In SPSS, Frequencies are suited to Nominal and Ordinal data (i.e., categorical or text data) rather than Scale variables, because Scale values are not constrained to a finite number of categories or response options.
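To make the idea of a frequency table concrete outside of SPSS, here is a minimal Python sketch. The `work_field` responses and the `frequency_table` helper are made up for illustration (they are not the example dataset from this guide), and the columns only approximate what a frequency table typically reports: a count, a percent of all cases, and a percent of non-missing cases.

```python
from collections import Counter

# Hypothetical responses for one nominal variable (illustrative only).
# None stands in for a missing answer.
work_field = ["Education", "Health", "Education", "Tech", None,
              "Health", "Education", "Tech", "Education", "Health"]

def frequency_table(values):
    """Return rows of (category, count, percent, valid percent)."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    for category, n in counts.most_common():
        percent = 100 * n / len(values)        # share of all cases
        valid_percent = 100 * n / len(valid)   # share of non-missing cases
        rows.append((category, n, round(percent, 1), round(valid_percent, 1)))
    return rows

table = frequency_table(work_field)
for category, n, pct, valid_pct in table:
    print(f"{category:<10} {n:>3} {pct:>6} {valid_pct:>6}")
```

Each row answers the same question Frequencies answers: how often did this category occur?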
You can add more than one variable when you run Frequencies. SPSS will display separate tables for each variable you select. Let's run Frequencies on Ethnicity, Work Field, and all 5 of the Music Genres (Rock, Pop, Country, Hip Hop, Jazz). (Follow the steps above, but first click Reset in the little popup window to clear out Gender and any previous variables you had run Frequencies on. Then just select all of the new variables we want and move them over to the Variable(s) box).
Here's how the output should look:
The next summary statistic we’ll go over is Descriptives. Descriptives give you a numeric breakdown and summary of your Scale (numeric) data. For example, descriptive statistics for a Scale variable include the mean, median, and mode, as well as the minimum and maximum values and the standard deviation (a measure of how spread out the data is around the mean).
You can run Descriptives on Scale variables (numeric data). You shouldn't run Descriptives on your Nominal (categorical) variables because it won’t give you a meaningful analysis for that type of data (e.g., it doesn't make sense to calculate the mean or median Ethnicity, but it does make sense to calculate the mean and median Age of our sample).
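As a rough illustration of these summary measures, the sketch below computes them for a Scale variable using Python's built-in `statistics` module. The `ages` list is invented for the example and is not taken from the guide's dataset.

```python
import statistics

# Hypothetical ages for a small sample (illustrative only).
ages = [23, 35, 41, 29, 35, 52, 47, 31]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation (n - 1)
}

for name, value in summary.items():
    print(f"{name:<8} {value}")
```

All of these calculations rely on the values being true numbers, which is why they suit Age but not a categorical variable such as Ethnicity.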
Let’s run Descriptives on: Age, Years Employed, Salary, Time Task 1, Time Task 2, and all of the Anxiety, Confidence, and Depression variables.
The main difference you'll notice between Frequencies and Descriptives is that Descriptives all appear in one table, while Frequencies are split into separate tables, one per variable. This is because of the differing natures of Nominal data vs Scale data. You can perform the same arithmetic calculations on all Scale (numeric) variables because they use numbers in the traditional, mathematical sense, so the results can be displayed in one table. Nominal data has no uniform way to compare different variables because the "numeric values" for each variable just represent categories; for example, a value of 6 for Ethnicity represents White, but a value of 6 doesn't mean "more" or "higher" ethnicity than a value of 2. As a result, we can't run arithmetic calculations on these kinds of variables; all we can do is count up the responses.
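One way to see why arithmetic on Nominal codes is meaningless is to renumber the categories and watch what changes. In the Python sketch below, the category labels, the codes, and the `recode` mapping are all hypothetical. The counts per category stay the same under either numbering, while the "mean" of the codes shifts with the arbitrary choice of numbers.

```python
from collections import Counter
import statistics

# Hypothetical category codes for a nominal variable (labels are illustrative).
labels_a = {1: "Category A", 2: "Category B", 6: "Category C"}
codes_a = [6, 2, 6, 1, 2, 6]

# The same six people, with the categories numbered differently.
recode = {1: 3, 2: 1, 6: 2}
labels_b = {recode[code]: name for code, name in labels_a.items()}
codes_b = [recode[code] for code in codes_a]

counts_a = Counter(labels_a[code] for code in codes_a)
counts_b = Counter(labels_b[code] for code in codes_b)

# Counts do not depend on how the categories are numbered...
print(counts_a == counts_b)
# ...but the "mean" of the codes does, so it carries no real information.
print(statistics.mean(codes_a), statistics.mean(codes_b))
```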
Ordinal data can sometimes function as Scale data, so whether you can run Descriptives on Ordinal data depends on the specific variable. For example, our Likert Numbers variables are listed as Scale but could also be considered Ordinal because their response options are ranked. With these variables, a value of 5 does indicate a higher level of "agreement" than a value of 3, so arithmetic calculations do have meaning. Frequencies can always be run on Ordinal data regardless, as Ordinal variables still consist of categories, like Nominal data.
Crosstabs, or crosstabulation (also known as a contingency table), is a table showing the relationship between 2 categorical (Nominal or Ordinal) variables. While a frequency table can describe a single categorical variable, a crosstabs table can describe 2 categorical variables in one table. Crosstabs shows you how many individuals in your sample fall into each combination of categories from the 2 variables. For example, we can look at a breakdown of how many men and women fall under each Work Field.
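The counting behind a crosstab can be sketched in a few lines of Python. The `respondents` list below is invented for illustration; each entry is one person's (gender, work field) pair, and each cell of the table is simply the number of people with that pair.

```python
from collections import Counter

# Hypothetical (gender, work field) pairs, one per respondent.
respondents = [
    ("Female", "Education"), ("Male", "Tech"), ("Female", "Health"),
    ("Male", "Education"), ("Female", "Education"), ("Male", "Tech"),
    ("Female", "Tech"), ("Male", "Health"),
]

cell_counts = Counter(respondents)  # one count per table cell
rows = sorted({gender for gender, _ in respondents})
cols = sorted({field for _, field in respondents})

# Print a simple contingency table with row totals.
print(f"{'':<8}" + "".join(f"{c:>11}" for c in cols) + f"{'Total':>8}")
for gender in rows:
    cells = [cell_counts[(gender, field)] for field in cols]
    print(f"{gender:<8}" + "".join(f"{n:>11}" for n in cells) + f"{sum(cells):>8}")
```

A combination that never occurs shows up as 0, because `Counter` returns 0 for keys it has not seen.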
Let’s run Crosstabs on Gender and Work Field.
Remember how the Music Genre columns in our example data are supposed to represent the responses to a single "select all that apply" question? For a visual aid, here's what that would look like in a survey:
And remember how when we ran Frequencies on them, we got individual tables for each music genre because they are listed in the dataset as separate variables? What if we wanted a single table to show us all the results from this single question? Well, to do that, we need to create a Multiple Response Set.
Multiple Response Variable Sets in SPSS can be used when you have data from questions that allow for the selection of multiple responses (like "select all that apply" questions), and you want to analyze the responses collectively.
Data from "select all that apply" questions should appear in SPSS in a layout similar to what we see in our example dataset:
The advantage of creating a Multiple Response Set is that you can create single output tables to analyze "select all that apply" questions or other questions that allowed individuals to select more than one response. You can run the special Multiple Response Set Frequencies and Crosstabs analyses, which examine each response option of a "select all that apply" (or similar) question in relation to the others. This is more succinct than running separate analyses on each individual response option variable in your SPSS dataset and then trying to compare all of the separate output tables.
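The idea of treating several yes/no columns as one question can be sketched in Python as follows. The `responses` data is hypothetical, with 1 meaning the genre was selected and 0 meaning it was not. The two percentages shown (share of all selections and share of respondents who selected at least one option) are common ways to summarize this kind of data; treat the layout as an approximation, not a reproduction of SPSS output.

```python
# Hypothetical "select all that apply" data: one dict per respondent,
# 1 = genre selected, 0 = not selected.
genres = ["Rock", "Pop", "Country", "Hip Hop", "Jazz"]
responses = [
    {"Rock": 1, "Pop": 1, "Country": 0, "Hip Hop": 0, "Jazz": 0},
    {"Rock": 0, "Pop": 1, "Country": 0, "Hip Hop": 1, "Jazz": 1},
    {"Rock": 1, "Pop": 0, "Country": 1, "Hip Hop": 0, "Jazz": 0},
    {"Rock": 0, "Pop": 0, "Country": 0, "Hip Hop": 0, "Jazz": 0},
]

# How many respondents selected each genre.
counts = {g: sum(r[g] for r in responses) for g in genres}
total_responses = sum(counts.values())

# Respondents who selected at least one option.
valid_cases = sum(1 for r in responses if any(r[g] for g in genres))

for g in genres:
    pct_of_responses = 100 * counts[g] / total_responses
    pct_of_cases = 100 * counts[g] / valid_cases
    print(f"{g:<8} {counts[g]:>2} {pct_of_responses:>6.1f} {pct_of_cases:>6.1f}")
```

Because people can pick more than one genre, the percent-of-cases column can add up to more than 100.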
Compare Means is useful for seeing how groups compare on a Scale variable, broken down by one or more categorical variables. For example, maybe you want to see salary breakdowns by gender while also taking work field into account. Let's go over how to run a Compare Means analysis.
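Conceptually, this kind of breakdown groups the cases by their category combination and then averages the Scale variable within each group. The Python sketch below shows that logic on invented (gender, work field, salary) records; the values are not from the example dataset.

```python
from collections import defaultdict
import statistics

# Hypothetical (gender, work field, salary) records.
records = [
    ("Female", "Education", 48000), ("Female", "Education", 52000),
    ("Female", "Tech", 75000), ("Male", "Education", 50000),
    ("Male", "Tech", 70000), ("Male", "Tech", 80000),
]

# Collect salaries for each gender / work field combination.
groups = defaultdict(list)
for gender, field, salary in records:
    groups[(gender, field)].append(salary)

# Average salary within each group.
group_means = {key: statistics.mean(values) for key, values in groups.items()}

for (gender, field), mean_salary in sorted(group_means.items()):
    n = len(groups[(gender, field)])
    print(f"{gender:<7} {field:<10} n={n} mean={mean_salary:.0f}")
```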
Copyright © Baylor® University. All rights reserved.