
SPSS Statistical Software

This guide covers the basics of SPSS, including how to conduct simple statistical analyses and data visualization. (This guide is currently a work in progress, so some content is not included yet).

Inferential Statistics Overview

While descriptive statistics summarize the characteristics of a dataset, inferential statistics allow you to draw conclusions and make predictions that go beyond that dataset. In other words, descriptive statistics describe your data, while inferential statistics let you make inferences from your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample was taken. In most research it is not feasible to collect data from every individual in a population, so we collect data from a sample of individuals and use the sample results to make inferences about what we would expect to see in the population as a whole. In practice, your population is rarely "everyone in the world"; you likely have a specific group of individuals you are interested in researching (for example: all women over the age of 35, all employed adults in the US, all Christian families in Texas, all pediatric doctors in Mexico, etc.), so your population is all individuals who fit your criteria of interest.

Inferential statistics have 2 main uses:

  1. Making estimates about populations of interest (for example, using the average SAT score of a sample of 11th graders to infer what the average SAT score of all 11th graders in the US might be).
  2. Testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

(Information adapted from Bhandari, P. (2023). Inferential Statistics | An Easy Introduction & Examples. Scribbr).

Correlations

A correlation is a simple statistical analysis that is useful for predicting the values of one variable based on another.

Correlations are a measure of how strongly 2 variables are related to each other. The number you will see in a correlation analysis will represent the strength of the relationship between the 2 variables. Correlations range from -1 to +1, and values closer to either -1 or +1 signify stronger relationships. A correlation of 0 means no relationship.

Positive correlations mean that as one of the variables increases, so does the other. Negative correlations mean that as one variable increases, the other decreases.
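For reference (you will never need to compute this by hand; SPSS does it for you), the Pearson correlation coefficient r is calculated from the paired values of the two variables:

  r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where x_i and y_i are one individual's values on the two variables and \bar{x} and \bar{y} are the means of each variable. The numerator is positive when the two variables tend to move in the same direction and negative when they tend to move in opposite directions, which is why r carries a sign; the denominator scales the result so that r always falls between -1 and +1.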

Let’s run a correlation!

Background Info on the Pearson Correlation

As a note, the Pearson bivariate correlation (bivariate just means 2 variables) is the most common type of correlation you will come across in research, though you will often see it referred to simply as a "correlation" or "correlational analysis." There are also other types of correlations, such as the Spearman rank-order correlation; however, for most intents and purposes with quantitative data, the Pearson correlation is the one you will likely use. This is because the Pearson correlational analysis is for Scale data, whereas the Spearman rank-order correlation is for Ordinal (rank-ordered) data. It is not advised to run correlations on Nominal data; the results will not be meaningful because the numeric values of Nominal variables simply represent categories.

As another note, correlations only measure linear relationships, not parabolic, cubic, or other non-linear relationships. So even though your correlational analysis may not be statistically significant, it is possible that the variables you are looking at relate to each other in some other way. (Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association).

Running a Pearson Correlation

  1. Click on Analyze on the menubar, then select Correlate, then select Bivariate.
     
  2. Let's use the Age and Salary variables for this example. In the popup window, move Age and Salary over to the Variables box. (You can select more than two variables, but we will just use two for now). You'll notice that Pearson is selected by default for the Correlation Coefficient; this is what we want. Leave the Test of Significance as Two-Tailed (I'll explain the difference below). Optional: you can check the box for Show only the lower triangle if you don't want the mirrored-image upper triangle to display (I'll show the difference below). Click OK to run the analysis.
     
  3. Now you will see the Correlational analysis appear in your Output window. You should see the header Correlations and one table below it also labelled Correlations. It should look like this: 
     
  4. How to interpret the Correlation table:
    1. You will see each variable you selected listed on the left side of the table and at the top. This is because we interpret correlations by looking at the intersection of separate variables in the table. 
    2. Look at the Age column and its intersection with the Salary row. We see 3 small rows labelled Pearson Correlation, Sig. (2-tailed), and N. 
      1. The Pearson Correlation row shows us the Pearson correlation coefficient for the linear relationship between Age and Salary. Correlations are reported with the letter r. There is a moderately strong, positive correlation between Age and Salary  (r = .762). (Remember, the closer a correlation coefficient is to -1 or +1, the stronger the relationship between the 2 variables).
      2. The Sig. (2-tailed) row shows us the p-value (or significance value) of this correlation coefficient. For Age and Salary, we have a p-value of less than .001. We report this as: p < .001. Together with the correlation coefficient, you would report it as: There is a moderately strong, positive correlation between Age and Salary (r = .762, p < .001). Note: p-values less than .05 are generally considered statistically significant, meaning that we are fairly certain the 2 variables are actually related and the result was not just due to random chance. A p-value of .05 roughly translates to a 5% chance that the results are due to random chance; you want lower p-values because they indicate an even smaller chance that the results are due to random chance. For example, a p-value of .01 translates to a 1% chance that the results are due to random chance.
      3. The N row shows us the sample size for the two variables in this correlational analysis. In our example, we see that 50 individuals provided data for both Age and Salary.
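Side note: everything you just did through the menus can also be run from an SPSS syntax window (File > New > Syntax). A minimal sketch of the equivalent syntax, assuming the two variables are named Age and Salary in your data file (your internal variable names may differ from the labels you see in the output), would be:

  * Pearson bivariate correlation between Age and Salary, two-tailed.
  CORRELATIONS
    /VARIABLES=Age Salary
    /PRINT=TWOTAIL
    /MISSING=PAIRWISE.

To run the four-variable example in the next section, you would simply list all four variable names on the /VARIABLES line.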

Another Pearson Correlation Example

As another example, let's run a correlation with 4 variables: Age, Salary, Years Employed, and Anxiety 1. Follow the steps above, but this time in Step 2, move Age, Salary, Years Employed, and Anxiety 1 over to the Variables box. Then click OK to run the analysis. See the resulting output table below. (You can add as many variables as you want to a correlational analysis, but keep in mind that the resulting correlation table will get increasingly larger with the more variables you add, and it may consequently become more difficult for you to read the table accurately).

Additional Options: Show Only the Lower Triangle

When you check the box for Show only the lower triangle, the resulting correlation table will not display the mirrored-image, identical values in the upper portion of the table. You can see what this looks like for our Age and Salary correlation table below. Now we see blank cells directly underneath the Salary column where it intersects with the Age row because these cells would contain the exact same values as the cells in the Salary row where it intersects with the Age column. These repeated/mirrored values in the upper "triangle" of the table are not shown. 

The effect of showing only the lower triangle becomes even more apparent in larger correlation tables with more variables; take a look at the correlation table below with Age, Salary, Years Employed, and Anxiety 1. (The top table does not have the box checked for showing only the lower triangle. The bottom table does have the box checked).
 

Notice how much easier it is to read the table when we check the box for Show only the lower triangle. As the upper triangle just repeats/mirrors the values in the lower triangle, you do not need to have the upper triangle visible to be able to interpret your correlation results.

Additional Options: One-Tailed vs Two-Tailed

Generally, you’ll want to use the Two-Tailed test of significance (also called a two-tailed p-value) because a two-tailed test will test for any relationship between the 2 variables. A One-Tailed test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected to see prior to running the analysis in order to use a one-tailed test of significance (i.e., a one-tailed p-value). A two-tailed test tests for both positive or negative relationships, so if you don’t know how the variables may relate and just want to know if they relate, use the two-tailed test.

When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Age being positively correlated with Salary), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Age being negatively correlated with Salary).

Additional Options: Style - Highlighting Significant Correlations

Another useful option/setting that you can play around with is the Style settings of the correlation table.

  1. Click on the Style button within the Correlations popup window to open the Style settings window. 
  2. This brings up the Table Style editor window. Click in the cell under Value, then click the dropdown arrow that appears. Select Significance.
     
  3. Once you select Significance, information will populate in the Dimension, Condition, and Format columns. The information that populates by default specifies that for significance values (p-values) less than or equal to .05, SPSS will format those cells to be bright yellow. That is, all significant correlations will be highlighted in yellow. (You can adjust the Condition to other p-values if you would like; for example you could change it to be .01 so only correlations with significance values less than or equal to .01 are highlighted. You can also adjust the Format to other colors instead of yellow). For our purposes, the default conditions of .05 and yellow are fine, so let's just click Continue to save these Style settings. 
     
  4. Then click OK on the initial Correlation popup window to run the analysis. Now you'll see the following output with our significant correlations highlighted. (Note: I checked the box for Show only the lower triangle).
     

Note: We selected Significance when we were adjusting the Style settings, but you could instead select Correlation and set a specific value of correlation coefficient for SPSS to then highlight in your table. You can set the condition to specify highlighting the cells that have a correlation coefficient equal to or higher than your specified value. Or you could have both a Significance condition and a Correlation condition - if you click Add in the Style settings window, you can add multiple conditions. 

Chi-Square Test of Independence/Association

The Chi-Square Test of Independence (a.k.a., Chi-Square Test of Association) determines whether there is an association/relationship between nominal or ordinal variables. (This is different from a Pearson correlation because a Pearson correlation is meant for testing associations/relationships between scale variables). The Chi-Square Test is an extension of the Crosstabs analysis in SPSS. While Crosstabs can show you how two nominal (or ordinal) variables compare, the Chi-Square Test can assess if these variables are significantly related or not. 

Examples of when to use a Chi-Square Test of Independence:

  • You want to see if Level of Education (e.g., High School Diploma, Associate's Degree, Bachelor's Degree, Master's Degree) is related to Marital Status.
  • You want to see if Gender is related to Smoking Status (i.e., whether someone smokes cigarettes or not).
  • You want to see if Ethnicity is related to Political Party Affiliation.
  • You want to see if the effectiveness of a public health intervention, like a flyer vs phone call, is related to an outcome, such as the rate of recycling.
  • You want to see if someone's age group is associated with their preference for social media platform.

(Not to be confused with the Chi-Square Goodness-of-Fit test, which is an entirely different test. The Chi-Square Goodness-of-Fit test is for when you want to determine if the distribution of a single categorical variable matches a specific, hypothesized distribution, such as equal proportions across categories. The Chi-Square Test of Independence is for testing whether two categorical variables are statistically associated, and it uses a crosstabs/contingency table to compare the two variables).

Extra note: The chi-square test of independence is a nonparametric test. A non-parametric test is a statistical test that does not assume that the data comes from a population with a specific distribution, such as the normal distribution. These tests, also known as "distribution-free" tests, are useful for data that is non-normally distributed, has small sample sizes, contains outliers, or is measured at the ordinal or nominal level. Because the chi-square test of independence is for assessing nominal and/or ordinal data, it is considered a non-parametric test.

Statistical Assumptions of the Chi-Square Test of Independence

  1. Two categorical variables (can be nominal and/or ordinal)
  2. Each variable has 2 or more categories/groups
  3. Independence of observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
    2. The categorical variables are not "paired" in any way (e.g., the variables you are using in this analysis are not paired pre-test and post-test observations).
  4. Relatively large sample size.
    1. Expected frequencies for each cell are at least 1.
    2. Expected frequencies should be at least 5 for the majority (80%) of the cells. (In other words, no more than 20% of your cells should have expected counts less than 5. SPSS notes this percentage at the bottom of the relevant tables when you run the analysis).
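For reference, the expected frequency that assumption 4 refers to is calculated for each cell of the crosstab from the row and column totals:

  E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}

For example, using the numbers from the walkthrough below (24 Males out of 50 participants, and 10 people working in Business), the expected count for the Male/Business cell would be (24 × 10) / 50 = 4.8. SPSS reports these values in the Expected Count rows of the crosstabulation table.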

For further info on these assumptions, check out these resources: Laerd Statistics Chi-Square Test for Association Guide, Scribbr Chi-Square Test of Independence Guide, Kent State Chi-Square Test of Independence Guide

Running a Chi-Square Test of Independence

  1. Click on Analyze on the menubar, then select Descriptives, then select Crosstabs. (Note: You'll notice there is an option at the bottom of the list called ChiSquare, which also allows you to run a Chi-Square Test of Independence. However, I recommend using the Crosstabs menu to run the Chi-Square Test of Independence because it allows you to specify/adjust useful parameters and options for the analysis that are not available in the stand-alone ChiSquare menu).
     
  2. In the popup window, select one nominal/ordinal variable and move it to the Row(s) box and then select another nominal/ordinal variable and move it to the Column(s) box. For example purposes (using the example dataset), let's select Gender and move it over to the Row(s) box, and then select Work Field and move it over to the Column(s) box. 
    (Note: it doesn't matter which one you put in the Row(s) box and which in the Column(s) box; it just changes the orientation of the table. I suggest putting the variable with more categories in the Row(s) box and the variable with fewer categories in the Column(s) box so you get a tall, narrow table instead of a short, wide table that you may need to scroll left/right to fully see). 
     
  3. Click on the Statistics button on the right. In the little window that appears, check the box for Chi-Square. Also check the box for Phi and Cramer's V. Click Continue. (Note: if you forget to check the box for Chi-Square, then SPSS will just run a regular crosstabs analysis).
    1. Phi and Cramer's V are effect sizes for the Chi-Square analysis. A chi-square analysis just tells you if the 2 variables are significantly related, not the strength of that relationship. The effect size (Phi or Cramer's V) can tell you the strength of the relationship. The analysis outputs both effect size values, but which one you need to use depends on how many categories are in each of the two variables you are analyzing. If each variable only has 2 categories (e.g., Gender: Male or Female, and Smoking Status: Yes smoker or Not smoker), you will use Phi. If one (or both) of the variables has more than 2 categories (e.g., Work Field has 5 categories), then you will use Cramer's V.
       
  4. Click on the Cells button on the right. In the little window that appears, check the box under Counts for Expected (also make sure Observed is checked too), check the box under Percentages for Row, check the box under Residuals for Adjusted Standardized. Click Continue.
     
  5. Optional: After clicking Continue and returning to the main Crosstabs window, you can check the box for Display Clustered Bar Charts if you would like SPSS to include in the output a bar chart with the frequency counts of how many (in our example) men and women work in each Work Field category.
  6. Click OK.
  7. Now you should see the Crosstabs with Chi-Square analysis in your output window. 
     
     
     
  8. How to interpret the output:
    1. Case Processing Summary table: This table just shows you the number of individuals in your sample that provided data for both of your specified variables and if there is any missing data.
    2. Crosstabulation table: The Crosstabulation table (labelled with the names of the 2 variables you selected - in our example, Gender * Work Field Crosstabulation) is the crosstabs results table. Each row and column will correspond to the variables you selected for the analysis.
      1. We see Work Field on the top of the table with each specific Work Field category listed in the columns. Gender is listed at the left side of the table with the rows representing each category of the Gender variable. 
      2. The cells in the intersections of each column and row represent the number of individuals in our sample who fell under both of the corresponding categories. Because we checked the boxes for Count, Expected Count, Row Percentages, and Adjusted Standardized Residuals, we see cells for each of those specifications. For example, the cell that intersects BUS and Male has a value of 6 in it, indicating that 6 individuals in our sample responded that they are Male and that they work in the Business field. If we look at Female and HEALTH, we see that 8 women in our sample work in the Health field.
      3. The Total column (the rightmost column) lists the total number of individuals in our sample who fall under each Gender category, regardless of their Work Field. Each total value is the sum of the cells to the left of it. We see we have 24 Males in our sample and 26 Females. (This column should show us the same numbers as if we ran Frequencies on just the Gender variable).
      4. The Total row (the bottom row) lists the total number of individuals in our sample who work in each Work Field category, regardless of their Gender. Each total value is the sum of the cells above it. We see we have 10 individuals who work in the Business field, 9 who work in the Education field, 9 who work in the Government field, and so on. (This row should show us the same numbers as if we ran Frequencies on just the Work Field variable).
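As with the correlation earlier, this whole analysis can also be run from a syntax window. A minimal sketch, assuming the variables are named Gender and WorkField in your data file (SPSS variable names cannot contain spaces, so your internal names may differ from the labels shown in the output):

  * Chi-square test of independence with Phi/Cramer's V and cell statistics.
  CROSSTABS
    /TABLES=Gender BY WorkField
    /STATISTICS=CHISQ PHI
    /CELLS=COUNT EXPECTED ROW ASRESID.

The /STATISTICS line corresponds to the Chi-Square and Phi and Cramer's V boxes from Step 3, and the /CELLS line corresponds to the boxes checked in Step 4.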

t-Tests

What is a t-Test?

A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether an intervention or treatment actually has an effect on the people within the study, or whether two groups are different from one another. (https://www.scribbr.com/statistics/t-test/)

If the groups come from a single population (e.g., pre-test and post-test data on the same individuals, or measuring the same group of subjects before and after an experimental treatment), conduct a paired-samples t-test. This is a within-subjects design.

If the groups come from 2 different populations (for example, men and women, or people from two separate cities, or students from two different schools), conduct an independent-samples t-test (also known as a two-samples t-test). This is a between-subjects design.

A one-sample t-test is for comparing one group to a standard value or norm (for example, comparing the acidity of a liquid to a neutral pH of 7).

Paired Samples t-Test

Let's go over how to conduct a paired samples t-test. We'll use the variables Time Task 1 and Time Task 2 in this analysis to see if our sample's times for finishing the running race improved from the first time they ran the race (Time 1) to the second time (Time 2). We are using a paired samples t-test for this analysis because we are examining the same individuals across the two time points.

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select Paired Samples T Test.
     
  2. In the popup window, move over Time Task 1 and Time Task 2 (specifically in that order) to the Paired Variables box. 
     
  3. Click OK to run the analysis. You should see the following output:

     
  4. Let's go over how to interpret the output tables:
    1. Paired Samples Statistics Table: This table shows you descriptive statistics for the variables you selected. We can see the Mean times (in minutes) for completion of Time Task 1 (88.20 minutes) and Time Task 2 (54.60 minutes). We can also see the sample size N for each variable (50 individuals for both variables). We can also see each variable's standard deviation and standard error mean. (The standard error mean is a measure of how different the population mean would likely be from the sample mean. It tells you how much the sample mean would vary if you were to repeat the study using new samples from within the same population).
    2. Paired Samples Correlations Table: This table shows you the Pearson correlation coefficient for the pair of variables you selected, so we can see how strongly the two variables relate. In our example, we see that Time Task 1 and Time Task 2 have a moderately strong positive relationship with a correlation coefficient of .788, and this correlation is statistically significant (p < .001). 
    3. Paired Samples Test Table: This is the actual t-test analysis of your variables. 
      1. The Mean here is the average difference between the 2 variables (if you take the mean values from the top table and calculate the difference, 88.2 minus 54.6, it should equal the value in this table, 33.6).
      2. The Std. Deviation is the standard deviation of the differences between the paired observations from each of the 2 variables.
      3. The Std. Error Mean here is the standard error of the mean difference, which essentially represents the estimated variability of the average difference between paired observations in your sample. It's calculated by dividing the standard deviation of the differences by the square root of the sample size (in our example, it would be 34.759 divided by the square root of 50). A smaller standard error of the mean indicates greater confidence in the observed mean difference.
      4. The 95% Confidence Interval of the Difference, Lower and Upper values are showing you a range of values within which we are confident (in this case, 95% confident) that the true population mean difference between paired observations lies, based on our sample data. Essentially, it indicates the likely range for the average difference between the two paired measurements with a certain level of confidence (like 95%) based on the study results.
      5. The t value represents the t-statistic, which is a calculated value used to determine whether there is a statistically significant difference between the means of two paired groups, essentially measuring how many standard errors the observed mean difference is away from the null hypothesis mean (usually zero). A larger absolute value of t indicates a greater difference between the paired groups, making it more likely to reject the null hypothesis.
      6. The df value represents the degrees of freedom of this analysis. It is calculated by subtracting 1 from the sample size (in our example, 50 - 1 = 49). (Degrees of freedom refers to the number of independent pieces of information used to calculate a statistic, essentially representing how many values in a data set are free to vary when estimating a population parameter).
      7. Significance. There are two significance values listed: the One-Sided p and the Two-Sided p. Which one should you use? Generally, you’ll want to use the Two-Sided p-value, as a two-sided test will test for any difference between the 2 variables. A One-Sided test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected in order to use this p-value. A two-sided test tests for both positive or negative differences, so if you don’t know how the variables may differ and just want to know if they differ, use the two-sided p-value. When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Group A scoring higher than Group B), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Group A scoring lower than Group B). A one-tailed test looks for an “increase” or “decrease” in the parameter whereas a two-tailed test looks for a “change” (could be either an increase or decrease) in the parameter.
  5. Paired Samples Effect Sizes Table: 
    • This table helps you understand the magnitude of the difference between the two groups (in this example, the difference between the Time 1 results and Time 2 results).
    1. The Standardizer value indicates what SPSS used to standardize the mean difference when calculating effect sizes (generally it is the standard deviation of the differences from the Paired Samples Test table; in this example, that is the case: the Std. Deviation value from the Paired Samples Test table (34.759) is also the Standardizer value in the Paired Samples Effect Sizes table).
    2. The Point Estimate is the value of the effect size measure, for example Cohen's d or Hedges' g (also called Hedges' bias correction factor for Cohen's d). An effect size is a standardized measure of the size of the mean difference, that is, the difference between the 2 paired means expressed in terms of standard deviations.
      1. Hedges' g is a bias-corrected version of Cohen's d that should be used when the groups you are comparing have small sample sizes (n < 20). The value of Hedges' g (Hedges' correction) is usually similar to d, but slightly smaller when the sample size is small.
      2. To interpret, identify the Cohen's d value in the table and then compare this d value against established benchmarks (see table below) to understand the practical significance of the findings beyond statistical significance.
    3. The 95% Confidence Interval, Lower and Upper values are the lower and upper bounds of the confidence interval for the effect size estimates (e.g., Cohen's d and Hedges' correction). The confidence interval helps us assess the precision of the effect size estimate; a smaller range indicates higher precision in the estimate of the effect size. 
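To make the Point Estimate concrete with the numbers from this example: the standardized mean difference is the mean difference from the Paired Samples Test table divided by the Standardizer, so

  d = \frac{\bar{d}}{s_d} = \frac{33.6}{34.759} \approx 0.97

which, per the interpretation table below, would be considered a large effect.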

Interpretation of Cohen’s d effect size (if d is negative, use its absolute value to interpret)

Cohen's d Interpretation
0.2 to 0.49 Small effect
0.5 to 0.79 Medium/moderate effect
0.8 or higher  Large effect

Note: if your Cohen's d value is near the threshold between two interpretations, for example 0.49, you could say it’s a “small-to-medium effect.” 0.79 would be considered a “medium-to-large effect.”
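If you prefer syntax, a minimal sketch of the paired samples t-test above, assuming the two variables are named TimeTask1 and TimeTask2 in your data file (SPSS variable names cannot contain spaces, so the internal names may differ from the labels Time Task 1 and Time Task 2):

  * Paired samples t-test comparing the two race times.
  T-TEST PAIRS=TimeTask1 WITH TimeTask2 (PAIRED)
    /CRITERIA=CI(.95)
    /MISSING=ANALYSIS.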
 


Independent Samples t-Test

Let's go over how to conduct an independent samples t-test. We'll use the variables Gender and Anxiety Time 1 in this analysis to see if men and women differ on their anxiety levels. We are using an independent samples t-test for this analysis because we are examining unrelated groups.

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select Independent-Samples T Test. 
  2. In the popup window, move over Anxiety Time 1 to the Test Variable(s) box, and move over Gender to the Grouping Variable box. You'll notice there are 2 question marks by Gender now. This is because we need to define how the groups were coded in the data. Click the Define Groups button.
     
  3. In the Define Groups popup window, input 1 for Group 1 and input 2 for Group 2. We do this to tell SPSS that subjects in Group 1 (Males) were coded with the numeric value 1 and subjects in Group 2 (Females) were coded with the numeric value 2. (Make sure the values you input match how you coded your data. If, for example, we had coded Males as 0 and Females as 1, then we would input 0 into the Group 1 box and 1 into the Group 2 box. Or if we wanted to use Ethnicity as the Grouping Variable, we'd have to select two specific ethnic groups to use, so, for example, we could input 3 for Group 1 to indicate Black or African American and 5 for Group 2 to indicate Native Hawaiian or Pacific Islander. T-tests can only compare 2 groups, so you can only select 2 to compare. ANOVAs can compare more than 2 groups). Click Continue.
     
  4. Back in the main independent samples t-test window, check the box for Homogeneity of variance test. Leave the Estimate Effect Sizes box checked too. (Note: SPSS 30 has a slightly different independent samples t-test interface than prior versions of SPSS. If you have a prior version, you can skip this step, as the Homogeneity of variance test is automatically calculated when you run this t-test).
     
  5. Click OK to run the analysis. You should see the following output (Note: if you have a version of SPSS prior to version 30, Levene's test (Homogeneity of Variance Test) will appear in the Independent Samples Test table):

     
  6. Let's go over how to interpret the output tables:
    1. Group Statistics Table: This table shows you descriptive statistics for the variables you selected. We can see the sample size N for each group (24 Males and 26 Females). We can also see the Mean Anxiety Time 1 scores for Males (21.79 points) and Females (22.00 points). We can also see each group's standard deviation and standard error mean. (The standard error mean is a measure of how different the population mean would likely be from the sample mean. It tells you how much the sample mean would vary if you were to repeat the study using new samples from within the same population).
    2. Homogeneity of Variance Test Table: (while this table appears as the 3rd table, it should be interpreted before the 2nd table (Independent Samples Test table) because the results of the Homogeneity of Variance test determine which row of the Independent Samples Test table you need to look at).
      1. Levene's Test (Homogeneity of Variance test) is used to assess whether the variances of our two groups are approximately equal or not. Many statistical tests have specific statistical assumptions that need to be met in order to properly run the test, and equality of variances is one of the assumptions for an independent samples t-test. (Variance is a measurement of the spread between data points in a dataset. It measures how far each data point is from the mean value. When running an independent samples t-test, we want the variances of both groups to be approximately equal). When the variances of both groups are approximately equal, we meet the homogeneity of variances assumption. In turn, when we meet this assumption, Levene's test will be non-significant (p > .05).
      2. We want Levene's test to be non-significant as this indicates we meet the homogeneity of variances assumption. If this test is non-significant, we can use the Equal Variances Assumed row when interpreting the Independent Samples Test table above. If this test is significant, then this means our data does NOT meet the homogeneity of variances assumption and we must use the Equal Variances Not Assumed row for interpreting the Independent Samples Test table.
    3. Independent Samples Test Table: This table shows you ... (coming soon)
      1. The
      2. Significance. There are two significance values listed: the One-Sided p and the Two-Sided p. Which one should you use? Generally, you’ll want to use the Two-Sided p-value, as a two-sided test will test for any difference between the 2 variables. A One-Sided test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected in order to use this p-value. A two-sided test tests for both positive or negative differences, so if you don’t know how the variables may differ and just want to know if they differ, use the two-sided p-value. When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Group A scoring higher than Group B), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Group A scoring lower than Group B). A one-tailed test looks for an “increase” or “decrease” in the parameter whereas a two-tailed test looks for a “change” (could be either an increase or decrease) in the parameter.
  7. Independent Samples Effect Sizes Table: 
    1. (Coming Soon!)
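If you prefer syntax, a minimal sketch of the independent samples t-test above, assuming the variables are named Gender and AnxietyTime1 in your data file, with Males coded 1 and Females coded 2:

  * Independent samples t-test comparing Males (1) and Females (2) on Anxiety Time 1.
  T-TEST GROUPS=Gender(1 2)
    /VARIABLES=AnxietyTime1
    /CRITERIA=CI(.95)
    /MISSING=ANALYSIS.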

Coming soon

ANOVA

What is an ANOVA?

ANOVA stands for Analysis of Variance. Recall that a t-test can only be used when comparing the means of 2 groups (a.k.a. pairwise comparison). If you want to compare the means of more than 2 groups, you conduct an ANOVA. ANOVAs are used to analyze the difference in means among 3 or more groups. There are different types of ANOVAs, scroll down to learn about the One-Way ANOVA, or click through the tabs to learn about other types of ANOVAs.

One-Way ANOVA

A one-way ANOVA is used to determine whether there are any statistically significant differences between the means of 3 or more independent groups. For example, you could test whether freshmen (1st-year undergrad students), sophomores (2nd-year undergrads), juniors (3rd-year undergrads), and seniors (4th-year undergrads) differ in their stress levels. (For more information, see: https://www.scribbr.com/statistics/one-way-anova/)

In order to properly use a one-way ANOVA, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.

Statistical Assumptions of the One-Way ANOVA:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The population variances in each group are equal.
  4. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a value that is very different from all other values).

For further info on these assumptions, check out these resources: Laerd Statistics One-Way ANOVA Guide, Scribbr One-Way ANOVA Guide, Kent State One-Way ANOVA Guide

In addition to those assumptions, there are other requirements of your data in order to conduct a one-way ANOVA:

  • Your dependent variable is continuous (i.e., interval or ratio level)
  • Your independent variable is categorical (i.e., two or more groups)
  • Cases that have values on both the dependent and independent variables

(This section is in progress, check back for updated content)

How to Run a One-Way ANOVA

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select One-Way ANOVA.
  2. In the window that pops up, bring over your dependent variable into the Dependent List box. (Your dependent variable needs to be at the Scale measurement level).
  3. Bring over your independent variable into the Factor box. (The independent variable needs to be at the Nominal or Ordinal measurement level, as the ANOVA will be assessing whether or not the different groups/categories have significantly different values on the dependent variable). 
  4. Click OK. (This will run the omnibus ANOVA. The omnibus ANOVA test tells you if there is a significant difference between the groups, but it doesn't tell you which specific groups are significantly different).
  5. Look at the output and if the F-statistic of your omnibus ANOVA is significant (p < .05), then you will re-run the ANOVA with a post-hoc test.
  6. To re-run the ANOVA, click on Analyze on the menubar, then select Compare Means and Proportions, then select One-Way ANOVA.
  7. All of your prior information you inputted should still be in the window that pops up. Now click the Post Hoc button on the right.
  8. In the Post-Hoc window, select one (or more) post-hoc tests to run. The most frequently used post-hoc test is the Bonferroni, so let's select Bonferroni and then click Continue. (Another common post-hoc test is Tukey. You can select more than one post-hoc test to see how the tests differ in their calculation of the p-values; some are more stringent than others).
  9. Click OK.
  10. Now your output will contain the Post-Hoc pairwise comparisons table which allows you to see which specific groups (pairwise comparisons) were significantly different from each other.
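As a syntax-window sketch of the steps above, assuming a hypothetical Scale dependent variable named StressScore and a Nominal grouping variable named ClassYear:

  * One-way ANOVA with descriptives and Bonferroni post-hoc comparisons.
  ONEWAY StressScore BY ClassYear
    /STATISTICS DESCRIPTIVES
    /POSTHOC=BONFERRONI ALPHA(0.05)
    /MISSING ANALYSIS.

Omit the /POSTHOC line for the initial omnibus run (Steps 1-4), and add it back when the omnibus F-test is significant and you re-run the analysis (Steps 5-9).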

Coming soon!

Coming Soon!

Regression

Regression is a method used to analyze the relationship between a dependent variable and one or more independent variables. It helps predict or understand how changes in the independent variable(s) affect the dependent variable. The dependent variable must be continuous (i.e., measured at the Scale level).

More specifically, regression analysis seeks to find a mathematical equation (a "regression model") that describes the relationship between the inputted variables, often by finding the line (or curve) that best fits the data points. Essentially, a regression analysis aims to find a model that best fits the data, allowing for predictions and insights into the relationships between variables. 

There are different types of regression analyses, scroll down to learn about linear regression, or click through the tabs to learn about other types of regression analyses.

Linear Regression

Regression is used to estimate the relationship between one continuous/scale dependent variable and one or more independent variables, which can be continuous/scale or nominal/categorical. Simple linear regression consists of one independent variable and one dependent variable. Multiple linear regression consists of two or more independent variables and one dependent variable. Linear regression specifically finds the "line of best fit" for the variables, a linear equation that explains how the variables relate to each other. The line and equation can be used to predict the value of the dependent variable based on differing values of the independent variable(s); for example, examining how stress levels, hours of sleep, and gender relate to test scores. (For more information, see: https://www.scribbr.com/statistics/simple-linear-regression/)

In order to properly run a linear regression, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.

Statistical Assumptions of Simple Linear Regression:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The variance of the errors is constant across all levels of the independent variables. In other words, the spread of the data points around the regression line should be roughly the same for all values of the predictor variables. (The variances along the line of best fit remain similar as you move along the line).
  4. Linearity
    1. The relationship between the independent and dependent variables is linear. The line of best fit through the data points is a straight line, and not a curve or any other shape.
  5. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a dependent variable value that is very different from the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large error value (residual)).

 

Statistical Assumptions of Multiple Linear Regression:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The variance of the errors is constant across all levels of the independent variables. In other words, the spread of the data points around the regression line should be roughly the same for all values of the predictor variables. (The variances along the line of best fit remain similar as you move along the line).
  4. Linearity
    1. The relationship between the independent and dependent variables is linear. The line of best fit through the data points is a straight line, and not a curve or any other shape.
  5. No Multicollinearity
    1. The independent variables are not highly correlated with each other. (High correlation between predictors can make it difficult to isolate the individual effect of each predictor on the dependent variable). 
  6. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a dependent variable value that is very different from the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large error value (residual)).

For further info on these assumptions, check out these resources: Laerd Statistics Linear Regression Guide, Laerd Statistics Multiple Regression Guide, Scribbr Multiple Linear Regression Guide, Scribbr Simple Linear Regression Guide, StatisticsSolutions' Multiple Linear Regression Assumptions Guide

In addition to those assumptions, there are other requirements of your data in order to conduct a linear regression:

  • Your dependent variable must be continuous (i.e., interval or ratio level)
  • For Simple Linear Regression, your independent variable must be continuous (i.e., interval or ratio level)
  • For Multiple Linear Regression, your independent variables can be either categorical (i.e., two or more groups) or continuous (i.e., interval or ratio level)
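Although the step-by-step walkthrough for this section is still in progress, here is a minimal syntax-window sketch of a multiple linear regression using the hypothetical example above (predicting test scores from stress levels, hours of sleep, and gender), assuming variables named TestScore, StressLevel, HoursSleep, and Gender:

  * Multiple linear regression predicting TestScore from three predictors.
  REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF R ANOVA
    /DEPENDENT TestScore
    /METHOD=ENTER StressLevel HoursSleep Gender.

Note that a two-category nominal variable like Gender (coded, say, 1 and 2) can be entered directly as a predictor, but a nominal variable with more than two categories needs to be dummy-coded into separate variables first.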

 

(This section is in progress, check back for updated content)

(In progress)

Multivariate Analyses

Multivariate analyses consist of analyzing more than one dependent variable at once. This is in contrast to all of the above Univariate analyses which only include one dependent variable per analysis. 

To conduct multivariate analyses in SPSS, you will often need to use the General Linear Model function.

Coming Soon!
