While descriptive statistics summarize the characteristics of a dataset, inferential statistics allow you to draw conclusions and make predictions based on that data. In short, descriptive statistics describe a dataset, while inferential statistics let you make inferences that extend beyond it.
When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample was taken. In most research, it is not possible to collect data from every individual in the world, so we collect data from a sample of individuals and then use the results from that sample to make predictions about what we would expect to see for the population as a whole. In practice, you will usually have specific groups of individuals (i.e., specific populations) you are interested in researching (for example: all women over the age of 35, all employed adults in the US, all Christian families in Texas, all pediatric doctors in Mexico, etc.), so your population would be all individuals who fit your criteria of interest, not just all individuals in the entire world.
Inferential statistics have 2 main uses: making estimates about populations (for example, estimating a population mean from a sample mean) and testing hypotheses to draw conclusions about populations (for example, testing whether two groups differ).
(Information adapted from Bhandari, P. (2023). Inferential Statistics | An Easy Introduction & Examples. Scribbr).
Correlations are simple statistical analyses that are great for helping you predict values of one variable based on another.
Correlations measure how strongly 2 variables are related to each other. The number you will see in a correlation analysis (the correlation coefficient) represents the strength of the relationship between the 2 variables. Correlation coefficients range from -1 to +1, and values closer to either -1 or +1 signify stronger relationships. A correlation of 0 means no linear relationship.
Positive correlations mean that as one of the variables increases, so does the other. Negative correlations mean that as one variable increases, the other decreases.
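To make these numbers concrete, here is a minimal sketch in Python (using the scipy library and made-up values for two hypothetical variables; this is an illustration of the concept, not an SPSS step):

```python
# A minimal sketch of a Pearson correlation, using made-up data.
from scipy import stats

hours_studied = [2, 4, 5, 7, 8, 10]        # hypothetical variable 1
exam_score    = [55, 60, 70, 78, 80, 92]   # hypothetical variable 2

r, p_value = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.2f}, p = {p_value:.3f}")
# r will be close to +1 here: a strong positive relationship
# (as hours studied increase, exam scores also increase).
```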
Let’s run a correlation!
As a note, the Pearson bivariate correlation (bivariate just means 2 variables) is the most common type of correlation you will come across in research, though you will often see it referred to simply as a "correlation" or "correlational analysis." There are other types of correlations, such as the Spearman rank-order correlation; however, for most intents and purposes with quantitative data, the Pearson correlation is the one you will likely use. This is because the Pearson correlation is for Scale data, whereas the Spearman rank-order correlation is for Ordinal (rank-ordered) data. It is not advised to run correlations on Nominal data; doing so will not give meaningful results because the numeric values of Nominal variables simply represent categories.
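For comparison, here is a tiny hypothetical Python (scipy) sketch showing the Spearman rank-order version; the rank data below are invented purely for illustration:

```python
# Spearman rank-order correlation (for Ordinal / rank-ordered data).
from scipy import stats

ranks_rater_a = [1, 2, 3, 4, 5, 6]   # hypothetical rank-ordered ratings
ranks_rater_b = [2, 1, 4, 3, 6, 5]

rho, p_value = stats.spearmanr(ranks_rater_a, ranks_rater_b)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# For Scale (continuous) variables, stats.pearsonr() would be used instead.
```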
As another note, correlations only measure linear relationships, not parabolic, cubic, or other non-linear relationships. So even if your correlational analysis is not statistically significant, it is possible that the variables you are looking at relate to each other in some other way. (Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association).
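Here is a small hypothetical Python (scipy) sketch of exactly this situation: two variables that are perfectly related by a curve, yet have a Pearson correlation of essentially zero:

```python
# Two variables with a perfect (but non-linear) relationship.
from scipy import stats

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]   # y is exactly x squared (a parabola)

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.2f}")   # r is 0.00: the linear correlation finds no relationship,
                        # even though y is perfectly determined by x.
```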
As another example, let's run a correlation with 4 variables: Age, Salary, Years Employed, and Anxiety 1. Follow the steps above, but this time in Step 2, move Age, Salary, Years Employed, and Anxiety 1 over to the Variables box. Then click OK to run the analysis. See the resulting output table below. (You can add as many variables as you want to a correlational analysis, but keep in mind that the resulting correlation table will get increasingly larger with the more variables you add, and it may consequently become more difficult for you to read the table accurately).


When you check the box for Show only the lower triangle, the resulting correlation table will not display the mirrored-image, identical values in the upper portion of the table. You can see what this looks like for our Age and Salary correlation table below. Now we see blank cells directly underneath the Salary column where it intersects with the Age row because these cells would contain the exact same values as the cells in the Salary row where it intersects with the Age column. These repeated/mirrored values in the upper "triangle" of the table are not shown.

The effect of showing only the lower triangle becomes even more apparent in larger correlation tables with more variables; take a look at the correlation table below with Age, Salary, Years Employed, and Anxiety 1. (The top table does not have the box checked for showing only the lower triangle. The bottom table does have the box checked).


Notice how much easier it is to read the table when we check the box for Show only the lower triangle. As the upper triangle just repeats/mirrors the values in the lower triangle, you do not need to have the upper triangle visible to be able to interpret your correlation results.
Generally, you'll want to use the Two-Tailed test of significance (also called a two-tailed p-value) because a two-tailed test will detect a relationship between the 2 variables in either direction. A One-Tailed test only tests for one specific direction (either positive or negative, but not both), and to use a one-tailed test of significance (i.e., a one-tailed p-value) you would have had to make a hypothesis about the specific direction you expected to see before running the analysis. A two-tailed test tests for both positive and negative relationships, so if you don't know how the variables may relate and just want to know whether they relate, use the two-tailed test.
When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the relationship (e.g., Age being positively correlated with Salary), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Age being negatively correlated with Salary).
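If you are curious how the two p-values compare, here is a hypothetical Python sketch using scipy's pearsonr (the alternative argument assumes a reasonably recent SciPy version, and the age/salary numbers are made up):

```python
# Comparing two-tailed and one-tailed p-values for the same correlation.
from scipy import stats

age    = [25, 30, 35, 40, 45, 50, 55]   # hypothetical values
salary = [40, 45, 52, 58, 61, 70, 75]   # hypothetical values (in $1000s)

r, p_two = stats.pearsonr(age, salary, alternative="two-sided")
_, p_one = stats.pearsonr(age, salary, alternative="greater")  # predicts a positive r

print(f"r = {r:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# When the observed direction matches the hypothesized direction,
# the one-tailed p is half the two-tailed p.
```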
Another useful option/setting that you can play around with is the Style settings of the correlation table.
Note: We selected Significance when we were adjusting the Style settings, but you could instead select Correlation and set a specific value of the correlation coefficient for SPSS to highlight in your table. You can set the condition to highlight the cells that have a correlation coefficient equal to or higher than your specified value. Or you could have both a Significance condition and a Correlation condition: if you click Add in the Style settings window, you can add multiple conditions.
The Chi-Square Test of Independence (a.k.a., Chi-Square Test of Association) determines whether there is an association/relationship between nominal or ordinal variables. (This is different from a Pearson correlation because a Pearson correlation is meant for testing associations/relationships between scale variables). The Chi-Square Test is an extension of the Crosstabs analysis in SPSS. While Crosstabs can show you how two nominal (or ordinal) variables compare, the Chi-Square Test can assess if these variables are significantly related or not.
Examples of when to use a Chi-Square Test of Independence:
(Not to be confused with the Chi-Square Goodness-of-Fit test, which is an entirely different test. The Chi-Square Goodness-of-Fit test is for when you want to determine whether the distribution of a single categorical variable matches a specific, hypothesized distribution (for example, that all categories occur equally often). The Chi-Square Test of Independence is for testing whether two categorical variables are statistically associated, and it uses a crosstabs/contingency table to compare the two variables).
Extra note: The chi-square test of independence is a nonparametric test. A non-parametric test is a statistical test that does not assume that the data comes from a population with a specific distribution, such as the normal distribution. These tests, also known as "distribution-free" tests, are useful for data that is non-normally distributed, has small sample sizes, contains outliers, or is measured at the ordinal or nominal level. Because the chi-square test of independence is for assessing nominal and/or ordinal data, it is considered a non-parametric test.
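As a conceptual illustration only (not an SPSS step), here is a minimal Python sketch of a chi-square test of independence using scipy and a made-up 2x2 table of counts:

```python
# Chi-square test of independence on a hypothetical crosstab of counts:
# rows = gender (men, women), columns = response (yes, no).
from scipy.stats import chi2_contingency

observed = [[30, 20],   # men:   30 yes, 20 no
            [15, 35]]   # women: 15 yes, 35 no

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
# A small p-value (e.g., < .05) suggests the two categorical
# variables are associated rather than independent.
```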
For further info on these assumptions, check out these resources: Laerd Statistics Chi-Square Test for Association Guide, Scribbr Chi-Square Test of Independence Guide, Kent State Chi-Square Test of Independence Guide.
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether an intervention or treatment actually has an effect on the people within the study, or whether two groups are different from one another.(https://www.scribbr.com/statistics/t-test/)
If the groups come from a single population (e.g., pre-test and post-test data on the same individuals, or measuring the same group of subjects before and after an experimental treatment), conduct a paired-samples t-test. This is a within-subjects design.
If the groups come from 2 different populations (for example, men and women, people from two separate cities, or students from two different schools), conduct an independent-samples t-test (also known as a two-samples t-test). This is a between-subjects design.
A one-sample t-test is for comparing one group to a standard value or norm (like comparing the acidity of a liquid to a neutral pH of 7).
Let's go over how to conduct a paired samples t-test. We'll use the variables Time Task 1 and Time Task 2 in this analysis to see if our sample's times for finishing the running race improved from the first time they ran the race (Time 1) to the second time (Time 2). We are using a paired samples t-test for this analysis because we are examining the same individuals across the two time points.
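For reference, the same kind of comparison can be sketched conceptually in Python with scipy; the race times below are made up and are not from this guide's dataset:

```python
# Paired-samples t-test on hypothetical race times (in minutes)
# for the same runners measured at two time points.
from scipy import stats

time_task_1 = [31.2, 29.5, 35.0, 33.4, 30.8, 32.1]  # first race
time_task_2 = [29.8, 28.9, 33.1, 32.0, 30.1, 30.7]  # second race (same runners)

t_stat, p_value = stats.ttest_rel(time_task_1, time_task_2)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A significant result here would indicate that times changed (in this
# made-up example, decreased, i.e., improved) from the first race to the second.
```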

Interpretation of Cohen’s d effect size (if d is negative, use its absolute value to interpret)
| Cohen's d | Interpretation |
|---|---|
| 0.2 to 0.49 | Small effect |
| 0.5 to 0.79 | Medium/moderate effect |
| 0.8 or higher | Large effect |
Note: if your Cohen's d value is near the threshold between two interpretations, for example 0.49, you could say it’s a “small-to-medium effect.” 0.79 would be considered a “medium-to-large effect.”
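If you are curious how an effect size like this is calculated, here is a rough Python sketch of one common version of Cohen's d (the pooled-standard-deviation formula for two independent groups). SPSS reports several related variants, so treat this purely as an illustration with made-up scores:

```python
# One common formula for Cohen's d: mean difference divided by the pooled SD.
import statistics

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)  # sample SDs
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

men   = [12, 15, 14, 10, 13]   # hypothetical anxiety scores
women = [16, 18, 15, 17, 19]   # hypothetical anxiety scores

print(f"Cohen's d = {cohens_d(men, women):.2f}")
# Use the absolute value when interpreting: |d| of 0.2-0.49 is small,
# 0.5-0.79 is medium, and 0.8 or higher is large.
```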
Let's go over how to conduct an independent samples t-test. We'll use the variables Gender and Anxiety Time 1 in this analysis to see if men and women differ on their anxiety levels. We are using an independent samples t-test for this analysis because we are examining unrelated groups.
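As a conceptual parallel outside of SPSS, an independent-samples comparison can be sketched in Python with scipy; the anxiety scores below are made up:

```python
# Independent-samples t-test on hypothetical anxiety scores
# for two unrelated groups (men vs. women).
from scipy import stats

men_anxiety   = [12, 15, 14, 10, 13, 16]
women_anxiety = [16, 18, 15, 17, 19, 14]

t_stat, p_value = stats.ttest_ind(men_anxiety, women_anxiety)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If the groups have unequal variances, Welch's version can be
# requested with: stats.ttest_ind(..., equal_var=False)
```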

Coming soon
ANOVA stands for Analysis of Variance. Recall that a t-test can only be used when comparing the means of 2 groups (a.k.a. pairwise comparison). If you want to compare the means of more than 2 groups, you conduct an ANOVA. ANOVAs are used to analyze the difference in means among 3 or more groups. There are different types of ANOVAs, scroll down to learn about the One-Way ANOVA, or click through the tabs to learn about other types of ANOVAs.
A one-way ANOVA is used to determine whether there are any statistically significant differences between the means of 3 or more independent groups. For example, you could test whether freshmen (1st-year undergrad students), sophomores (2nd-year undergrads), juniors (3rd-year undergrads), and seniors (4th-year undergrads) differ in their stress levels. (For more information, see: https://www.scribbr.com/statistics/one-way-anova/)
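Conceptually, the comparison looks like this in a minimal Python sketch (scipy), using made-up stress scores for the four class years:

```python
# One-way ANOVA comparing mean stress scores across four class years.
from scipy import stats

freshmen   = [22, 25, 27, 24, 26]   # hypothetical stress scores
sophomores = [20, 23, 22, 21, 24]
juniors    = [18, 20, 19, 22, 21]
seniors    = [17, 19, 18, 20, 16]

f_stat, p_value = stats.f_oneway(freshmen, sophomores, juniors, seniors)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F tells you at least one group mean differs;
# post-hoc tests are then needed to identify which groups differ.
```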
In order to properly use a one-way ANOVA, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.
For further info on these assumptions, check out these resources: Laerd Statistics One-Way ANOVA Guide, Scribbr One-Way ANOVA Guide, Kent State One-Way ANOVA Guide.
In addition to those assumptions, there are other requirements of your data in order to conduct a one-way ANOVA:
(This section is in progress, check back for updated content)
Coming soon!
Coming Soon!
Regression is a method used to analyze the relationship between a dependent variable and one or more independent variables. It helps predict or understand how changes in the independent variable(s) affect the dependent variable, which is the outcome you are trying to predict or explain.
More specifically, regression analysis seeks to find a mathematical equation (a "regression model") that describes the relationship between the inputted variables, often by finding the line (or curve) that best fits the data points. Essentially, a regression analysis aims to find a model that best fits the data, allowing for predictions and insights into the relationships between variables.
There are different types of regression analyses, scroll down to learn about linear regression, or click through the tabs to learn about other types of regression analyses.
Regression is used to estimate the relationship between one continuous/scale dependent variable and one or more independent variables, which can be continuous/scale or nominal/categorical. Simple linear regression consists of one independent variable and one dependent variable. Multiple linear regression consists of two or more independent variables and one dependent variable. Linear regression specifically finds the "line of best fit" for the variables: a linear equation that explains how the variables relate to each other. The line and equation can be used to predict the value of the dependent variable based on differing values of the independent variable(s), for example, examining how stress levels, hours of sleep, and gender relate to test scores. (For more information, see: https://www.scribbr.com/statistics/simple-linear-regression/)
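To illustrate the "line of best fit" idea outside of SPSS, here is a minimal Python (scipy) sketch of a simple linear regression with one made-up predictor:

```python
# Simple linear regression: predicting test scores from hours of sleep.
from scipy import stats

hours_sleep = [5, 6, 6.5, 7, 7.5, 8, 9]     # hypothetical independent variable
test_score  = [62, 68, 70, 75, 77, 81, 85]  # hypothetical dependent variable

result = stats.linregress(hours_sleep, test_score)
print(f"score = {result.intercept:.1f} + {result.slope:.1f} * hours_sleep")
print(f"R-squared = {result.rvalue**2:.2f}, p = {result.pvalue:.4f}")
# The fitted equation can then be used to predict a score
# for a new value of hours_sleep.
```

For multiple linear regression (two or more independent variables), SPSS uses the same Linear Regression dialog; in Python you would typically switch to a library such as statsmodels, but the underlying idea of fitting a best-fitting equation is the same.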
In order to properly run a linear regression, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.
For further info on these assumptions, check out these resources: Laerd Statistics Linear Regression Guide, Laerd Statistics Multiple Regression Guide, Scribbr Multiple Linear Regression Guide, Scribbr Simple Linear Regression Guide, StatisticsSolutions' Multiple Linear Regression Assumptions Guide.
In addition to those assumptions, there are other requirements of your data in order to conduct a linear regression:
(This section is in progress, check back for updated content)
(In progress)
Multivariate analyses consist of analyzing more than one dependent variable at once. This is in contrast to all of the above Univariate analyses which only include one dependent variable per analysis.
To conduct multivariate analyses in SPSS, you will often need to use the General Linear Model function.
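For context only, here is a rough Python sketch of a one-way MANOVA using the statsmodels library; the grouping variable and the two dependent variables are hypothetical, and this is just an approximate analogue of SPSS's General Linear Model > Multivariate procedure:

```python
# A rough sketch of a one-way MANOVA: two dependent variables
# (anxiety and stress) compared across a hypothetical grouping variable.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "anxiety": [12, 14, 13, 16, 18, 17, 20, 22, 21],
    "stress":  [30, 32, 31, 35, 36, 34, 40, 41, 39],
})

manova = MANOVA.from_formula("anxiety + stress ~ group", data=df)
print(manova.mv_test())   # reports multivariate tests such as Wilks' lambda
```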
Coming Soon!