
SPSS Statistical Software

This guide covers the basics of SPSS, including how to conduct simple statistical analyses and data visualization. (This guide is currently a work in progress, so some content is not included yet).

Inferential Statistics Overview

While descriptive statistics summarize the characteristics of a dataset, inferential statistics allow you to draw conclusions and make predictions that go beyond that dataset. In other words, descriptive statistics describe your data, while inferential statistics let you make inferences from your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample was taken. In most research it is not feasible to collect data from every individual in a population, so we collect data from a sample of individuals and use the sample results to make inferences about what we would expect to see in the population as a whole. In practice, your population is rarely "everyone in the world"; you likely have a specific group of individuals you are interested in researching (for example: all women over the age of 35, all employed adults in the US, all Christian families in Texas, all pediatric doctors in Mexico, etc.), so your population is all individuals who fit your criteria of interest.

Inferential statistics have 2 main uses:

  1. Making estimates about populations of interest (for example, using the average SAT score of a sample of 11th graders to infer what the average SAT score of all 11th graders in the US might be).
  2. Testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

(Information adapted from Bhandari, P. (2023). Inferential Statistics | An Easy Introduction & Examples. Scribbr).

Correlations

A correlation is a simple statistical analysis that is useful for predicting the values of one variable based on another.

Correlations are a measure of how strongly 2 variables are related to each other. The number you will see in a correlation analysis will represent the strength of the relationship between the 2 variables. Correlations range from -1 to +1, and values closer to either -1 or +1 signify stronger relationships. A correlation of 0 means no relationship.

Positive correlations mean that as one of the variables increases, so does the other. Negative correlations mean that as one variable increases, the other decreases.
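For reference (you will never need to compute this by hand; SPSS does it for you), the Pearson correlation coefficient r is calculated from the paired values of the two variables:

  r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where x_i and y_i are one individual's values on the two variables and \bar{x} and \bar{y} are the means of each variable. The numerator is positive when the two variables tend to move in the same direction and negative when they tend to move in opposite directions, which is why r carries a sign; the denominator scales the result so that r always falls between -1 and +1.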

Let’s run a correlation!

Background Info on the Pearson Correlation

As a note, the Pearson bivariate correlation (bivariate just means 2 variables) is the most common type of correlation you will come across in research, though you will often see it referred to simply as a "correlation" or "correlational analysis." There are also other types of correlations, such as the Spearman rank-order correlation; however, for most intents and purposes with quantitative data, the Pearson correlation is the one you will likely use. This is because the Pearson correlational analysis is for Scale data, whereas the Spearman rank-order correlation is for Ordinal (rank-ordered) data. It is not advised to run correlations on Nominal data; the results will not be meaningful because the numeric values of Nominal variables simply represent categories.

As another note, correlations only measure linear relationships, not parabolic, cubic, or other non-linear relationships. So even though your correlational analysis may not be statistically significant, it is possible that the variables you are looking at relate to each other in some other way. (Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association).

Running a Pearson Correlation

  1. Click on Analyze on the menubar, then select Correlate, then select Bivariate.
     
  2. Let's use the Age and Salary variables for this example. In the popup window, move Age and Salary over to the Variables box. (You can select more than two variables, but we will just use two for now). You'll notice that Pearson is selected by default for the Correlation Coefficient; this is what we want. Leave the Test of Significance as Two-Tailed (I'll explain the difference below). Optional: you can check the box for Show only the lower triangle if you don't want the mirrored-image upper triangle to display (I'll show the difference below). Click OK to run the analysis.
     
  3. Now you will see the Correlational analysis appear in your Output window. You should see the header Correlations and one table below it also labelled Correlations. It should look like this: 
     
  4. How to interpret the Correlation table:
    1. You will see each variable you selected listed on the left side of the table and at the top. This is because we interpret correlations by looking at the intersection of separate variables in the table. 
    2. Look at the Age column and its intersection with the Salary row. We see 3 small rows labelled Pearson Correlation, Sig. (2-tailed), and N. 
      1. The Pearson Correlation row shows us the Pearson correlation coefficient for the linear relationship between Age and Salary. Correlations are reported with the letter r. There is a moderately strong, positive correlation between Age and Salary  (r = .762). (Remember, the closer a correlation coefficient is to -1 or +1, the stronger the relationship between the 2 variables).
      2. The Sig. (2-tailed) row shows us the p-value (or significance value) of this correlation coefficient. For Age and Salary, we have a p-value of less than .001. We report this as: p < .001. Together with the correlation coefficient, you would report it as: There is a moderately strong, positive correlation between Age and Salary (r = .762, p < .001). Note: p-values less than .05 are generally considered statistically significant, meaning that we are fairly certain the 2 variables are actually related and the result was not just due to random chance. A p-value of .05 roughly translates to a 5% chance that the results are due to random chance; you want lower p-values because they indicate an even smaller chance that the results are due to random chance. For example, a p-value of .01 translates to a 1% chance that the results are due to random chance.
      3. The N row shows us the sample size for the two variables in this correlational analysis. In our example, we see that 50 individuals provided data for both Age and Salary.
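Side note: everything you just did through the menus can also be run from an SPSS syntax window (File > New > Syntax). A minimal sketch of the equivalent syntax, assuming the two variables are named Age and Salary in your data file (your internal variable names may differ from the labels you see in the output), would be:

  * Pearson bivariate correlation between Age and Salary, two-tailed.
  CORRELATIONS
    /VARIABLES=Age Salary
    /PRINT=TWOTAIL
    /MISSING=PAIRWISE.

To run the four-variable example in the next section, you would simply list all four variable names on the /VARIABLES line.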

Another Pearson Correlation Example

As another example, let's run a correlation with 4 variables: Age, Salary, Years Employed, and Anxiety 1. Follow the steps above, but this time in Step 2, move Age, Salary, Years Employed, and Anxiety 1 over to the Variables box. Then click OK to run the analysis. See the resulting output table below. (You can add as many variables as you want to a correlational analysis, but keep in mind that the resulting correlation table will get increasingly larger with the more variables you add, and it may consequently become more difficult for you to read the table accurately).

Additional Options: Show Only the Lower Triangle

When you check the box for Show only the lower triangle, the resulting correlation table will not display the mirrored-image, identical values in the upper portion of the table. You can see what this looks like for our Age and Salary correlation table below. Now we see blank cells directly underneath the Salary column where it intersects with the Age row because these cells would contain the exact same values as the cells in the Salary row where it intersects with the Age column. These repeated/mirrored values in the upper "triangle" of the table are not shown. 

The effect of showing only the lower triangle becomes even more apparent in larger correlation tables with more variables; take a look at the correlation table below with Age, Salary, Years Employed, and Anxiety 1. (The top table does not have the box checked for showing only the lower triangle. The bottom table does have the box checked).
 

Notice how much easier it is to read the table when we check the box for Show only the lower triangle. As the upper triangle just repeats/mirrors the values in the lower triangle, you do not need to have the upper triangle visible to be able to interpret your correlation results.

Additional Options: One-Tailed vs Two-Tailed

Generally, you’ll want to use the Two-Tailed test of significance (also called a two-tailed p-value) because a two-tailed test will test for any relationship between the 2 variables. A One-Tailed test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected to see prior to running the analysis in order to use a one-tailed test of significance (i.e., a one-tailed p-value). A two-tailed test tests for both positive or negative relationships, so if you don’t know how the variables may relate and just want to know if they relate, use the two-tailed test.

When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Age being positively correlated with Salary), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Age being negatively correlated with Salary).

Additional Options: Style - Highlighting Significant Correlations

Another useful option/setting that you can play around with is the Style settings of the correlation table.

  1. Click on the Style button within the Correlations popup window to open the Style settings window. 
  2. This brings up the Table Style editor window. Click in the cell under Value, then click the dropdown arrow that appears. Select Significance.
     
  3. Once you select Significance, information will populate in the Dimension, Condition, and Format columns. The information that populates by default specifies that for significance values (p-values) less than or equal to .05, SPSS will format those cells to be bright yellow. That is, all significant correlations will be highlighted in yellow. (You can adjust the Condition to other p-values if you would like; for example you could change it to be .01 so only correlations with significance values less than or equal to .01 are highlighted. You can also adjust the Format to other colors instead of yellow). For our purposes, the default conditions of .05 and yellow are fine, so let's just click Continue to save these Style settings. 
     
  4. Then click OK on the initial Correlation popup window to run the analysis. Now you'll see the following output with our significant correlations highlighted. (Note: I checked the box for Show only the lower triangle).
     

Note: We selected Significance when we were adjusting the Style settings, but you could instead select Correlation and set a specific value of correlation coefficient for SPSS to then highlight in your table. You can set the condition to specify highlighting the cells that have a correlation coefficient equal to or higher than your specified value. Or you could have both a Significance condition and a Correlation condition - if you click Add in the Style settings window, you can add multiple conditions. 

Chi-Square Test of Independence/Association

The Chi-Square Test of Independence (a.k.a., Chi-Square Test of Association) determines whether there is an association/relationship between nominal or ordinal variables. (This is different from a Pearson correlation because a Pearson correlation is meant for testing associations/relationships between scale variables). The Chi-Square Test is an extension of the Crosstabs analysis in SPSS. While Crosstabs can show you how two nominal (or ordinal) variables compare, the Chi-Square Test can assess if these variables are significantly related or not. 

Examples of when to use a Chi-Square Test of Independence:

  • You want to see if Level of Education (e.g., High School Diploma, Associate's Degree, Bachelor's Degree, Master's Degree) is related to Marital Status.
  • You want to see if Gender is related to Smoking Status (i.e., whether someone smokes cigarettes or not).
  • You want to see if Ethnicity is related to Political Party Affiliation.
  • You want to see if the effectiveness of a public health intervention, like a flyer vs phone call, is related to an outcome, such as the rate of recycling.
  • You want to see if someone's age group is associated with their preference for social media platform.

(Not to be confused with the Chi-Square Goodness-of-Fit test, which is an entirely different test. The Chi-Square Goodness-of-Fit test is for when you want to determine if the distribution of a single categorical variable matches a specific, hypothesized distribution, such as equal proportions across categories. The Chi-Square Test of Independence is for testing whether two categorical variables are statistically associated, and it uses a crosstabs/contingency table to compare the two variables).

Extra note: The chi-square test of independence is a nonparametric test. A non-parametric test is a statistical test that does not assume that the data comes from a population with a specific distribution, such as the normal distribution. These tests, also known as "distribution-free" tests, are useful for data that is non-normally distributed, has small sample sizes, contains outliers, or is measured at the ordinal or nominal level. Because the chi-square test of independence is for assessing nominal and/or ordinal data, it is considered a non-parametric test.

Statistical Assumptions of the Chi-Square Test of Independence

  1. Two categorical variables (can be nominal and/or ordinal)
  2. Each variable has 2 or more categories/groups
  3. Independence of observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
    2. The categorical variables are not "paired" in any way (e.g., the variables you are using in this analysis are not paired pre-test and post-test observations).
  4. Relatively large sample size.
    1. Expected frequencies for each cell are at least 1.
    2. Expected frequencies should be at least 5 for the majority (80%) of the cells. (In other words, no more than 20% of your cells should have expected counts less than 5. SPSS notes this percentage at the bottom of the relevant tables when you run the analysis).
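For reference, the expected frequency that assumption 4 refers to is calculated for each cell of the crosstab from the row and column totals:

  E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}

For example, using the numbers from the walkthrough below (24 Males out of 50 participants, and 10 people working in Business), the expected count for the Male/Business cell would be (24 × 10) / 50 = 4.8. SPSS reports these values in the Expected Count rows of the crosstabulation table.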

For further info on these assumptions, check out these resources: Laerd Statistics Chi-Square Test for Association Guide, Scribbr Chi-Square Test of Independence Guide, Kent State Chi-Square Test of Independence Guide

Running a Chi-Square Test of Independence

  1. Click on Analyze on the menubar, then select Descriptives, then select Crosstabs. (Note: You'll notice there is an option at the bottom of the list called ChiSquare, which also allows you to run a Chi-Square Test of Independence. However, I recommend using the Crosstabs menu to run the Chi-Square Test of Independence because it allows you to specify/adjust useful parameters and options for the analysis that are not available in the stand-alone ChiSquare menu).
     
  2. In the popup window, select one nominal/ordinal variable and move it to the Row(s) box and then select another nominal/ordinal variable and move it to the Column(s) box. For example purposes (using the example dataset), let's select Gender and move it over to the Row(s) box, and then select Work Field and move it over to the Column(s) box. 
    (Note: it doesn't matter which one you put in the Row(s) box and which in the Column(s) box; it just changes the orientation of the table. I suggest putting the variable with more categories in the Row(s) box and the variable with fewer categories in the Column(s) box so you get a tall, narrow table instead of a short, wide table that you may need to scroll left/right to fully see). 
     
  3. Click on the Statistics button on the right. In the little window that appears, check the box for Chi-Square. Also check the box for Phi and Cramer's V. Click Continue. (Note: if you forget to check the box for Chi-Square, then SPSS will just run a regular crosstabs analysis).
    1. Phi and Cramer's V are effect sizes for the Chi-Square analysis. A chi-square analysis just tells you if the 2 variables are significantly related, not the strength of that relationship. The effect size (Phi or Cramer's V) can tell you the strength of the relationship. The analysis outputs both effect size values, but which one you need to use depends on how many categories are in each of the two variables you are analyzing. If each variable only has 2 categories (e.g., Gender: Male or Female, and Smoking Status: Yes smoker or Not smoker), you will use Phi. If one (or both) of the variables has more than 2 categories (e.g., Work Field has 5 categories), then you will use Cramer's V.
       
  4. Click on the Cells button on the right. In the little window that appears, check the box under Counts for Expected (also make sure Observed is checked too), check the box under Percentages for Row, check the box under Residuals for Adjusted Standardized. Click Continue.
     
  5. Optional: After clicking Continue and returning to the main Crosstabs window, you can check the box for Display Clustered Bar Charts if you would like SPSS to include in the output a bar chart with the frequency counts of how many (in our example) men and women work in each Work Field category.
  6. Click OK.
  7. Now you should see the Crosstabs with Chi-Square analysis in your output window. 
     
     
     
  8. How to interpret the output:
    1. Case Processing Summary table: This table just shows you the number of individuals in your sample that provided data for both of your specified variables and if there is any missing data.
    2. Crosstabulation table: The Crosstabulation table (labelled with the names of the 2 variables you selected - in our example, Gender * Work Field Crosstabulation) is the crosstabs results table. Each row and column will correspond to the variables you selected for the analysis.
      1. We see Work Field on the top of the table with each specific Work Field category listed in the columns. Gender is listed at the left side of the table with the rows representing each category of the Gender variable. 
      2. The cells in the intersections of each column and row represent the number of individuals in our sample who fell under both of the corresponding categories. Because we checked the boxes for Count, Expected Count, Row Percentages, and Adjusted Standardized Residuals, we see cells for each of those specifications. For example, the cell that intersects BUS and Male has a value of 6 in it, indicating that 6 individuals in our sample responded that they are Male and that they work in the Business field. If we look at Female and HEALTH, we see that 8 women in our sample work in the Health field.
      3. The Total column (the rightmost column) lists the total number of individuals in our sample who fall under each Gender category, regardless of their Work Field. Each total value is the sum of the cells to the left of it. We see we have 24 Males in our sample and 26 Females. (This column should show us the same numbers as if we ran Frequencies on just the Gender variable).
      4. The Total row (the bottom row) lists the total number of individuals in our sample who work in each Work Field category, regardless of their Gender. Each total value is the sum of the cells above it. We see we have 10 individuals who work in the Business field, 9 who work in the Education field, 9 who work in the Government field, and so on. (This row should show us the same numbers as if we ran Frequencies on just the Work Field variable).
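As with the correlation earlier, this whole analysis can also be run from a syntax window. A minimal sketch, assuming the variables are named Gender and WorkField in your data file (SPSS variable names cannot contain spaces, so your internal names may differ from the labels shown in the output):

  * Chi-square test of independence with Phi/Cramer's V and cell statistics.
  CROSSTABS
    /TABLES=Gender BY WorkField
    /STATISTICS=CHISQ PHI
    /CELLS=COUNT EXPECTED ROW ASRESID.

The /STATISTICS line corresponds to the Chi-Square and Phi and Cramer's V boxes from Step 3, and the /CELLS line corresponds to the boxes checked in Step 4.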

t-Tests

What is a t-Test?

A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether an intervention or treatment actually has an effect on the people within the study, or whether two groups are different from one another. (https://www.scribbr.com/statistics/t-test/)

If the groups come from a single population (e.g., pre-test and post-test data on the same individuals, or measuring the same group of subjects before and after an experimental treatment), conduct a paired-samples t-test. This is a within-subjects design.

If the groups come from 2 different populations (for example, men and women, or people from two separate cities, or students from two different schools), conduct an independent-samples t-test (also known as a two-samples t-test). This is a between-subjects design.

A one-sample t-test is for comparing one group to a standard value or norm (for example, comparing the acidity of a liquid to a neutral pH of 7).

Paired Samples t-Test

Let's go over how to conduct a paired samples t-test. We'll use the variables Time Task 1 and Time Task 2 in this analysis to see if our sample's times for finishing the running race improved from the first time they ran the race (Time 1) to the second time (Time 2). We are using a paired samples t-test for this analysis because we are examining the same individuals across the two time points.

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select Paired Samples T Test.
     
  2. In the popup window, move over Time Task 1 and Time Task 2 (specifically in that order) to the Paired Variables box. 
     
  3. Click OK to run the analysis. You should see the following output:

     
  4. Let's go over how to interpret the output tables:
    1. Paired Samples Statistics Table: This table shows you descriptive statistics for the variables you selected. We can see the Mean times (in minutes) for completion of Time Task 1 (88.20 minutes) and Time Task 2 (54.60 minutes). We can also see the sample size N for each variable (50 individuals for both variables). We can also see each variable's standard deviation and standard error mean. (The standard error mean is a measure of how different the population mean would likely be from the sample mean. It tells you how much the sample mean would vary if you were to repeat the study using new samples from within the same population).
    2. Paired Samples Correlations Table: This table shows you the Pearson correlation coefficient for the pair of variables you selected, so we can see how strongly the two variables relate. In our example, we see that Time Task 1 and Time Task 2 have a moderately strong positive relationship with a correlation coefficient of .788, and this correlation is statistically significant (p < .001). 
    3. Paired Samples Test Table: This is the actual t-test analysis of your variables. 
      1. The Mean here is the average difference between the 2 variables (if you take the mean values from the top table and calculate the difference, 88.2 minus 54.6, it should equal the value in this table, 33.6).
      2. The Std. Deviation is the standard deviation of the differences between the paired observations from each of the 2 variables.
      3. The Std. Error Mean here is the standard error of the mean difference, which essentially represents the estimated variability of the average difference between paired observations in your sample. It's calculated by dividing the standard deviation of the differences by the square root of the sample size (in our example, it would be 34.759 divided by the square root of 50). A smaller standard error of the mean indicates greater confidence in the observed mean difference.
      4. The 95% Confidence Interval of the Difference, Lower and Upper values are showing you a range of values within which we are confident (in this case, 95% confident) that the true population mean difference between paired observations lies, based on our sample data. Essentially, it indicates the likely range for the average difference between the two paired measurements with a certain level of confidence (like 95%) based on the study results.
      5. The t value represents the t-statistic, which is a calculated value used to determine whether there is a statistically significant difference between the means of two paired groups, essentially measuring how many standard errors the observed mean difference is away from the null hypothesis mean (usually zero). A larger absolute value of t indicates a greater difference between the paired groups, making it more likely to reject the null hypothesis.
      6. The df value represents the degrees of freedom of this analysis. It is calculated by subtracting 1 from the sample size (in our example, 50 - 1 = 49). (Degrees of freedom refers to the number of independent pieces of information used to calculate a statistic, essentially representing how many values in a data set are free to vary when estimating a population parameter).
      7. Significance. There are two significance values listed: the One-Sided p and the Two-Sided p. Which one should you use? Generally, you’ll want to use the Two-Sided p-value, as a two-sided test will test for any difference between the 2 variables. A One-Sided test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected in order to use this p-value. A two-sided test tests for both positive or negative differences, so if you don’t know how the variables may differ and just want to know if they differ, use the two-sided p-value. When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Group A scoring higher than Group B), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Group A scoring lower than Group B). A one-tailed test looks for an “increase” or “decrease” in the parameter whereas a two-tailed test looks for a “change” (could be either an increase or decrease) in the parameter.
  5. Paired Samples Effect Sizes Table: 
    • This table helps you understand the magnitude of the difference between the two groups (in this example, the difference between the Time 1 results and Time 2 results).
    1. The Standardizer value indicates what SPSS used to standardize the mean difference when calculating effect sizes (generally it is the standard deviation of the differences from the Paired Samples Test table; in this example, that is the case: the Std. Deviation value from the Paired Samples Test table (34.759) is also the Standardizer value in the Paired Samples Effect Sizes table).
    2. The Point Estimate is the value of the effect size measure, for example Cohen's d or Hedges' g (also called Hedges' bias correction factor for Cohen's d). An effect size is a standardized measure of the size of the mean difference, that is, the difference between the 2 paired means expressed in terms of standard deviations.
      1. Hedges' g is a bias-corrected version of Cohen's d that should be used when the groups you are comparing have small sample sizes (n < 20). The value of Hedges' g (Hedges' correction) is usually similar to d, but slightly smaller when the sample size is small.
      2. To interpret, identify the Cohen's d value in the table and then compare this d value against established benchmarks (see table below) to understand the practical significance of the findings beyond statistical significance.
    3. The 95% Confidence Interval, Lower and Upper values are the lower and upper bounds of the confidence interval for the effect size estimates (e.g., Cohen's d and Hedges' correction). The confidence interval helps us assess the precision of the effect size estimate; a smaller range indicates higher precision in the estimate of the effect size. 
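To make the Point Estimate concrete with the numbers from this example: the standardized mean difference is the mean difference from the Paired Samples Test table divided by the Standardizer, so

  d = \frac{\bar{d}}{s_d} = \frac{33.6}{34.759} \approx 0.97

which, per the interpretation table below, would be considered a large effect.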

Interpretation of Cohen’s d effect size (if d is negative, use its absolute value to interpret)

Cohen's d Interpretation
0.2 to 0.49 Small effect
0.5 to 0.79 Medium/moderate effect
0.8 or higher  Large effect

Note: if your Cohen's d value is near the threshold between two interpretations, for example 0.49, you could say it’s a “small-to-medium effect.” 0.79 would be considered a “medium-to-large effect.”
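If you prefer syntax, a minimal sketch of the paired samples t-test above, assuming the two variables are named TimeTask1 and TimeTask2 in your data file (SPSS variable names cannot contain spaces, so the internal names may differ from the labels Time Task 1 and Time Task 2):

  * Paired samples t-test comparing the two race times.
  T-TEST PAIRS=TimeTask1 WITH TimeTask2 (PAIRED)
    /CRITERIA=CI(.95)
    /MISSING=ANALYSIS.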
 


Independent Samples t-Test

Let's go over how to conduct an independent samples t-test. We'll use the variables Gender and Anxiety Time 1 in this analysis to see if men and women differ on their anxiety levels. We are using an independent samples t-test for this analysis because we are examining unrelated groups.

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select Independent-Samples T Test. 
  2. In the popup window, move over Anxiety Time 1 to the Test Variable(s) box, and move over Gender to the Grouping Variable box. You'll notice there are 2 question marks by Gender now. This is because we need to define how the groups were coded in the data. Click the Define Groups button.
     
  3. In the Define Groups popup window, input 1 for Group 1 and input 2 for Group 2. We do this to tell SPSS that subjects in Group 1 (Males) were coded with the numeric value 1 and subjects in Group 2 (Females) were coded with the numeric value 2. (Make sure the values you input match how you coded your data. If, for example, we had coded Males as 0 and Females as 1, then we would input 0 into the Group 1 box and 1 into the Group 2 box. Or if we wanted to use Ethnicity as the Grouping Variable, we'd have to select two specific ethnic groups to use, so, for example, we could input 3 for Group 1 to indicate Black or African American and 5 for Group 2 to indicate Native Hawaiian or Pacific Islander. T-tests can only compare 2 groups, so you can only select 2 to compare. ANOVAs can compare more than 2 groups). Click Continue.
     
  4. Back in the main independent samples t-test window, check the box for Homogeneity of variance test. Leave the Estimate Effect Sizes box checked too. (Note: SPSS 30 has a slightly different independent samples t-test interface than prior versions of SPSS. If you have a prior version, you can skip this step, as the Homogeneity of variance test is automatically calculated when you run this t-test).
     
  5. Click OK to run the analysis. You should see the following output (Note: if you have a version of SPSS prior to version 30, Levene's test (Homogeneity of Variance Test) will appear in the Independent Samples Test table):

     
  6. Let's go over how to interpret the output tables:
    1. Group Statistics Table: This table shows you descriptive statistics for the variables you selected. We can see the sample size N for each group (24 Males and 26 Females). We can also see the Mean Anxiety Time 1 scores for Males (21.79 points) and Females (22.00 points). We can also see each group's standard deviation and standard error mean. (The standard error mean is a measure of how different the population mean would likely be from the sample mean. It tells you how much the sample mean would vary if you were to repeat the study using new samples from within the same population).
    2. Homogeneity of Variance Test Table: (while this table appears as the 3rd table, it should be interpreted before the 2nd table (Independent Samples Test table) because the results of the Homogeneity of Variance test determine which row of the Independent Samples Test table you need to look at).
      1. Levene's Test (Homogeneity of Variance test) is used to assess whether the variances of our two groups are approximately equal or not. Many statistical tests have specific statistical assumptions that need to be met in order to properly run the test, and equality of variances is one of the assumptions for an independent samples t-test. (Variance is a measurement of the spread between data points in a dataset. It measures how far each data point is from the mean value. When running an independent samples t-test, we want the variances of both groups to be approximately equal). When the variances of both groups are approximately equal, we meet the homogeneity of variances assumption. In turn, when we meet this assumption, Levene's test will be non-significant (p > .05).
      2. We want Levene's test to be non-significant as this indicates we meet the homogeneity of variances assumption. If this test is non-significant, we can use the Equal Variances Assumed row when interpreting the Independent Samples Test table above. If this test is significant, then this means our data does NOT meet the homogeneity of variances assumption and we must use the Equal Variances Not Assumed row for interpreting the Independent Samples Test table.
    3. Independent Samples Test Table: This table shows you ... (coming soon)
      1. The
      2. Significance. There are two significance values listed: the One-Sided p and the Two-Sided p. Which one should you use? Generally, you’ll want to use the Two-Sided p-value, as a two-sided test will test for any difference between the 2 variables. A One-Sided test only tests for one specific direction (either positive or negative, but not both), and you would have had to make a hypothesis about the specific direction you expected in order to use this p-value. A two-sided test tests for both positive or negative differences, so if you don’t know how the variables may differ and just want to know if they differ, use the two-sided p-value. When in doubt, it is almost always more appropriate to use a two-tailed test. A one-tailed test is only justified if you have a specific prediction (hypothesis) about the direction of the difference (e.g., Group A scoring higher than Group B), and you are completely uninterested in the possibility that the opposite outcome could be true (e.g., Group A scoring lower than Group B). A one-tailed test looks for an “increase” or “decrease” in the parameter whereas a two-tailed test looks for a “change” (could be either an increase or decrease) in the parameter.
  7. Independent Samples Effect Sizes Table: 
    1. (Coming Soon!)
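If you prefer syntax, a minimal sketch of the independent samples t-test above, assuming the variables are named Gender and AnxietyTime1 in your data file, with Males coded 1 and Females coded 2:

  * Independent samples t-test comparing Males (1) and Females (2) on Anxiety Time 1.
  T-TEST GROUPS=Gender(1 2)
    /VARIABLES=AnxietyTime1
    /CRITERIA=CI(.95)
    /MISSING=ANALYSIS.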

Coming soon

ANOVA

What is an ANOVA?

ANOVA stands for Analysis of Variance. Recall that a t-test can only be used when comparing the means of 2 groups (a.k.a. pairwise comparison). If you want to compare the means of more than 2 groups, you conduct an ANOVA. ANOVAs are used to analyze the difference in means among 3 or more groups. There are different types of ANOVAs, scroll down to learn about the One-Way ANOVA, or click through the tabs to learn about other types of ANOVAs.

One-Way ANOVA

A one-way ANOVA is used to determine whether there are any statistically significant differences between the means of 3 or more independent groups. For example, you could test whether freshmen (1st-year undergrad students), sophomores (2nd-year undergrads), juniors (3rd-year undergrads), and seniors (4th-year undergrads) differ in their stress levels. (For more information, see: https://www.scribbr.com/statistics/one-way-anova/)

In order to properly use a one-way ANOVA, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.

Statistical Assumptions of the One-Way ANOVA:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The population variances in each group are equal.
  4. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a value that is very different from all other values).

For further info on these assumptions, check out these resources: Laerd Statistics One-Way ANOVA Guide, Scribbr One-Way ANOVA Guide, Kent State One-Way ANOVA Guide

In addition to those assumptions, there are other requirements of your data in order to conduct a one-way ANOVA:

  • Your dependent variable is continuous (i.e., interval or ratio level)
  • Your independent variable is categorical (i.e., two or more groups)
  • Cases that have values on both the dependent and independent variables

(This section is in progress, check back for updated content)

How to Run a One-Way ANOVA

  1. Click on Analyze on the menubar, then select Compare Means and Proportions, then select One-Way ANOVA.
  2. In the window that pops up, bring over your dependent variable into the Dependent List box. (Your dependent variable needs to be at the Scale measurement level).
  3. Bring over your independent variable into the Factor box. (The independent variable needs to be at the Nominal or Ordinal measurement level, as the ANOVA will be assessing whether or not the different groups/categories have significantly different values on the dependent variable). 
  4. Click OK. (This will run the omnibus ANOVA. The omnibus ANOVA test tells you if there is a significant difference between the groups, but it doesn't tell you which specific groups are significantly different).
  5. Look at the output and if the F-statistic of your omnibus ANOVA is significant (p < .05), then you will re-run the ANOVA with a post-hoc test.
  6. To re-run the ANOVA, click on Analyze on the menubar, then select Compare Means and Proportions, then select One-Way ANOVA.
  7. All of your prior information you inputted should still be in the window that pops up. Now click the Post Hoc button on the right.
  8. In the Post-Hoc window, select one (or more) post-hoc tests to run. The most frequently used post-hoc test is the Bonferroni, so let's select Bonferroni and then click Continue. (Another common post-hoc test is Tukey. You can select more than one post-hoc test to see how the tests differ in their calculation of the p-values; some are more stringent than others).
  9. Click OK.
  10. Now your output will contain the Post-Hoc pairwise comparisons table which allows you to see which specific groups (pairwise comparisons) were significantly different from each other.
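As a syntax-window sketch of the steps above, assuming a hypothetical Scale dependent variable named StressScore and a Nominal grouping variable named ClassYear:

  * One-way ANOVA with descriptives and Bonferroni post-hoc comparisons.
  ONEWAY StressScore BY ClassYear
    /STATISTICS DESCRIPTIVES
    /POSTHOC=BONFERRONI ALPHA(0.05)
    /MISSING ANALYSIS.

Omit the /POSTHOC line for the initial omnibus run (Steps 1-4), and add it back when the omnibus F-test is significant and you re-run the analysis (Steps 5-9).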

Coming soon!

Coming Soon!

Regression

Regression is a method used to analyze the relationship between a dependent variable and one or more independent variables. It helps predict or understand how changes in the independent variable(s) affect the dependent variable. The dependent variable must be continuous (i.e., measured at the Scale level).

More specifically, regression analysis seeks to find a mathematical equation (a "regression model") that describes the relationship between the inputted variables, often by finding the line (or curve) that best fits the data points. Essentially, a regression analysis aims to find a model that best fits the data, allowing for predictions and insights into the relationships between variables. 

There are different types of regression analyses, scroll down to learn about linear regression, or click through the tabs to learn about other types of regression analyses.

Linear Regression

Regression is used to estimate the relationship between one continuous/scale dependent variable and one or more independent variables, which can be continuous/scale or nominal/categorical. Simple linear regression consists of one independent variable and one dependent variable. Multiple linear regression consists of two or more independent variables and one dependent variable. Linear regression specifically finds the "line of best fit" for the variables, a linear equation that explains how the variables relate to each other. The line and equation can be used to predict the value of the dependent variable based on differing values of the independent variable(s); for example, examining how stress levels, hours of sleep, and gender relate to test scores. (For more information, see: https://www.scribbr.com/statistics/simple-linear-regression/)

In order to properly run a linear regression, there are some statistical assumptions that must be met. Statistical assumptions are underlying conditions that must be met for a statistical test to provide valid results. These assumptions are like rules that need to be followed to ensure the conclusions drawn from the analysis are reliable. Violating these assumptions can lead to inaccurate interpretations and flawed conclusions.

Statistical Assumptions of Simple Linear Regression:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The variance of the errors is constant across all levels of the independent variables. In other words, the spread of the data points around the regression line should be roughly the same for all values of the predictor variables. (The variances along the line of best fit remain similar as you move along the line).
  4. Linearity
    1. The relationship between the independent and dependent variables is linear. The line of best fit through the data points is a straight line, and not a curve or any other shape.
  5. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a dependent variable value that is very different from the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large error value (residual)).

 

Statistical Assumptions of Multiple Linear Regression:

  1. Independence of Observations
    1. All of your participants/data-points should be independent of each other. This means each participant is unique, in other words, nobody took your survey or submitted data more than once. Each data point is unique.
  2. Normality of Data
    1. The values of the dependent variable follow a normal distribution.
  3. Homogeneity of Variance (Homoscedasticity)
    1. The variance of the errors is constant across all levels of the independent variables. In other words, the spread of the data points around the regression line should be roughly the same for all values of the predictor variables. (The variances along the line of best fit remain similar as you move along the line).
  4. Linearity
    1. The relationship between the independent and dependent variables is linear. The line of best fit through the data points is a straight line, and not a curve or any other shape.
  5. No Multicollinearity
    1. The independent variables are not highly correlated with each other. (High correlation between predictors can make it difficult to isolate the individual effect of each predictor on the dependent variable). 
  6. No Extreme Outliers
    1. There should be no extreme outliers in the data. (An outlier is an observed data point that has a dependent variable value that is very different from the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large error value (residual)).

For further info on these assumptions, check out these resources: Laerd Statistics Linear Regression Guide, Laerd Statistics Multiple Regression Guide, Scribbr Multiple Linear Regression Guide, Scribbr Simple Linear Regression Guide, StatisticsSolutions' Multiple Linear Regression Assumptions Guide

In addition to those assumptions, there are other requirements of your data in order to conduct a linear regression:

  • Your dependent variable must be continuous (i.e., interval or ratio level)
  • For Simple Linear Regression, your independent variable must be continuous (i.e., interval or ratio level)
  • For Multiple Linear Regression, your independent variables can be either categorical (i.e., two or more groups) or continuous (i.e., interval or ratio level)
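Although the step-by-step walkthrough for this section is still in progress, here is a minimal syntax-window sketch of a multiple linear regression using the hypothetical example above (predicting test scores from stress levels, hours of sleep, and gender), assuming variables named TestScore, StressLevel, HoursSleep, and Gender:

  * Multiple linear regression predicting TestScore from three predictors.
  REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF R ANOVA
    /DEPENDENT TestScore
    /METHOD=ENTER StressLevel HoursSleep Gender.

Note that a two-category nominal variable like Gender (coded, say, 1 and 2) can be entered directly as a predictor, but a nominal variable with more than two categories needs to be dummy-coded into separate variables first.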

 

(This section is in progress, check back for updated content)

(In progress)

Multivariate Analyses

Multivariate analyses consist of analyzing more than one dependent variable at once. This is in contrast to all of the above Univariate analyses which only include one dependent variable per analysis. 

To conduct multivariate analyses in SPSS, you will often need to use the General Linear Model function.

Coming Soon!
