For this part, we will begin using the Example dataset linked in this guide (both the Excel and CSV Example files have the same data, so either is fine to use) to show you how to perform some data cleaning procedures in SPSS. But let's start with a quick intro on what data cleaning is.
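Data cleaning is the process of preparing your data for analyses. It can include steps such as adding Labels and Values to your variables, correcting Measurement Levels, and transforming (re-coding) variables.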
To learn how you can conduct some simple data cleaning in SPSS, look through the examples below!
After you import/open the Example dataset, click on the Variable View tab at the bottom of the dataset window.
Remember, Labels are the more descriptive names we can give to variables, and Labels are what will appear in any analyses, tables, graphs, charts, or other outputs that you create with that variable. (If you do not add a Label, the variable Name is what will appear in your outputs).
Labels are helpful if your variable Names are not very descriptive or distinct (for example if all of your variables are named Q1, Q2, Q3, etc.), or if you want to add more information about what each variable is (for example, if you used a scale that measures happiness, you may have named your variables Hap_1, Hap_2, Hap_3, etc. for each individual question of the scale. You can add Labels to each variable with the specific text for each question of that scale to help you know which question is which).
If your variables are named in a similar manner to Q1, Q2, Q3, etc., before adding Labels, you should first edit the variable Names to be some sort of shorthand name that is more descriptive and unique for each variable so you know which variable is which. For example, if your Q1 variable was a consent form question, change the name from Q1 to consent. If Q2 asked for participants' ages, change Q2 to age.
Our example dataset already has descriptive shorthand names for our variables, so we can skip this step here. Ideally, your variable Names should follow a similar format to the Names in the example dataset, where you can easily tell what each variable is from its shorthand Name. Afterward, you can add Labels to any variables that need further clarification or detail.
Some of our variables in the example dataset do not actually need Labels; the variables Gender, Age, and Ethnicity (for example) are already very descriptive and clear with just their Names, so we do not need to add Labels to those. The output of any analyses we run with these variables will display these variables' Names since we are not adding Labels, and that is absolutely fine in this case because the Names are very clear about what these variables are.
We’ll start with Years_Employed and add a Label without the underscore, just to clean up how this variable displays in our output analyses, tables, graphs, etc. (While this variable Name is technically very clear, a Label without the underscore looks nicer in our outputs. As a reminder, Names cannot contain spaces; they are generally limited to letters, numbers, and a few special characters such as underscores. Labels can include any characters, including spaces.)
To add a Label, go to Variable View of your dataset.
The table below lists our example variables and the associated Labels we will be adding to them. See the images below the table for visual aids of what adding the Labels will look like.
Name | Label |
---|---|
Years_Employed | Years Employed |
Work_Field | Work Field |
Likert_Num | Likert Numbers |
Time_Task_1 | Time Task - Time 1 |
Time_Task_2 | Time Task - Time 2 |
Hip_Hop | Hip Hop |
Exp_Group | Experimental Group |
Anx_1 | Anxiety - Time 1 |
Depress_1 | Depression - Time 1 |
Confid_1 | Confidence - Time 1 |
Anx_2 | Anxiety - Time 2 |
Depress_2 | Depression - Time 2 |
Confid_2 | Confidence - Time 2 |
Anx_3 | Anxiety - Time 3 |
Depress_3 | Depression - Time 3 |
Confid_3 | Confidence - Time 3 |
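The rule described above (outputs show the Label when one exists, otherwise the Name) can be pictured with a small Python sketch. This is only an illustration of the fallback logic, not SPSS code. The `LABELS` dictionary and the `display_name` function are names made up for this example, and the Labels are a subset of the table above.

```python
# Labels from the table above, keyed by variable Name (subset shown).
LABELS = {
    "Years_Employed": "Years Employed",
    "Work_Field": "Work Field",
    "Likert_Num": "Likert Numbers",
    "Exp_Group": "Experimental Group",
    "Anx_1": "Anxiety - Time 1",
}


def display_name(name, labels=LABELS):
    """Return the Label if one was added, otherwise fall back to the Name."""
    return labels.get(name) or name


print(display_name("Years_Employed"))  # this variable has a Label
print(display_name("Age"))             # no Label, so the Name is shown
```

Variables such as Age, Gender, and Ethnicity are deliberately left out of the dictionary, mirroring the choice above not to give them Labels.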
After you import/open the Example dataset, click on the Variable View tab at the bottom of the dataset window.
Values are for specifying what each number means for numerically-coded categorical/nominal or ordinal variables. For example, if you collected data on participants' highest level of education, it may be reported in your dataset as 1's, 2's, 3's, 4's, etc. and not the actual words High School Diploma, Associate's Degree, Bachelor's Degree, etc. By adding Values to this education-level variable, you can tell SPSS what each numeric code means, and then for any analyses you run or tables/graphs/charts you make, SPSS will display those Values instead of the numeric codes.
The variables in our example dataset that need Values are: Gender, Ethnicity, and Likert_Num (Likert Numbers).
To add Values, go to Variable View of your dataset.
Ethnicity - Numeric Code | Values |
---|---|
1 | American Indian or Alaska Native |
2 | Asian |
3 | Black or African American |
4 | Hispanic or Latino |
5 | Native Hawaiian or Pacific Islander |
6 | White |
Likert_Num - Numeric Code | Values |
---|---|
1 | Strongly Disagree |
2 | Disagree |
3 | Neither Agree nor Disagree |
4 | Agree |
5 | Strongly Agree |
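To picture what Values do, here is a minimal Python sketch of the same idea: a lookup from each numeric code to its text, applied when results are displayed. It is an illustration only, not how SPSS stores Values internally. The `apply_values` function and the sample `responses` list are made up for this example; the code-to-text pairs come from the Likert_Num table above.

```python
from collections import Counter

# Values for Likert_Num, taken from the table above.
LIKERT_VALUES = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neither Agree nor Disagree",
    4: "Agree",
    5: "Strongly Agree",
}


def apply_values(codes, values):
    """Swap each numeric code for its Value text; unknown codes are left as-is."""
    return [values.get(code, code) for code in codes]


responses = [4, 5, 3, 4, 1]  # made-up responses, for illustration only
labelled = apply_values(responses, LIKERT_VALUES)
counts = Counter(labelled)

# A frequency table now shows the text instead of the numeric codes.
for text, n in counts.items():
    print(f"{text}: {n}")
```

Note that the underlying data stay numeric; only the displayed text changes.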
Here are visual aids of what adding the Values to Ethnicity and Likert_Num will look like:
Now your Variable View screen should look like this:
After you import/open the Example dataset, click on the Variable View tab at the bottom of the dataset window.
Measurement Level (referred to as Measure in Variable View) is for specifying the level of measurement (nominal, ordinal, or scale) that each of your variables was collected as.
When you import a dataset, SPSS tries to guess the measurement level for each of your variables. Sometimes SPSS specifies the measurement level incorrectly. If you notice this for any of your variables, you can manually change the Measurement Level (Measure).
In the Example dataset, 4 of our variables have incorrectly specified Measurement Levels: Likert_Num, Depress_1, Depress_2, and Confid_3. They were all classified as Nominal when they should all be Scale.
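To see why the Measurement Level matters, consider this rough Python sketch. It assumes a simplified rule: a mean is computed for Scale variables, while Nominal (or Ordinal) variables are summarized as counts per category. The `summarize` function and the sample scores are made up for illustration, and this is not how SPSS decides which analyses to offer.

```python
from collections import Counter
from statistics import mean


def summarize(values, measure):
    """Pick a summary that suits the measurement level (simplified rule)."""
    if measure == "scale":
        return {"mean": mean(values)}
    # Nominal and ordinal data are summarized as counts per category here.
    return {"counts": dict(Counter(values))}


scores = [2, 4, 4, 5]  # made-up Depress_1 scores, for illustration only

as_nominal = summarize(scores, "nominal")  # counts per score
as_scale = summarize(scores, "scale")      # an average score

print(as_nominal)
print(as_scale)
```

With the wrong level, the same numbers are treated as unrelated categories, so an average is never produced.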
To adjust Measurement Level (Measure), go to Variable View of your dataset.
Once you've finished, your screen should look like this:
Let's say you have a variable that is numeric and the measurement level was incorrectly specified by SPSS as Nominal (or Ordinal). You go to change the Measurement Level but you notice that Scale is not an option - you only see Nominal and Ordinal. What do you do?
After you import/open the Example dataset, stay in Data View.
Transformations are used when you need to re-code any of your variables. Maybe there are some systematic typos across your data, or you want to change a scale/numeric variable into nominal (text) categories (like changing numeric ages into age categories).
The variables in our example dataset that need to be Transformed are: Work_Field, Time_Task_1, Time_Task_2, and we'll also go over how you could transform Age into age categories.
Let's walk through each of these Transformations!
If you look at the column for the Work_Field variable, you'll notice that some of the responses are lowercase while the rest are uppercase. This can often happen if you have people type in their answers to a question on a survey. The issue is that SPSS is case sensitive, so it will consider the lowercase version of answers to be a completely different response than the uppercase versions (e.g., gov is considered a different answer than GOV even though they are actually meant to be the same answer, and we want them to be considered the same answer). Let’s fix Work_Field so all responses/answers are uppercase.
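The effect of this fix can be sketched in a few lines of Python. This is an illustration of the idea rather than the SPSS procedure itself. The sample `work_field` list is made up; only the gov/GOV pair comes from the example above.

```python
# Made-up sample of Work_Field responses with mixed case.
work_field = ["GOV", "gov", "EDU", "edu", "GOV"]

# A case-sensitive comparison sees four different answers.
before = sorted(set(work_field))

# Converting every response to uppercase collapses them into two.
cleaned = [response.upper() for response in work_field]
after = sorted(set(cleaned))

print(before)
print(after)
```

Counting distinct answers before and after is a quick way to confirm that the duplicates caused by letter case are gone.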
This is how you transform a string variable into another string variable. Another option is to use the Find and Replace feature. This is similar to a Find and Replace you could do in Microsoft Word or Excel. Here's how you can do a Find and Replace in SPSS.
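Setting the SPSS dialog aside, the underlying operation of a Find and Replace on one column can be pictured with this short Python sketch. It assumes whole-cell matching (a cell is replaced only if it equals the search text exactly), which is a choice made for this example. The `find_and_replace` function and the sample data are hypothetical.

```python
def find_and_replace(column, find, replace):
    """Replace cells that match `find` exactly; leave every other cell alone."""
    return [replace if cell == find else cell for cell in column]


work_field = ["GOV", "gov", "EDU", "gov"]  # made-up sample
fixed = find_and_replace(work_field, "gov", "GOV")
print(fixed)
```

Unlike converting a whole variable to uppercase, this changes one specific answer at a time, so it would need to be repeated for each lowercase answer.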
For this example, we will be transforming the variables Time_Task_1 and Time_Task_2. Both of these variables are coded as string variables because they include text, but we want to change them to just be numbers. You can see that the responses are not even in uniform units; some are in hours, some are in minutes (min). With our transformation, we also want to put these variables in uniform units of just minutes for easier comparison.
Old Value | New Value |
---|---|
3 hours | 180 |
2 hours | 120 |
1.5 hours | 90 |
1 hour | 60 |
45 min | 45 |
30 min | 30 |
15 min | 15 |
Be sure to click Add after each, including after the last one you enter, otherwise SPSS will not include it!
Your screen should look like this:
Changing Type to Numeric and Measure to Scale is very important because if you forget to do these steps, you will not be able to run analyses for numeric variables on Time_Task_1 and Time_Task_2 - you would only be able to run analyses for String, Nominal variables.
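The re-coding above amounts to a lookup from each text response to a number of minutes. Here is a minimal Python sketch of that idea, using the Old Value and New Value pairs from the table. The `recode_to_minutes` function and the sample responses are made up for illustration. Turning unmatched text into a missing value (`None`) is a choice made for this sketch, not a statement about how SPSS treats unlisted values.

```python
# Old Value -> New Value pairs from the table above (all in minutes).
RECODE = {
    "3 hours": 180,
    "2 hours": 120,
    "1.5 hours": 90,
    "1 hour": 60,
    "45 min": 45,
    "30 min": 30,
    "15 min": 15,
}


def recode_to_minutes(column, mapping=RECODE):
    """Recode each text response to minutes; unmatched text becomes None."""
    return [mapping.get(cell.strip()) for cell in column]


time_task_1 = ["1 hour", "45 min", "1.5 hours", "2 days"]  # made-up sample
minutes = recode_to_minutes(time_task_1)

# Once the values are numbers, numeric summaries such as a mean are possible.
valid = [m for m in minutes if m is not None]
average = sum(valid) / len(valid)

print(minutes)
print(average)
```

The average is only possible because the recoded values are numbers, which is the same reason the Type and Measure changes matter in SPSS.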
What if you have a numeric variable that you want to collapse into nominal categories? You can use the Transform function to do that. We'll practice this by using the Age variable.
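Collapsing a numeric variable into categories means assigning each number to a named range. The Python sketch below shows the logic only. The cut-offs and category names in `AGE_BINS` are hypothetical and chosen purely for illustration, so use whatever ranges suit your own study. The `age_category` function is also a made-up name.

```python
# Hypothetical age ranges (inclusive) and category names, for illustration only.
AGE_BINS = [
    (18, 29, "18-29"),
    (30, 49, "30-49"),
    (50, 120, "50+"),
]


def age_category(age, bins=AGE_BINS):
    """Return the name of the range containing `age`, or None if none fits."""
    for low, high, name in bins:
        if low <= age <= high:
            return name
    return None


ages = [18, 29, 30, 64, 12]  # made-up ages
categories = [age_category(a) for a in ages]
print(categories)
```

Ages that fall outside every range come back as `None`, which is a reminder to check that your ranges cover all of the values in your data.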
Note: If you forgot to adjust the Width, you may notice that some of your category names were cut off, as shown below. If this happened, don't worry; just follow the next couple of steps to fix it.
Copyright © Baylor® University. All rights reserved.