
Do You Even Data

A data-driven marketing blog

Want to learn how you can translate incredible data list information into killer marketing campaigns? Want to better understand how data research and models can enhance the data you already have?

Man Versus Machine, Part 2: Respondent Fraud Identification


In my previous blog post in this series, I talked about the impact machine coding has on the quality of your data. Having a human being actually look at each open-ended response in your survey may be time-consuming, but the results are hard to argue with: a code scheme that accurately represents the intent of the respondents' comments and a true reflection of the study results.

Today I want to focus on a different data quality issue that can be uncovered by physical, human inspection. In my nineteen years of coding open-ended responses here at DDG, I've seen a lot of interesting answers. People tend to get creative when they're given a nice blank box for their opinions instead of being forced to select an option from a list or a number on a scale. Once I received enough similar responses to a question about life goals that I had to add "a date with Elizabeth Taylor" to my code scheme! That was an interesting conversation to have with the project manager. I love reading all the things people have to say. But sometimes I find something in those open ends that sets alarm bells off in my head.

DDG started conducting online research soon after its general adoption in the industry, and by the early 2000s we were conducting surveys online as a matter of course. After years of working with mail and telephone surveys, we began to notice some differences between the kinds of responses we got on the web and those we got when the respondent was taking the time to fill out a mail survey, or being coached through an open-ended response on the phone. Sometimes the responses were just a little shorter. Sometimes, if the topic was sensitive, they were actually a little longer, as if the respondent felt more comfortable sharing private information when they weren't speaking with a live human. And sometimes we saw what we call "junk data," where the respondent simply keys a random assortment of characters into the open-end field and moves on. Frequently this behavior correlates with other quality issues, and it's one of the things we look for while we're coding.

The coding team helps identify potential data issues by looking for patterns that would be indicative of fraudulent behavior. We have technology in place to validate the identities of respondents and to prevent people from sending automated programs (called "bots") through the survey, though data issues caused by bots were a major problem before the advent of CAPTCHA technology. When I review your open-ended data as part of the coding process, I am also looking for junk responses, responses that don't make sense, repeated response patterns - anything that might indicate that the respondent is not giving your survey the attention it deserves. No matter how many traps we lay to preserve the integrity of your data, there's no substitute for a sentinel standing guard. That's me.
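To give a flavor of what "looking for junk" can mean in practice, here is a minimal, hypothetical sketch of the kinds of heuristics an automated first pass might use before a human coder reviews the data. The function names and thresholds are illustrative assumptions, not DDG's actual process - human judgment remains the final check.

```python
import re
from collections import Counter

def looks_like_junk(response: str) -> bool:
    """Heuristic check for 'junk' open-ended responses (illustrative only)."""
    text = response.strip()
    if len(text) < 3:
        return True  # too short to carry meaning
    letters = re.sub(r"[^a-zA-Z]", "", text).lower()
    if not letters:
        return True  # only digits/symbols, e.g. "!!!###"
    # keyboard mashing like "asdfjkl" tends to have very few vowels
    vowel_ratio = sum(c in "aeiou" for c in letters) / len(letters)
    if vowel_ratio < 0.2:
        return True
    # a single character repeated many times, e.g. "aaaaaaa"
    if len(set(letters)) == 1 and len(letters) > 3:
        return True
    return False

def flag_repeated_responses(responses, threshold=3):
    """Flag verbatim answers that recur suspiciously often across respondents."""
    counts = Counter(r.strip().lower() for r in responses)
    return {r for r, n in counts.items() if n >= threshold}
```

Rules like these only surface candidates for review; a legitimate short answer can trip a heuristic, which is exactly why a human coder still reads the flagged responses.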

Anne Saulter
Anne joined the Data Decisions Group team in the call center in 1996. Currently she runs our Coding Department, but she has worked in almost every department in the company during her tenure of 20+ years, including call center interviewing, new hire training, project coordination, and quality control. Anne graduated from Niagara University with a B.A. in Chemistry and a minor in Business.