Panelist X is an African American male in his early 30s living in Seattle, Washington. He works part time while attending graduate school and is married with a young child. His hobbies include reading, racquetball, and doing work around the house.
Are you sure about that?
Consider this hypothetical situation: Panelist X has been taking surveys as a member of a general consumer panel for approximately 6 months. During that time, he’s never been flagged for speeding, straight-lining, satisficing, or any other typical negative survey behaviors that a good research company checks for. He passed the IP address validation when he joined the panel, and his mailing address is valid.
So why, on a recent study, does the analyst notice that the response Panelist X keyed into an open-ended text field appears to be absolutely identical to the responses from Panelists A, B, Y, and Z?
Back in the days of in-person interviewing, it was easy to verify with whom you were speaking. Similarly, dialing from a call center with your customer list ensured that you reached the right household. Even with random digit dialing where there is theoretical potential for a respondent to lie about their age, marital status, or household income, there is little incentive for them to do so.
But online research? Just as with online dating, in this era of increased technological sophistication, it is becoming more and more difficult to verify that people are who they say they are.
What does a fraudulent panelist look like?
Today’s fraudulent respondents have an arsenal of skills at their disposal.
- They can easily obtain mailing address directories that allow them to misrepresent themselves as a real person living at a real address.
- Many options are available to obtain free email addresses that require no identity validation.
- IP addresses can be falsified to avoid traps that look for duplication.
- Sophisticated programs called robots or “bots” can be built that run through surveys automatically and punch responses, without a human being hitting a single key.
In the example scenario of Panelist X, one person used a list of valid name-and-address combinations and 5 different email addresses to sign up for the panel multiple times, then "took" the survey multiple times (using a bot) from the different accounts.
Because the program doesn’t straight-line or speed through the survey, the results don’t immediately jump out as unusual. The problem is identified when the analyst notices an odd combination of words repeated in an open-ended data field across multiple records while working with the data to deliver the findings. The analyst and the panel manager at the research company do some further digging into the panel records for the 5 “respondents” with suspicious data.
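The kind of check that caught Panelist X can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation: it assumes open-ended answers are available as a mapping of respondent ID to text (the records below are made up), normalizes whitespace and case, and flags any answer shared verbatim across multiple records.

```python
from collections import defaultdict

def flag_duplicate_open_ends(responses, min_cluster=2):
    """Group records whose open-ended text is identical after
    trimming whitespace and lowercasing; return any answer text
    shared by min_cluster or more respondents."""
    clusters = defaultdict(list)
    for respondent_id, text in responses.items():
        normalized = " ".join(text.lower().split())
        clusters[normalized].append(respondent_id)
    return {text: ids for text, ids in clusters.items() if len(ids) >= min_cluster}

# Hypothetical records: three "different" panelists, one shared answer.
records = {
    "A": "The packaging felt cheap but the flavor was great",
    "B": "the packaging felt cheap but the flavor was great",
    "X": "The packaging felt  cheap but the flavor was great",
    "C": "I mostly buy it for my kids",
}
print(flag_duplicate_open_ends(records))
# → one cluster containing respondents A, B, and X
```

Exact-match clustering like this is deliberately conservative; a real panel operation would likely also use fuzzy matching to catch answers that differ by a word or two.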
Initially, it would all appear to be a strange coincidence, as all 5 records come from different geographies, different IP addresses, and different demographic subgroups. A quick check online verifies that people with those names do live at the addresses listed in the original panel records. Then the panel manager notices that all 5 have Hotmail email addresses configured in a very similar way, and that all 5 joined the panel within a few minutes of each other.
Further digging into their panelist data reveals more peculiarities, as the 32-year-old male graduate student provides his name as Lisa, and the 25-year-old unemployed, single mother reports a household income of $750,000 per year. Taken on their own, any of these factors could be unusual, but legitimate. Viewed together, the conclusion is clear—a single individual is responsible for all 5 survey responses.
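The pattern the panel manager spotted — multiple accounts joining within minutes of each other — can also be checked automatically. Below is a rough sketch under illustrative assumptions (the join timestamps, ten-minute window, and minimum burst size are all made up for the example): sort panelists by join time and flag any run of sign-ups spaced closely together.

```python
from datetime import datetime, timedelta

def signup_bursts(joins, window=timedelta(minutes=10), min_size=3):
    """Sort panelists by join time and return groups of accounts
    where each sign-up followed the previous one within `window`."""
    ordered = sorted(joins.items(), key=lambda kv: kv[1])
    bursts, current = [], [ordered[0]]
    for pid, joined_at in ordered[1:]:
        if joined_at - current[-1][1] <= window:
            current.append((pid, joined_at))
        else:
            if len(current) >= min_size:
                bursts.append([p for p, _ in current])
            current = [(pid, joined_at)]
    if len(current) >= min_size:
        bursts.append([p for p, _ in current])
    return bursts

# Hypothetical join log: three accounts created minutes apart,
# one unrelated account the next day.
joins = {
    "A": datetime(2023, 5, 1, 9, 0),
    "B": datetime(2023, 5, 1, 9, 3),
    "X": datetime(2023, 5, 1, 9, 6),
    "C": datetime(2023, 5, 2, 14, 0),
}
print(signup_bursts(joins))
# → [['A', 'B', 'X']]
```

A burst on its own proves nothing (a promotion can drive a spike of legitimate sign-ups), which is why the scenario above only reaches a conclusion once several independent signals line up.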
What’s the motive?
Seems like a lot of effort, doesn’t it? Signing up for a panel multiple times, setting up multiple email accounts, building programs to navigate through surveys. Obviously the intent is to qualify for multiple incentives, which are frequently paid through online claim codes or other online rewards rather than mailing an actual check to a physical address.
But, you say, surely someone couldn’t make enough money on these incentives, which tend to have fairly small monetary values, to justify all this engineering? Imagine that Panelist X actually has 20 to 30 “panelist records” in the panel, and that only 5 were found because those were the only ones selected for this particular survey.
Additionally, Panelist X has 20 to 30 “panelist records” in 50 other online research panels. Now it begins to make more sense.
How do you protect your custom online panel research?
You make key decisions about your business based on the recommendations provided to you by your research suppliers. How can you be sure those suppliers are being diligent about ensuring the quality of the data that drive those recommendations?
- Check to make sure research suppliers are proactively updating security. What’s the latest security protocol they’ve implemented?
- Ask about traditional security protocols. Do they still use techniques such as IP address validation, trap questions, mailing address verification, straight-lining checks, and survey timing?
- Request multiple check points throughout the research project. Conducting check points using several techniques during recruitment of the panel, during project fieldwork, and during analysis of the final data can prevent fraudulent panelists from making it through the research process.
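Two of the traditional checks mentioned above — straight-lining and survey timing — can be sketched roughly as follows. The record format and thresholds here are illustrative assumptions (the one-third-of-median speeding cutoff is a common rule of thumb, not a standard), not any particular supplier's implementation.

```python
def is_straight_liner(grid_answers, max_identical_ratio=0.9):
    """Flag a respondent who gives the same answer to nearly every
    item in a rating grid (e.g. all 5s on a 1-5 scale)."""
    if not grid_answers:
        return False
    most_common = max(grid_answers.count(v) for v in set(grid_answers))
    return most_common / len(grid_answers) >= max_identical_ratio

def is_speeder(seconds_taken, median_seconds, min_ratio=0.33):
    """Flag a completion time far below the survey's median."""
    return seconds_taken < median_seconds * min_ratio

print(is_straight_liner([5, 5, 5, 5, 5, 5, 5, 5, 5, 5]))  # → True
print(is_speeder(120, 600))  # → True: 2 minutes vs a 10-minute median
```

As the Panelist X scenario shows, a well-built bot can vary its answers and pace itself to pass both of these checks, which is why they need to be combined with the newer cross-record techniques described earlier.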
Today it's no longer possible to rely on IP address checks and CAPTCHAs alone to identify fraudulent panelists—rigorous security protocols involving new and innovative techniques, implemented across the entire research process, are vital to producing quality research.