At Nielsen, I oversaw a product that involved 12,000
surveys a day (about 4,400,000 surveys a year).
If a panelist cheated on the surveys (copied answers from the web) or
randomly guessed at answers, those behaviors degraded the quality of our
product. So identifying (and then
removing) guessing and cheating behavior was very important.
I started by researching what other companies did to
identify random guessing behaviors. I
found lots of suggestions, and what I learned helped me get
started. But the “general wisdom” on the
subject didn’t go far enough and (as I later learned) was sometimes wrong.
Using the random guessing “general wisdom”
(labeled in the graph below as “Obvious Markers”, the blue line), I could find that
24% of the panelists/surveys were not guessed (the data suggested they spent
time and effort to give honest answers) and that 4% were clearly randomly guessed
(they answered too quickly and/or just typed “A” for every question, for example). But that left 72% where I just wasn’t sure if
they guessed or not. So I analyzed the behaviors
of the obvious guessers, compared them to the obvious non-guessers, and used
Bayesian inference models along with what I learned to create new models that
did a better job of identifying clear guessing behaviors. I did this again and again (about 19 times)
before I was satisfied that I had identified most of the random guessing behaviors.
In the end, I identified 15% of panelists/surveys as
clear guessers (the black line below, to the right near 100%). I also found
that 71% were clearly not guessers (the black line to the left near 0%), and
that left 14% where I still wasn’t sure.
After all that effort, I could use the Bayesian Inference
models to score new surveys that came in (very CPU intensive) or use the
results of the Bayesian Inference models to train more practical modeling
methods that were almost as good.
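
To make the scoring idea concrete, here is a minimal sketch of how Bayes' rule can turn a few observed markers into a probability that a survey was guessed. It is a generic naive-Bayes style update, not the actual models described above. The function name, the independence assumption between markers, and every likelihood value are hypothetical; the 15% prior simply borrows the final share of clear guessers for illustration.

```python
def posterior_guesser(prior, marker_likelihoods):
    """Return P(guesser | observed markers) using Bayes' rule.

    prior: assumed P(guesser) before looking at any marker.
    marker_likelihoods: one (P(marker | guesser), P(marker | non-guesser))
        pair per observed marker. Markers are assumed independent, which is
        a simplification made for this sketch.
    """
    p_guesser = prior
    p_honest = 1.0 - prior
    for p_given_guesser, p_given_honest in marker_likelihoods:
        p_guesser *= p_given_guesser
        p_honest *= p_given_honest
    total = p_guesser + p_honest
    if total == 0:
        raise ValueError("markers are impossible under both hypotheses")
    return p_guesser / total


# Hypothetical likelihoods for two of the obvious markers:
# "finished very quickly" and "same letter for every question".
observed = [(0.60, 0.05), (0.30, 0.01)]
score = posterior_guesser(prior=0.15, marker_likelihoods=observed)
print(f"P(guesser | markers) = {score:.3f}")
```

With likelihoods like these, a survey showing both markers ends up close to 100%, while a survey showing neither would stay near the prior. In practice the likelihoods would have to be estimated from the surveys already labeled as obvious guessers and obvious non-guessers.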
I also learned some things about guessing from this process
that often surprised me. Two examples are below:
I found in my research that most sources agreed that the
best way to identify guessers is to look at the average time it takes to take a
survey. The belief is that the people
who finish the surveys at record speed are the guessers. And that is true, but there are also lots of
guessers who take a long time to finish the survey.
What I found is that what matters isn’t the average time the panelist
takes to finish the survey; it is the time they take to answer most of the
questions in the survey. Look at the
graph below, which shows the more common pattern of the time that guessing
panelists take to answer a survey of 20 questions. Most questions are sped through, but
a few take a long time to answer. Likely, speeding through a long survey is
boring, so the guessers get bored and are easily distracted, either by a few
specific questions or, more likely, because they are multitasking and are distracted by
something external to the survey. So,
while most of the questions are answered very quickly (less than 3 seconds
each), the average time per question might not look that fast.
I found that while looking at the average time to finish a
survey catches some guessers, looking at the median time to answer the
questions in the survey catches most of them.
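
A small sketch shows why the median is harder to fool than the average. The per-question times below are made up to mimic the pattern described above (most questions sped through, a few very slow), and the 3-second cutoff is an assumption taken from the "less than 3 seconds" remark rather than a tuned threshold.

```python
from statistics import mean, median

# Hypothetical answer times (seconds) for one 20-question survey.
times = [2, 1, 2, 3, 2, 45, 2, 1, 2, 2, 60, 2, 1, 3, 2, 2, 38, 1, 2, 2]

SPEED_THRESHOLD = 3.0  # seconds per question; assumed cutoff


def looks_like_speeder(answer_times, threshold=SPEED_THRESHOLD):
    """Flag a survey whose median per-question time is below the threshold."""
    return median(answer_times) < threshold


avg_time = mean(times)    # pulled up by the three slow questions
med_time = median(times)  # reflects how most questions were answered

print(f"mean = {avg_time:.2f}s, median = {med_time:.2f}s")
print("flagged by mean:  ", avg_time < SPEED_THRESHOLD)
print("flagged by median:", looks_like_speeder(times))
```

Here three slow questions are enough to push the mean well above the cutoff, so a rule based on the average would miss this survey, while the median still reflects the speed at which most questions were answered.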
Another surprise was what I called “D phobia”. In multiple-choice questions (4
choices: A, B, C, and D), non-guessers will pick each letter about 25% of the time (more
or less). So clearly, someone who
answers only the letter “C” is likely just guessing. And it is no surprise that people who answer
most of the multiple-choice questions with a single letter are randomly
guessing. If you look at panelists who
answer with a single letter 70%+ of the time, you can identify 32% of the
guessers.
But the surprise is that you can identify 50% of the
guessers by just looking for people who never answer with the letter “D” (or
answer with the letter “D” less than 5% of the time). For some reason, guessers have “D phobia” and
avoid clicking on the letter “D” answer.
So, more than by looking for people who answer “C” too often, you can find guessers by
looking for those who answer “D” never or almost never.
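
The two letter-based checks can be sketched in a few lines. The function names and the sample answers are hypothetical; the 70% and 5% cutoffs come from the figures quoted above, and the code assumes every question has the four choices A to D.

```python
from collections import Counter


def letter_shares(answers, choices="ABCD"):
    """Return the share of answers given to each choice letter."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    return {letter: counts[letter] / len(answers) for letter in choices}


def single_letter_flag(answers, cutoff=0.70):
    """True if one letter accounts for at least `cutoff` of the answers."""
    return max(letter_shares(answers).values()) >= cutoff


def d_phobia_flag(answers, cutoff=0.05):
    """True if "D" is chosen never, or less than `cutoff` of the time."""
    return letter_shares(answers)["D"] < cutoff


# Hypothetical panelist who spreads answers over A, B and C but avoids D.
answers = list("ABCABCBACBACABCBACBA")
print("single-letter flag:", single_letter_flag(answers))
print("D-phobia flag:     ", d_phobia_flag(answers))
```

A panelist like this one never repeats a single letter often enough to trip the 70% rule, but the missing “D” answers still stand out.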
Phone: 212-529-5337
Voice Mail: 917-838-7966
Email: Rawley.Cooper@AnalyticForensics.com
Address: 23 East Tenth Street #304
New York City, NY 10003