Polling, Randomness, and Biases
Various organizations use polling to predict the results of elections. As I mentioned in my previous entry, if you have a truly random sample, you don't actually need to poll a lot of people.
That makes it worth pinning down a few definitions. The first thing to think about is the population: everyone in the group you are interested in. In an election, even that can be a little complicated. Do you mean everyone who can vote, or everyone who will vote? Some of the people who are eligible to vote will not actually vote.
When I wrote about Big Data, I noted that in such cases you'd be collecting information on as large a chunk of the population as you could. Only in the last few years has that become something worth considering for elections, and we're not there yet. That's why polling has been used to predict election outcomes.
In polling, you collect data from a sample: a subset of the overall population. If you wanted to predict the results of a vote in Massachusetts, you would collect data from a sample of eligible voters in Massachusetts. That sample should be random. The problem is that, in practice, it won't be.
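To make "random sample" concrete, here is a minimal Python sketch of a simple random sample, in which every eligible voter has the same chance of being picked. The population is hypothetical (one million voters, 52% of whom prefer a candidate labeled "A"), and the function names are my own, chosen for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def estimate_support(sample, candidate):
    """Share of the sample that prefers the given candidate."""
    return sum(1 for vote in sample if vote == candidate) / len(sample)

# Hypothetical population: 1,000,000 eligible voters, 52% of whom prefer "A".
population = ["A"] * 520_000 + ["B"] * 480_000
sample = simple_random_sample(population, 1_000, seed=1)
estimate = estimate_support(sample, "A")
print(f"Estimated support for A: {estimate:.3f}")
```

Because every voter is equally likely to be drawn, the estimate from 1,000 people lands close to the true 52% even though the sample is only 0.1% of the population.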
How would you actually get a random sample? Do you mail a survey to people chosen at random? If so, where do you get the address list from? And what do you do about the people who don't respond at all? Even with a perfect list, a certain number of recipients, probably a large one, will throw the survey away. What if non-respondents turn out to be more likely to vote for a particular candidate? That has happened in many elections. And what if people aren't honest in their responses? For example, a respondent might say they will vote but then, come election day, not actually vote.
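The non-response problem is easy to see in a small simulation. This is a sketch under made-up assumptions: the electorate is split exactly 50/50, the mailing list is perfectly random, and supporters of "B" return the survey 30% of the time versus 20% for supporters of "A". The function name and rates are hypothetical.

```python
import random

def simulate_mail_poll(population, response_rate, n_mailed, seed=None):
    """Mail surveys to a random sample; each recipient replies with a
    probability that depends on which candidate they support."""
    rng = random.Random(seed)
    mailed = rng.sample(population, n_mailed)
    return [vote for vote in mailed if rng.random() < response_rate[vote]]

population = ["A"] * 500_000 + ["B"] * 500_000   # true support for A: 50%
response_rate = {"A": 0.20, "B": 0.30}           # hypothetical reply rates
replies = simulate_mail_poll(population, response_rate, n_mailed=20_000, seed=2)
share_a = sum(vote == "A" for vote in replies) / len(replies)
print(f"{len(replies)} replies; A's share among respondents: {share_a:.3f}")
```

In expectation, A's share among respondents is 0.5 × 0.20 / (0.5 × 0.20 + 0.5 × 0.30) = 0.40, ten points below the true 50%. Mailing more surveys shrinks the random error but does nothing to remove this bias.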
A perfect poll would go to a truly random group of people, and every one of them would respond, honestly. If all that happened, you could make do with a remarkably small sample size.
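The reason a small sample can suffice is the margin of error for a proportion. Under the ideal conditions just described (a simple random sample, full and honest response), the usual 95% approximation is z × sqrt(p(1 − p) / n) with z ≈ 1.96. It depends on the sample size n, not on the size of the population, apart from a finite-population correction that is negligible unless the sample is a sizable fraction of the population. The function below is my own sketch of that formula.

```python
import math

def margin_of_error(n, p=0.5, z=1.96, population_size=None):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample with full, honest response.
    p=0.5 gives the widest (most conservative) margin.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction; close to 1 when n is small relative to N.
        moe *= math.sqrt((population_size - n) / (population_size - 1))
    return moe

for n in (100, 400, 1_000, 2_500):
    print(f"n={n:>5}: ±{100 * margin_of_error(n):.1f} points")
```

This prints margins of about ±9.8, ±4.9, ±3.1 and ±2.0 percentage points. Quadrupling the sample only halves the margin, so the returns from polling more people diminish quickly.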
Unrepresentative lists, non-response, and dishonest answers are all sources of bias. The pollster's job is to eliminate these biases as far as possible and, where that isn't possible, to account for them. This isn't easy, which is why a good pollster is so valuable.
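One common way to account for a known imbalance is weighting, often called post-stratification: if a group makes up a smaller share of your respondents than of the population, each of its respondents counts for more. This post doesn't say which adjustments pollsters use, so treat this as a sketch of one technique; the age groups, shares, and responses are hypothetical.

```python
from collections import Counter

def poststratify(responses, population_shares):
    """Weighted share of respondents who support the candidate.

    responses: list of (group, supports_candidate) pairs.
    population_shares: dict of group -> share of the population (sums to 1).
    Every group that appears in the responses must appear in population_shares.
    """
    n = len(responses)
    counts = Counter(group for group, _ in responses)
    # Weight = population share / sample share, so under-represented groups count more.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    total = sum(weights[g] for g, _ in responses)
    support = sum(weights[g] for g, yes in responses if yes)
    return support / total

# Hypothetical poll: younger voters are half the population but only 30% of respondents.
responses = (
    [("under_45", True)] * 20 + [("under_45", False)] * 10
    + [("45_plus", True)] * 28 + [("45_plus", False)] * 42
)
raw = sum(yes for _, yes in responses) / len(responses)
weighted = poststratify(responses, {"under_45": 0.5, "45_plus": 0.5})
print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")
```

The raw poll shows 48% support, but the weighted estimate is about 53% because the under-represented younger group favors the candidate. Weighting only corrects imbalances in the variables you weight on; if non-respondents differ from respondents within the same group, the bias remains.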
I'll talk a bit about biases in upcoming posts and show some elections where polls failed rather fantastically.
The material in this post doesn't appear in any of my research papers, since my audiences were generally already familiar with this terminology. I did grab a textbook I've used in my coursework to make certain my definitions were consistent; it's listed below.
De Veaux, R. D., Velleman, P., & Bock, D. E. (2016). Stats: Data and Models (4th ed.). Boston: Pearson.