NationStates Jolt Archive


Explains Why Some Poll Data Sucks

Deep Kimchi
07-09-2006, 20:22
http://www.breitbart.com/news/2006/09/07/D8K01L503.html

You know, I always wondered how a polling service could gather any accurate data in far less time than electronic voting machines (even though the sample size is smaller). Now I think I know how they do it.

Either they talk to cats and dogs, or they pull it out of their ass. There's no real way to verify that their data or data collection techniques are sound or honest, or to keep them free of manipulation for financial or political gain.

Funny how people will trust polls (especially exit polls) far more than they will the actual results. If there's any discrepancy they don't like, their first assumption is that the votes were tainted somehow - no one seems to assume that pollsters are asswipes in it for the money.

According to a federal indictment, Costin told employees to alter poll data, and managers at the company told employees to "talk to cats and dogs" when instructing them to fabricate the surveys.

FBI Special Agent Jeff Rovelli said 50 percent of information compiled by DataUSA and transmitted to Bush's campaign was falsified, the Connecticut Post reported Thursday.
Andaluciae
07-09-2006, 20:33
Exit polls are notoriously subject to unconscious selection biases. For example, in 2004 the exit polls in Ohio showed Kerry ahead of Bush, yet Bush won Ohio. Why was this? Well, it had a lot to do with who the pollsters were. By and large they were young people, typically college age, and, quite unintentionally, they tended to interview a higher proportion of college-age people than actually exist in the voting population. And knowing how college-age students tend to vote, that slanted the results of the exit polls.
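
Just to make the mechanism concrete, here's a rough sketch with invented numbers (nothing from the actual 2004 data): oversampling college-age respondents pushes the raw exit-poll figure the wrong way, and post-stratification weighting pulls it back toward the true population share.

# Toy demonstration of demographic oversampling in an exit poll and a
# post-stratification correction. All shares and preferences are made up.
import random

random.seed(0)

# Hypothetical electorate: 15% college-age, 85% older.
# College-age voters break 70/30 for Kerry; older voters break 45/55.
TRUE_SHARE = {"college": 0.15, "older": 0.85}
P_KERRY = {"college": 0.70, "older": 0.45}

def draw_voter(group):
    return ("Kerry" if random.random() < P_KERRY[group] else "Bush", group)

# Biased exit poll: young interviewers reach college-age voters at twice
# their actual share of the electorate.
SAMPLE_SHARE = {"college": 0.30, "older": 0.70}
sample = [draw_voter("college" if random.random() < SAMPLE_SHARE["college"] else "older")
          for _ in range(2000)]

raw_kerry = sum(1 for vote, _ in sample if vote == "Kerry") / len(sample)

# Post-stratification: weight each respondent by (true share / sample share) of their group.
weights = {g: TRUE_SHARE[g] / SAMPLE_SHARE[g] for g in TRUE_SHARE}
w_total = sum(weights[g] for _, g in sample)
w_kerry = sum(weights[g] for vote, g in sample if vote == "Kerry") / w_total

print(f"Raw (biased) Kerry share:    {raw_kerry:.3f}")
print(f"Weighted Kerry share:        {w_kerry:.3f}")
print(f"True population Kerry share: {sum(TRUE_SHARE[g] * P_KERRY[g] for g in TRUE_SHARE):.3f}")

With these toy numbers the raw sample shows Kerry ahead (around 52%) even though he trails in the full electorate (around 49%) - exactly the kind of gap that bias produces.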

Or perhaps one might reference the Penn State study that concluded something was fishy about the difference between the exit polls and the actual vote count in Ohio. That charge was debunked by my Philosophy of Statistics professor, George Schumm, who showed that they used far too tight a margin of error in their study.
John Galts Vision
07-09-2006, 21:41
Even aside from outright fabrication, there is sampling error. Whenever you use a sample of data to make inferences about the total population, there will be sampling error - hence the confidence intervals that are often reported as margins of error. That part is purely statistical. Other sampling error is methodological, and Andaluciae's example is a good one: the pollsters failed to get a random sample of voters. To sample accurately, you need to minimize bias from demographic, economic, and geographical factors. This is usually done by random sampling with a large sample size; in practice, those factors then tend to even out, and the sample better represents the larger population.
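
To put a number on the purely statistical part, here's a quick sketch (hypothetical sample size and result, not from any real poll) of the margin-of-error calculation behind those reported confidence intervals:

# Approximate 95% confidence interval for a poll proportion under simple
# random sampling. n and p_hat below are illustration values only.
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the approximate 95% CI for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n = 1000        # respondents (hypothetical)
p_hat = 0.52    # observed share for candidate A (hypothetical)

moe = margin_of_error(p_hat, n)
print(f"Estimate: {p_hat:.1%} +/- {moe:.1%}  "
      f"(95% CI: {p_hat - moe:.1%} to {p_hat + moe:.1%})")
# With n = 1000 this is roughly +/- 3 points - and the formula assumes a truly
# random sample; it says nothing about the methodological bias discussed above.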

If the sample collected wasn't random enough but is still large, you can statistically control for certain variables (like age, race, etc.) and make a good attempt at partialling out their variance from the end result. Multiple regression and hierarchical linear modeling are some tools that can help in this respect; even analysis of variance (ANOVA) can tell you whether you have significant main effects for certain variables. Sadly, most polling is not sophisticated enough to really be accurate. Whether that is due to willful incompetence with an eye toward deceit or just plain ineptitude is debatable.
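
For what it's worth, here's a rough sketch with simulated data (toy variables standing in for something like age and region, not real polling numbers) of what "controlling for" a variable via multiple regression looks like: the adjusted coefficient recovers the true effect once the correlated variable's variance is partialled out.

# Simulated example: a simple regression of y on x1 is confounded by a
# correlated variable x2; including x2 in a multiple regression corrects it.
import numpy as np

rng = np.random.default_rng(42)
n = 5000

x2 = rng.normal(size=n)                       # e.g., age (standardized)
x1 = 0.8 * x2 + rng.normal(size=n)            # e.g., region score, correlated with age
y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # outcome depends on both

# Simple regression: y on x1 only (confounded by x2).
X_simple = np.column_stack([np.ones(n), x1])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: y on x1 and x2 (controls for x2).
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(f"x1 coefficient, ignoring x2:      {b_simple[1]:.2f}  (biased upward)")
print(f"x1 coefficient, controlling x2:   {b_full[1]:.2f}  (close to the true 2.0)")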
John Galts Vision
07-09-2006, 21:54
Or perhaps one might reference the Penn State study that concluded something was fishy about the difference between the exit polls and the actual vote count in Ohio. That charge was debunked by my Philosophy of Statistics professor, George Schumm, who showed that they used far too tight a margin of error in their study.

I don't mean to hijack the thread, but this brings up another thing that really burns me up about some research today - significance levels.

When doing research in the social sciences, differences between groups, correlations, effect sizes, etc. are almost always based on samples - it's just not practical to get data for the entire population. So, to determine whether inferences based on these samples are appropriate, researchers test their hypotheses. If the likelihood that the result is due to random chance (in sampling, for instance) is higher than a certain threshold, then the difference, correlation, or whatever is not considered statistically significant. In other words, you may have found a strong correlation, but you just can't put any faith in it generalizing beyond your sample.

Sounds good. Problem is, unscrupulous researchers move the goal posts by making their significance requirements less stringent when they just miss them. For most social sciences, a p-value (basically, the probability of being wrong when generalizing to the population) of 0.05 (the effect would be due to chance about 5% of the time at this level) is the minimum standard.
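
To show how much the choice of threshold matters, here's a small sketch with completely made-up counts from a hypothetical exposure study (not the actual second-hand smoke data): the very same difference in proportions fails the conventional 0.05 bar but clears a quietly lowered 0.10 bar.

# Two-sided z-test for a difference in proportions, evaluated against two
# different significance thresholds. Counts are invented for illustration.
import math

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions; returns (z, p)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical study: 62/400 cases in the exposed group vs. 44/400 unexposed.
z, p = two_prop_z_test(62, 400, 44, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
print("Significant at alpha = 0.05?", p < 0.05)   # the conventional bar
print("Significant at alpha = 0.10?", p < 0.10)   # the 'moved' goal post

Here p comes out around 0.06 - not significant by the usual standard, but suddenly publishable if the researcher relaxes the requirement after the fact.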

Let's take something that everyone has had rammed down their throats. Remember that initial study a decade or two ago that said second-hand smoke is dangerous? Well, guess what: their sample consisted largely of non-smokers who lived with smokers (not people who just got exposed to it in a restaurant periodically) and were breathing it in constantly. They didn't try to statistically account for the exposure level in their final publicized results and press releases. Despite that, they didn't achieve statistical significance at first for some key findings, and had to set the bar lower! Really shoddy research methods.

My point here is not to debate the risks or non-risks of second hand smoke. I think we can all agree that it's not likely to be GOOD for you. My point is that you should take every piece of research out there with a grain of salt, no matter where it comes from.

"Figures don't lie, but liars do figure."