How to Lie, Cheat, Manipulate, and Mislead using Statistics and Visualizations

Andrew Ba Tran
September 9, 2015

But also how to do it right

And more importantly, how to tell the difference.

We'd NEVER lie, right?

Statistics

Statistics is the science of making effective use of numerical data.

It deals with all aspects of this, including the

  • collection
  • analysis and
  • interpretation of data

Collection of Data

  • In order to analyze and interpret data, we must first collect it.

  • The data that is collected is known as a sample.

  • The sample is collected from a population.

Ocean temperature sampling

  • We wanted to analyze San Diego ocean temperatures.
  • Our population was the ocean off the coast of San Diego.
  • Our sample was the temperatures recorded by Buoy100 over the last 9 years.

How to Lie, Cheat, Manipulate, and Mislead through poor sampling

If we were to claim that our results were representative of:

  • California coastal waters
  • Southern California coastal waters
  • San Diego coastal waters
  • Or even La Jolla coastal waters

That would be called Biased Sampling and we could use it to lie, cheat, manipulate, or mislead the general public.
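A minimal sketch of why one buoy cannot stand in for a whole coast. Everything here is invented for illustration: the 200 stations, the 12 C to 20 C gradient, the noise level, and the choice of station 190 as the "buoy" are assumptions, not real ocean data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_STATIONS = 200


def station_mean(i):
    """Hypothetical true mean temperature (C) at station i, north to south."""
    return 12.0 + 8.0 * i / (N_STATIONS - 1)


# One noisy reading per station along the whole (made-up) coast.
population = [random.gauss(station_mean(i), 1.0) for i in range(N_STATIONS)]

# Biased sample: 50 repeated readings from a single southern station.
buoy_index = 190
buoy_sample = [random.gauss(station_mean(buoy_index), 1.0) for _ in range(50)]

# Representative sample: 50 readings drawn at random from the whole coast.
random_sample = random.sample(population, 50)

coast_mean = statistics.mean(population)
buoy_mean = statistics.mean(buoy_sample)
random_mean = statistics.mean(random_sample)

print(f"whole coast:   {coast_mean:.1f} C")
print(f"single buoy:   {buoy_mean:.1f} C")
print(f"random sample: {random_mean:.1f} C")
```

Under these assumptions, the single-buoy average lands several degrees away from the coast-wide average, while the random sample stays close to it.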

Or you could just skip the bother of taking the time to sample data at all

Biased Sampling

There are many different types of sampling bias.

Some examples include:

  • Area bias
  • Self-selection bias
  • Leading question bias
  • Social desirability bias

Area bias

  • If we were to claim that our findings were applicable to the entire California coast, or even just the San Diego coast, we would be guilty of perpetrating an area bias.

  • The area of your sample needs to be representative of the study population.

  • When reading news stories or scientific articles, make sure you verify that there is no area bias in the study.

Area bias example

  • The World Wildlife Fund (WWF) has written on the threats posed to polar bears from global warming.

  • However, also according to the WWF, about 20 distinct polar bear populations exist, accounting for approximately 22,000 polar bears worldwide.

    • Only 2 of the groups are decreasing.
    • 10 populations are stable.
    • 2 populations are increasing.
    • The status of the remaining 6 populations is unknown.
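The counts above are easier to weigh as shares of the whole. A small tally, using only the four figures quoted in the bullets (the dictionary name is just a label for this sketch):

```python
# Population counts by status, as quoted in the bullets above.
status_counts = {"decreasing": 2, "stable": 10, "increasing": 2, "unknown": 6}

total = sum(status_counts.values())
shares = {status: 100 * count / total for status, count in status_counts.items()}

for status, count in status_counts.items():
    print(f"{status:>10}: {count:2d} of {total} ({shares[status]:.0f}%)")
```

A headline built only on the "decreasing" row would be describing a tenth of the known populations.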

Polar bears

You need to look at the whole picture to get the whole story

Self-selection bias

In self-selection bias, a participant's decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample.

For example, if you were to set up a booth to ask people about their grooming habits, the people who respond are more likely to be those who take more time to primp in the morning than those who just throw on something and head out the door.
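The booth example can be simulated in a few lines. This is a sketch under made-up assumptions: the grooming times (about 20 minutes on average) and the response model, where the chance of stopping at the booth grows with grooming time, are both hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: minutes spent grooming each morning.
population = [max(0.0, random.gauss(20, 10)) for _ in range(10_000)]


def stops_at_booth(minutes):
    """Assumed response model: more grooming time, more likely to respond."""
    return random.random() < min(0.9, minutes / 60)


# Only the people who choose to stop end up in the survey.
respondents = [m for m in population if stops_at_booth(m)]

true_mean = statistics.mean(population)
booth_mean = statistics.mean(respondents)

print(f"everyone:        {true_mean:.1f} minutes")
print(f"booth responses: {booth_mean:.1f} minutes")
```

Nobody lied to the surveyor, yet the booth average overstates the population average because of who chose to answer.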

Leading question bias

If you have a survey that asks:

Don’t you think that CCSU part-time professors are paid too little?

A. Yes, they should earn more

B. No, they should not earn more

C. No opinion

The wording of the question tells the respondent what you believe the answer should be, and that will bias your results. (Is that always bad?)

Social desirability bias

If you ask people in a survey about how often they shower, or how often they recycle, your data is going to be biased by the fact that nobody wants to admit to doing something that is considered socially undesirable.

Analyzing data

Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.

How to Lie, Cheat, Manipulate, and Mislead with poor Analysis

  • “Growth in a Time of Debt,” a paper by economists Carmen Reinhart and Kenneth Rogoff, was released in 2010.
  • It purported to show that countries with high levels of debt (specifically, those with debt-to-G.D.P. ratios of more than ninety per cent) grow much more slowly.
  • At the time, the big debate in the United States and Europe was whether expansionary policies, which involved taking on high levels of borrowing to finance additional government spending and tax cuts, should be continued or wound down in an effort to balance the budget.

  • Some students at the University of Massachusetts, Amherst, sought to replicate the study.
  • After taking account of programming errors and data omissions, they came up with a figure of positive 2.2 per cent for average growth in countries with a debt-to-G.D.P. ratio of ninety per cent or more, compared with the negative 0.1 per cent reported in the original paper.
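A data omission can flip the sign of an average. The sketch below uses made-up growth rates for seven hypothetical countries; these are NOT the figures from the paper or from the replication, and the "range that stops short" is only an illustration of the general failure mode.

```python
import statistics

# Made-up growth rates (per cent) for seven hypothetical high-debt countries.
growth = {"A": 2.6, "B": 3.0, "C": 2.4, "D": 1.0, "E": 2.9, "F": -7.6, "G": 2.5}

# Average over every row.
all_rows = statistics.mean(list(growth.values()))

# Simulate a spreadsheet range that accidentally skips the first three rows.
included = ["D", "E", "F", "G"]
partial = statistics.mean([growth[c] for c in included])

print(f"all seven countries: {all_rows:+.1f}%")
print(f"rows D to G only:    {partial:+.1f}%")
```

With these invented numbers the full average is positive and the truncated one is negative, which is why checking which rows went into an average matters.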

Other ways to Lie, Cheat, Manipulate, and Mislead with Averages

What if you were a real-estate agent trying to convince people to move into a particular neighborhood?

You could, with perfect honesty and “truthfulness,” tell different people that the average income in the neighborhood is:

  • $150,000
  • $35,000
  • $10,000

How?

  • The $150,000 figure is the arithmetic mean of the incomes of all the families in the neighborhood.
  • The $35,000 figure is the median.
  • The $10,000 figure is the mode.

This particular neighborhood is lucky enough to be near a cliff… and the ONE home with an ocean view is a giant mansion on 50 acres that is owned by a Hollywood star. With gates. And spikes. And security to keep out the riffraff: the rest of the neighborhood, made up of poor people and the few middle-class families that live nearby.
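Here is one way the three "averages" could all be true at once. The eleven household incomes are invented to match the story (several poor households, a few middle-class ones, one star); only the three summary figures come from the slides.

```python
import statistics

# Hypothetical household incomes for the neighborhood, in dollars.
incomes = [
    10_000, 10_000, 10_000, 10_000,  # the poorest households
    20_000, 35_000, 40_000, 45_000, 50_000, 60_000,
    1_360_000,                       # the mansion on the cliff
]

mean_income = statistics.mean(incomes)      # pulled up by the one huge value
median_income = statistics.median(incomes)  # the middle household
mode_income = statistics.mode(incomes)      # the most common income

print(f"mean:   ${mean_income:,.0f}")
print(f"median: ${median_income:,.0f}")
print(f"mode:   ${mode_income:,.0f}")
```

Each figure is a legitimate "average"; the honest move is to say which one you are quoting.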

Wait... what?

Visualizations

If your goal is to lie, cheat, manipulate, or mislead, visualizations are your friend…

How to Lie, Cheat, Manipulate, and Mislead with poor Visualizations

Excellent data visualization:

  • is a well-designed presentation of interesting data – a matter of substance, of statistics, and of design.
  • consists of complex ideas communicated with clarity, precision, and efficiency.
  • gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

To lie, cheat, manipulate, or mislead, do NOT follow this advice.

How to Lie, Cheat, Manipulate, and Mislead using Chart Adjustments

This is real data. The top graph shows the cosmic radiation rate in neutrons per hour. The lower graph shows the temperature change since 1975.

How to Lie, Cheat, Manipulate, and Mislead using Bar Charts

Here, the data is the same, but by changing the axis labels, someone was able to suggest that the difference in population was much greater than it really was.
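The effect of moving where the axis starts can be worked out directly. A sketch with two hypothetical town populations (not the values from the slide); `apparent_ratio` is a helper invented for this example.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the value axis starts at baseline."""
    return (b - baseline) / (a - baseline)


# Hypothetical populations for two towns.
town_a, town_b = 10_000, 11_000

honest = apparent_ratio(town_a, town_b)            # axis starts at 0
truncated = apparent_ratio(town_a, town_b, 9_500)  # axis starts at 9,500

print(f"axis from 0:     second bar looks {honest:.1f}x as tall")
print(f"axis from 9,500: second bar looks {truncated:.1f}x as tall")
```

The same 10 per cent difference is drawn as a bar three times as tall once the axis starts just below the smaller value.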

It's even more dramatic to use volume instead of bars...

Once again, both of these charts show the same information if you ONLY look at the HEIGHT of the frogs. Using the volume of an image instead of its height is a great way to lie, cheat, manipulate, or mislead…
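The arithmetic behind the picture trick, as a sketch: it assumes the image is scaled uniformly, so width (and depth, for a 3-D drawing) grow along with height. The function name is made up for this example.

```python
def drawn_size_ratios(height_ratio):
    """How much bigger a uniformly scaled picture looks, by dimension."""
    return {
        "height": height_ratio,
        "area": height_ratio ** 2,    # width grows along with height
        "volume": height_ratio ** 3,  # depth grows too, if drawn in 3-D
    }


ratios = drawn_size_ratios(2)
for dimension, ratio in ratios.items():
    print(f"{dimension:>6}: {ratio}x")
```

A value that merely doubles is drawn with four times the ink, or as an object that appears eight times as large.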

Examples of good visualizations

  • Easy to read, lots of useful information, well labeled.
  • Correct comparison of percentages rather than raw numbers!

Famous and excellent visualization of information

Napoleon's March to Moscow: The War of 1812

Troops & Temperature During Napoleon's 1812 Russian Campaign

So how do you tell the difference between good and bad information?

  • Look at the sources. If none are given, do NOT trust the information.
  • Check to see if there are any obvious sources of bias in the data. Look at how the data was collected and where it was collected from.
  • Look very closely at the axes and the legend.
  • Try to figure out if the source has some ulterior motive to manipulate your opinion.
  • And finally, do NOT believe everything you are shown just because it is “Science” and “Data”.