Andrew Ba Tran
September 9, 2015
And more importantly, how to tell the difference.
Statistics is the science of making effective use of numerical data.
It deals with all aspects of this, including the
In order to analyze and interpret data, we must first collect it.
The data that is collected is known as a sample.
The sample is collected from a population.
If we were to claim that our results were representative of:
That would be called Biased Sampling and we could use it to lie, cheat, manipulate, or mislead the general public.
There are many different types of sampling bias.
Some examples include:
If we were to claim that our findings were applicable to the entire California coast, or even just the San Diego coast, we would be guilty of perpetrating an area bias.
The area of your sample needs to be representative of the study population.
When reading news stories or scientific articles, make sure you verify that there is no area bias in the study.
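To make the idea of area bias concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the three region names, their made-up average measurements, and the helper name `simulate_area_bias` are hypothetical and do not come from any real study. It compares an estimate drawn from a single region with one drawn at random from the whole study area.

```python
import random
import statistics

def simulate_area_bias(seed=0):
    """Compare a single-region sample with a random sample of the whole area."""
    rng = random.Random(seed)
    # Hypothetical true average measurement per region (made-up values).
    region_means = {"north": 10.0, "central": 20.0, "south": 40.0}
    # Build 1,000 measurement sites per region around each regional average.
    population = {
        region: [rng.gauss(mean, 2.0) for _ in range(1000)]
        for region, mean in region_means.items()
    }
    all_sites = [x for sites in population.values() for x in sites]

    # Biased design: every sample comes from one region only.
    biased_sample = rng.sample(population["south"], 100)
    # Representative design: sample at random from the whole study area.
    random_sample = rng.sample(all_sites, 100)

    return {
        "true_mean": statistics.mean(all_sites),
        "biased_estimate": statistics.mean(biased_sample),
        "random_estimate": statistics.mean(random_sample),
    }

if __name__ == "__main__":
    for name, value in simulate_area_bias().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the single-region estimate lands far from the true average, while the random sample lands close to it.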
The World Wildlife Fund (WWF) has written on the threats posed to polar bears from global warming.
However, according to the same organization, about 20 distinct polar bear populations exist, accounting for approximately 22,000 polar bears worldwide.
You need to look at the whole picture to get the whole story.
In Self-Selection Bias, a participant's decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample.
For example: if you set up a booth to ask people about their grooming habits, the people who respond are more likely to be those who take time to primp in the morning than those who just throw something on and head out the door.
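The booth example can be sketched as a small simulation. The numbers are invented for illustration: the distribution of grooming minutes, the rule that links grooming time to the chance of stopping at the booth, and the function name `simulate_self_selection` are all assumptions, not data from a real survey.

```python
import random
import statistics

def simulate_self_selection(seed=1, n_people=10_000):
    """Return (population mean, respondent mean) of daily grooming minutes."""
    rng = random.Random(seed)
    # Hypothetical minutes spent grooming each morning (made-up distribution).
    minutes = [max(0.0, rng.gauss(20.0, 10.0)) for _ in range(n_people)]

    # Assumption: the chance of stopping at the booth rises with grooming
    # time, capped at 90 percent.
    def stops_at_booth(m):
        return rng.random() < min(0.9, m / 60.0)

    respondents = [m for m in minutes if stops_at_booth(m)]
    return statistics.mean(minutes), statistics.mean(respondents)

if __name__ == "__main__":
    population_mean, respondent_mean = simulate_self_selection()
    print(f"Everyone:         {population_mean:.1f} minutes")
    print(f"Booth volunteers: {respondent_mean:.1f} minutes")
```

Because the people who choose to answer differ from those who walk past, the respondents' average overstates the population's average even though nobody lied.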
If you have a survey that asks:
Don’t you think that CCSU part-time professors are paid too little?
A. Yes, they should earn more
B. No, they should not earn more
C. No opinion
The tone of the question suggests what you believe the answer should be, and that will bias your results. (Is that always bad?)
If you ask people in a survey how often they shower or how often they recycle, your data will be biased, because nobody wants to admit to behavior that is considered socially undesirable.
Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
What if you were a real-estate agent trying to convince people to move into a particular neighborhood?
You could, with perfect honesty and “truthfulness”, tell different people that the average income in the neighborhood is:
This particular neighborhood is lucky enough to be near a cliff… and the ONE home with an ocean view is a giant mansion on 50 acres that is owned by a Hollywood star. With gates. And spikes. And security to keep out the riffraff: the rest of the neighborhood, made up of poor people and the few middle-class families that live nearby.
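One way an "average" can honestly take several values is that mean, median, and mode are all averages. The sketch below uses a hypothetical list of ten household incomes, invented here to match the description of the neighborhood (mostly poor, a few middle class, one star); the figures are not from the source.

```python
import statistics

# Hypothetical household incomes in dollars (made-up figures for illustration).
INCOMES = (
    [10_000] * 5          # poorer households
    + [25_000] * 2        # slightly better off
    + [40_000, 60_000]    # the few middle-class households
    + [2_000_000]         # the one mansion on the cliff
)

def summarize(incomes):
    """Return three different 'averages' of the same income list."""
    return {
        "mean": statistics.mean(incomes),      # pulled up by the mansion
        "median": statistics.median(incomes),  # the middle household
        "mode": statistics.mode(incomes),      # the most common income
    }

if __name__ == "__main__":
    for name, value in summarize(INCOMES).items():
        print(f"{name}: ${value:,.0f}")
```

With these made-up figures the mean is $220,000, the median is $17,500, and the mode is $10,000. All three are "the average income", and each tells a very different story.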
If your goal is to lie, cheat, manipulate, or mislead, visualizations are your friend…
The principles of excellent data visualization are:
To lie, cheat, manipulate, or mislead, do NOT follow this advice.
This is real data. The top graph shows the cosmic radiation rate in neutrons per hour. The lower graph shows the temperature change since 1975.
Here, the data is the same, but by changing the axis labels, someone was able to suggest that the difference in population was much greater than it really was.
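A short sketch of how an axis can exaggerate a difference, assuming the distortion comes from a vertical axis that does not start at zero. The two population values and the function name `apparent_ratio` are hypothetical, chosen only to show the arithmetic.

```python
def apparent_ratio(larger, smaller, axis_start=0.0):
    """Ratio of the drawn bar heights when the axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below both values")
    return (larger - axis_start) / (smaller - axis_start)

if __name__ == "__main__":
    # Hypothetical populations (in thousands), made up for illustration.
    town_a, town_b = 105.0, 100.0
    print("Axis starts at 0: ", apparent_ratio(town_a, town_b, 0.0))
    print("Axis starts at 95:", apparent_ratio(town_a, town_b, 95.0))
```

The data differ by 5 percent either way, but with the axis starting at 95 one bar is drawn twice as tall as the other.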
Once again, both of these charts show the same information if you ONLY look at the HEIGHT of the frogs. The volume of an image is a great way to lie, cheat, manipulate, or mislead…
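The arithmetic behind the frog trick can be sketched in a few lines, assuming the picture is scaled proportionally in every dimension. The function name `perceived_scale` is hypothetical.

```python
def perceived_scale(height_ratio):
    """How much bigger a proportionally scaled picture looks.

    Doubling the height of an image also doubles its width, so the area
    grows with the square of the ratio; if the object is read as
    three-dimensional, the implied volume grows with the cube.
    """
    return {
        "height": height_ratio,
        "area": height_ratio ** 2,
        "volume": height_ratio ** 3,
    }

if __name__ == "__main__":
    # A value that is twice as large, drawn as a frog twice as tall.
    print(perceived_scale(2))
```

A frog drawn twice as tall covers four times the area and suggests eight times the volume, even though the underlying number only doubled.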