In this post, we’ll define the mean, median, and mode. Along the way, we’ll work through an example of how to find each of them. We’ll also talk about why the median responds less to outliers than the mean does, a fact that can sometimes be crucial in solving a GRE problem.

First, some definitions. The mean (or average) is found by adding up all of the values of some variable and then dividing by the number of observations. So to go back to our lemonade example:

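Date      Lemonades Sold
7/2/19    1
7/3/19    2
7/4/19    5
7/5/19    2
7/6/19    3

Adding up the five days of sales and dividing by the number of days gives the mean:

    \[\frac{1 + 2 + 5 + 2 + 3}{5} = \frac{13}{5} = 2.6\]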
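To check this arithmetic by computer, here is a minimal Python sketch of the definition: add up the values, then divide by how many there are. The names `lemonades_sold` and `mean` are illustrative labels and do not come from any particular library.

```python
# Daily lemonade sales from the table above (7/2/19 through 7/6/19).
lemonades_sold = [1, 2, 5, 2, 3]


def mean(values):
    """Add up all of the values, then divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(mean(lemonades_sold))  # 2.6
```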

Now, finding the median is a little trickier. Imagine lining up all of the values of your variable from least to greatest, and then picking out the one in the middle. For example, if we have the data:

    \[1, 4, 9, 5, 3\]

Then, we order it from least to greatest to get:

    \[1, 3, 4, 5, 9\]

And then we pick the middle value, namely 4.
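Here is the same two-step process (sort, then take the middle) as a short Python sketch. It assumes an odd number of values, so there is exactly one middle value. The variable names are illustrative.

```python
data = [1, 4, 9, 5, 3]

# Step 1: line the values up from least to greatest.
ordered = sorted(data)  # [1, 3, 4, 5, 9]

# Step 2: with an odd number of values, the middle one sits at index len // 2
# (Python counts from 0, so index 2 is the 3rd of the 5 values).
middle = ordered[len(ordered) // 2]
print(middle)  # 4
```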

When there is a long string of values, it can be hard to see which value is the middle one. For example, take the data:

    \[1, 9, 2, 3, 3, 3, 4, 6, 7, 8, 1, 2, 3, 1, 1\]

We order it from least to greatest:

    \[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7, 8, 9\]

Then we cross off the numbers at the two ends, one pair at a time (the smallest and the largest value that remain), until only one value is left:

1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7, 8   (crossed off 1 and 9)

1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7   (crossed off 1 and 8)

...

3   (only the middle value remains)

And thus the median is 3.
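The Python sketch below follows the crossing-off procedure literally. It repeatedly removes the smallest and largest remaining values until one is left. The function name `median_by_crossing_off` is hypothetical. The sketch only handles an odd number of values, because that is the only case where a single value survives.

```python
def median_by_crossing_off(values):
    """Median of an odd number of values, found by crossing off pairs from the ends."""
    if len(values) % 2 == 0:
        # With an even count (including zero), no single value is left in the middle.
        raise ValueError("this sketch only handles an odd number of values")
    remaining = sorted(values)
    # Cross off the smallest and the largest remaining value, one pair at a time,
    # until exactly one value is left.
    while len(remaining) > 1:
        remaining = remaining[1:-1]
    return remaining[0]


data = [1, 9, 2, 3, 3, 3, 4, 6, 7, 8, 1, 2, 3, 1, 1]
print(median_by_crossing_off(data))  # 3
```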

Now, sometimes we will have an even number of observations. So suppose we have the data:

    \[1, 2, 3, 4\]

Crossing off the numbers on the ends (1 and 4), we are left with:

2, 3

But now, it seems, we are stuck. The median is supposed to be a single number, not two numbers! And yet if we cross off one more pair, we will eliminate all of our numbers.

In these cases, we say that the median is the average of the middle two numbers. Thus, the median in the above case is \frac{2 + 3}{2} = \frac{5}{2} = 2.5.
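Combining the odd and even cases, a median function in Python might look like the sketch below. The name `median` is our own choice. Python's standard library also provides `statistics.median`, which uses the same convention of averaging the two middle values when the count is even.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4]))     # 2.5
print(median([1, 4, 9, 5, 3]))  # 4
```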

So in summary, the median is either the middle number or, if there are two middle numbers, the average of those two middle numbers. Now let's find the median for our lemonade example:

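Date      Lemonades Sold
7/2/19    1
7/3/19    2
7/4/19    5
7/5/19    2
7/6/19    3

Ordering the sales from least to greatest gives 1, 2, 2, 3, 5. There are five values, so the median is the middle (third) one, namely 2.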

Finally, the mode is the most straightforward of the three: it is the value that occurs most often. So, if you have the data:

    \[1, 2, 2, 3, 4, 5\]

Then the mode is 2.

Now, a set of numbers can have more than one mode, as in this example:

    \[1, 1, 1, 2, 3, 4, 4, 4\]

Here, since both 1 and 4 occur three times, they are both modes.
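To find every mode, this Python sketch counts each value with `collections.Counter` and keeps the values whose count ties for the highest. The function name `modes` is illustrative. Python 3.8 and later also include `statistics.multimode`, which returns the modes in the order they first appear.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, sorted (a data set can have several)."""
    counts = Counter(values)  # value -> number of times it occurs
    if not counts:
        raise ValueError("an empty data set has no mode")
    top = max(counts.values())  # the highest count
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3, 4, 5]))        # [2]
print(modes([1, 1, 1, 2, 3, 4, 4, 4]))  # [1, 4]
```

Note that if every value appears exactly once, this sketch returns all of them, since they all tie for most frequent.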

Often, to find the mode it can help to put the numbers in ascending order (so you can see how often certain values are repeated). To return to our lemonade example:

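Date      Lemonades Sold
7/2/19    1
7/3/19    2
7/4/19    5
7/5/19    2
7/6/19    3

In ascending order, the sales are 1, 2, 2, 3, 5. The value 2 occurs twice and every other value occurs only once, so the mode is 2.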

One last property of the median is that it is "more resilient to outliers" than the mean. What does this mean? First, let's be clear on what an outlier is: an outlier is a data value that is very far from most of the other observations. For example, suppose you have data on how many days it rained in each of eight months:

    \[12, 13, 14, 14, 15, 16, 17, 19\]

But one month, you have constant downpours and so you get 31 as a new observation. Your new data is:

    \[12, 13, 14, 14, 15, 16, 17, 19, 31\]

The 31 is quite far from all of the other observations. If we were to graph this data on a boxplot, we would see that 31 sticks out like a sore thumb. That's how you know it is an outlier. (There are more precise, formal definitions of an outlier, e.g. a value more than 1.5 times the interquartile range below the 1st quartile or above the 3rd quartile, but you won't need to know those kinds of definitions for the GRE.)
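You won't need it for the GRE, but here is a rough Python sketch of the 1.5 × IQR rule applied to the rainfall data. It assumes Python 3.8 or later, where `statistics.quantiles` is available. There are several conventions for computing quartiles, so other software may report slightly different quartile values.

```python
import statistics

rainy_days = [12, 13, 14, 14, 15, 16, 17, 19, 31]

# statistics.quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3],
# using its default "exclusive" method for computing quartiles.
q1, _, q3 = statistics.quantiles(rainy_days, n=4)
iqr = q3 - q1

# 1.5 x IQR rule: flag anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in rainy_days if x < low_fence or x > high_fence]
print(q1, q3, outliers)  # 13.5 18.0 [31]
```

Here the upper fence works out to 24.75. Only 31 lies beyond it.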

Now, when we say that the median is "more resilient" to outliers, we mean that adding an outlier to our data affects the median less than it affects the mean: the change in the median will be smaller than the change in the mean. Let's confirm that this is the case in our example above. To do so, we will compute the old mean and median, then the new mean and median, and compare them.

The old mean is: \frac{12 + 13 + 14 + 14 + 15 + 16 + 17 + 19}{8} = 15.
The old median is: \frac{14 + 15}{2} = 14.5 since both 14 and 15 are in the middle of 12, 13, 14, 14, 15, 16, 17, 19.

The new mean is: \frac{12 + 13 + 14 + 14 + 15 + 16 + 17 + 19 + 31}{9} = \frac{151}{9} \approx 16.78.
The new median is: 15, since 15 is the middle (fifth) value of 12, 13, 14, 14, 15, 16, 17, 19, 31.

So the new mean is about 1.78 higher than the old mean, whereas the median increased by only 0.5. The median changed less than the mean did, as expected.
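To double-check these numbers, here is a short Python sketch that uses the standard library's `statistics.mean` and `statistics.median` on the old and new rainfall data. The variable names `old` and `new` are labels for the two versions of the data set.

```python
import statistics

old = [12, 13, 14, 14, 15, 16, 17, 19]
new = old + [31]  # the month with constant downpours

old_mean, new_mean = statistics.mean(old), statistics.mean(new)
old_median, new_median = statistics.median(old), statistics.median(new)

print(f"mean:   {old_mean:.2f} -> {new_mean:.2f}  (up {new_mean - old_mean:.2f})")
print(f"median: {old_median:.2f} -> {new_median:.2f}  (up {new_median - old_median:.2f})")
# mean:   15.00 -> 16.78  (up 1.78)
# median: 14.50 -> 15.00  (up 0.50)
```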

Now, you might wonder why the median is more resilient than the mean to outliers. You won't need to know this for the GRE, but it might help you remember which one is more resilient. The reason is that when you add a new value to the data set, the median doesn't care how large or small the new number is. As long as the new number lands above the middle of the data, the median moves over by half a position to the right in the ordered list (in our example, from halfway between 14 and 15 up to 15). If it lands below the middle, the median moves half a position to the left. The mean, by contrast, depends on the actual size of the new number. An outlier is, by definition, really large or really small compared to the other numbers, so the mean responds a lot to it, whereas the median just shifts over by half a position. That's why outliers affect the mean more than they do the median.
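To see this argument in action, the Python sketch below adds one new month to the original eight months of rainfall data and makes that new value more and more extreme. The particular values 20, 31, 100, and 1000 are arbitrary choices for illustration.

```python
import statistics

old = [12, 13, 14, 14, 15, 16, 17, 19]

# Each new value lands above the middle of the data; only its size changes.
results = []
for new_value in [20, 31, 100, 1000]:
    data = old + [new_value]
    new_mean = statistics.mean(data)
    new_median = statistics.median(data)
    results.append((new_value, new_mean, new_median))
    print(f"new value {new_value:>4}: mean {new_mean:7.2f}, median {new_median}")
```

Every row reports a median of 15. The mean keeps climbing, from about 15.56 with a new value of 20 to about 124.44 with a new value of 1000.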
