The Brief: A Blog about the LSAT, Law School and Beyond
A box plot is so named for its iconic shape:
(They can also be laid out horizontally). The parts of the graph correspond to:
Looking at a box plot can give you a quick sense of what the distribution of the data looks like. For example:
Example 1
Find the 25th percentile, 50th percentile, 75th percentile, range, and median for the below boxplot:
Example 2
The following chart represents the number of books read over the summer by the 500 fifth graders in a school. Suppose no student read exactly 4 books. How many students read more than 4 books?
Example 3
The following chart represents the number of books read over the summer by the 500 fifth graders in a school. What is the approximate number of students who read between 3 and 4 books?
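The counting behind questions like Examples 2 and 3 can be sketched in Python. This is only a rough illustration under the usual reading of a box plot, in which the minimum, first quartile, median, third quartile, and maximum cut the data into four segments holding about 25% of the observations each. The helper name `approx_count` is made up for this sketch, and the numbers below are not read off the charts in the examples.

```python
def approx_count(total, lower_percentile, upper_percentile):
    """Approximate number of observations between two percentiles."""
    share = (upper_percentile - lower_percentile) / 100
    return total * share

# With 500 students, the stretch from the first quartile (25th percentile)
# to the median (50th percentile) holds about a quarter of them.
between_q1_and_median = approx_count(500, 25, 50)

# Everything above the median is about half of the students.
above_median = approx_count(500, 50, 100)

print(between_q1_and_median, above_median)  # 125.0 250.0
```

So once you know which parts of the box the two values in the question line up with, the count is just that share of the total.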
Practice Problems

The following chart represents the number of books read over the summer by the 100 fifth graders in a school. What is the approximate number of students who read more than 5 books? (Assume that each student read a positive integer number of books.)

The following chart represents the number of books read over the summer by the 500 fifth graders in a school. Suppose 7 is the 80th percentile of this data. Approximately how many students read between 6 and 7 books? (Assume that each student read a positive integer number of books.)
 Researchers recorded, for each of 36 months, the number of days on which it rained. About how many months had between 13 and 18 days of rain?
A histogram looks a lot like a bar graph and, indeed, you can think of it as a special kind of bar graph. But instead of having just any kind of category (as a bar graph does), histograms have, as their categories, certain ranges of values. So, for example, suppose the students in your class earn the following scores on their test:
Now, those numbers are kind of unwieldy. So to get a sense of how many students are acing your class (getting an A) or failing your class (getting an F), you might categorize their test scores according to certain ranges. So put all the 90-100 scores together in one category, all the 80-89 scores in another category, and so on. Graphing this, we would get the following:
This graph tells us that 4 students scored between 90 and 100; 3 students between 80 and 89; and so on.
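The grouping step can be sketched in Python. The scores below are hypothetical (the class's actual list is not reproduced here); they were chosen only so that four land in the 90-100 range and three in the 80-89 range, matching the counts just described. The function name `score_range` is likewise made up for this sketch.

```python
from collections import Counter

# Hypothetical test scores, for illustration only.
scores = [95, 91, 98, 100, 84, 88, 80, 77, 72, 65, 58]

def score_range(score):
    """Label the range a score falls in: 90-100, 80-89, 70-79, ..."""
    if score >= 90:
        return "90-100"
    low = (score // 10) * 10
    return f"{low}-{low + 9}"

# Count how many scores fall in each range; these counts are the bar heights.
histogram = Counter(score_range(s) for s in scores)

print(histogram["90-100"], histogram["80-89"])  # 4 3
```

Notice that once the scores are grouped, the individual values are gone: the histogram only remembers how many scores fell in each range.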
Example 1
In the following histogram, approximately how many students scored a 170 or higher?
Example 2
In the below histogram, what if anything can we conclude about the median of the data?
Example 3
In the following chart, what can we conclude about the range of the data?
Practice Problems
 In the following histogram, approximately how many students scored a 170 or lower?

In the below histogram, what if anything can we conclude about the range of the data?
 In the below histogram, what is the minimum possible number of students who scored higher than 75? What about the maximum?
A bar graph tells you how many objects in a given category you have. For example, in the following bar graph:
Each bar tells you how many children fall into that height range. So, for example, there are 2 children between 0 and 4 feet; there are 7 between 4 feet 1 inch and 4 feet 6 inches, and so on.
Example 1
How many children are between 4 feet 7 inches and 6 feet? How many children are taller than 6 feet?
We can also have segmented bar graphs like the following:
These bar graphs allow us to compare different groups, in this case the group of 1-year-olds against the groups of 2-year-olds, 3-year-olds, and so on. We can thus see how their preferences shift over time.
To read the graph, the horizontal labels tell us which categories we have (here, the different ages 1 to 5), and the vertical component tells us how many members of that category (e.g. how many 1-year-olds) fall under each label in the key to the right. Thus, the leftmost blue bar tells us how many 1-year-olds have "Mighty Man" as their favorite superhero. The rightmost yellow bar tells us how many 5-year-olds have "Valiant Vanessa" as their favorite hero, and so on.
Example 2
Which superhero gains the most fans as the children mature from 1 year old to 5 years old?
Example 3
Which superhero has the most supporters overall (i.e. across all the age groups)?
Example 4
Which of the following tables goes with the given bar graph:
A.
B.
C.
D.
Practice Problems
 Which of the following tables is compatible with the given graph?
A.
B.
C.
D.
 A new kid, Richard, joins the class. Richard is 4 feet 8 inches tall. Given that the graph below accurately depicts the heights of Richard's classmates, what is the maximum possible number of classmates who are taller than Richard? What is the minimum possible number?
 In the below chart, what percentage of three-year-olds take Tornado Terry to be their favorite hero?
 In the below chart, which superhero loses the most fans as children age from 1 to 5?
 In the below chart, what if anything can you conclude about the median height of the class?
In this post, we’ll define the mean, median, and mode. Along the way, we’ll work through an example of how to find each of them. We’ll also talk about why the median responds less to outliers than the mean does, a fact which can sometimes be crucial in solving a GRE problem.
Finding the Mean
The mean (or average) of a variable is found by adding up all of the values of that variable and then dividing by the number of observations. So to go back to our lemonade example:
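As a rough sketch of that calculation in Python, here is the mean of ten days of lemonade sales. The figures are borrowed from the ten-day lemonade table used elsewhere in this series, which is an assumption; the example this post refers to may use different numbers.

```python
# Daily lemonade sales from the ten-day table used elsewhere in this series
# (an assumption: the example referred to here may use other values).
lemonades_sold = [1, 2, 5, 2, 3, 2, 5, 4, 3, 3]

def mean(values):
    """Add up all the values, then divide by the number of observations."""
    return sum(values) / len(values)

print(mean(lemonades_sold))  # 30 lemonades over 10 days -> 3.0
```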
Finding the Median
Now, finding the median is a little trickier. Imagine lining up all of the values for your variable from least to greatest. Then, the median is the one in the middle. So for example, if we have the data:
We order it from least to greatest to get:
And then we pick the middle value, namely .
Sometimes, if there is a long string of values, it will be hard to see which value is the middle one. So take the data:
We order it from least to greatest:
And then we simply cross off the numbers at the end, both the leftmost and rightmost observation, to get:
1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7, 8 (the leftmost 1 and the rightmost 9 are crossed off)
And we repeat:
1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7 (another 1 and the 8 are crossed off)
Until we get:
3 (every other number is crossed off)
And thus the median is 3.
Sometimes we will have an even number of observations. So suppose we have the data:
Crossing through the numbers on the end, we get:
1, 2, 3, 4 (with the 1 and the 4 crossed off, leaving 2 and 3)
But now, it seems, we are stuck. For the median is a single number, not two numbers! And yet if we cross out any more, we will eliminate all of our numbers.
In these cases, we say that the median is the average of the middle two numbers. Thus, the median in the above case is (2 + 3)/2 = 2.5.
So in summary, the median is either the middle number or, if you have two middle numbers, the average of those two middle numbers. Now, we try to find the median for our lemonade example:
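The crossing-off procedure translates almost directly into Python. The following is a minimal sketch, not the only way to compute a median; it is applied to the fifteen-number list and the four-number list from earlier in this post, and it assumes the list is not empty.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    remaining = sorted(values)
    # Cross off the smallest and largest values until one or two are left.
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    return sum(remaining) / len(remaining)

print(median([1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 6, 7, 8, 9]))  # 3.0
print(median([1, 2, 3, 4]))                                   # 2.5
```

With an odd number of values, exactly one number survives the crossing off; with an even number, two survive and we average them.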
Finding the Mode
Finally, the mode is the value that occurs most often. So, if you have the data:
Then the mode is .
Now, a set of numbers can have more than one mode, as in this example:
Here, since both and occur three times (and no other number occurs three or more times), they are both modes.
Often, to find the mode it can help to put the numbers in ascending order (so you can see how often certain values are repeated). To return to our lemonade example:
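Counting how often each value occurs is easy to sketch in Python. The two data sets below are hypothetical and chosen only for illustration: the first has a single mode, and the second has two. The function name `modes` is made up for this sketch.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical data sets, chosen only for illustration.
print(modes([1, 2, 2, 3, 2, 5]))        # [2]
print(modes([4, 7, 4, 9, 7, 4, 7, 1]))  # [4, 7]
```

In the second list, 4 and 7 each occur three times and nothing occurs more often, so both are modes.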
Median is More Resilient to Outliers
Finally, one property of the median is that it is "more resilient to outliers." What does this mean? First, let's be clear on what an outlier is: an outlier is a data value that is very far from many of the other observations. So suppose you have some data on how many days in a month it rains:
But one month, you have constant downpours and so you get as a new observation. Your new data is:
But the is quite far from all of the other observations. If we were to graph this data on a boxplot:
We would see that sticks out like a sore thumb. That's how you know it is an outlier. (There are more precise definitions of an outlier, but you won't need them for the GRE).
Now, when we say that the median is "more resilient" to outliers than the mean, we are saying that if we add an outlier to our data, the median is affected less than the mean. In other words, the change to the median will be less than the change to the mean.
Let's confirm that this is the case in our above example. To do so, we will compute the old mean/median, the new mean/median and compare them.
The old mean is:
The old median is 14.5, since both 14 and 15 are in the middle of the ordered data and their average is (14 + 15)/2 = 14.5.
The new mean is: .
The new median is: .
So we can see that the new mean is higher than the old mean, whereas the median only increased by . Thus, the median changed less than the mean did, exactly as predicted.
Why is the Median More Resilient?
Now, you might wonder why the median is more resilient than the mean to outliers. You won't need to know this for the GRE, but it might help in remembering which one is more resilient.
Here's why. When you add a new value to the data set, the median doesn't really care how large/small the new number is. All that matters is whether it is larger than the previous median or smaller. If it is larger than the previous median, then the new median moves a number to the right. If it is smaller, the new median moves a number to the left. But the mean does care about how large/small the new number is. So if the new number is an outlier, then it is, by definition, really large or really small compared to the other numbers. So the mean responds a lot to this new number whereas the median just moves a number to the right/left. That's why outliers affect the mean more than they do the median.
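To see the effect with actual numbers, here is a small Python sketch. The rainy-day counts are hypothetical (they are not the figures from the example above); they were picked so that 14 and 15 sit in the middle, and the outlier of 30 stands in for a month of constant downpours.

```python
from statistics import mean, median

# Hypothetical rainy-day counts, for illustration only.
rainy_days = [12, 13, 14, 15, 16, 17]
with_outlier = rainy_days + [30]  # one month of constant downpours

# How far does each statistic move when the outlier is added?
mean_change = mean(with_outlier) - mean(rainy_days)
median_change = median(with_outlier) - median(rainy_days)

print(mean_change, median_change)
```

Here the median only shifts from 14.5 to 15, a change of 0.5, while the mean rises by more than 2. Making the outlier even larger (say 300 instead of 30) would push the mean up further but leave the new median at 15.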
Practice Problems
 Suppose we have the following heights of a kindergarten class (in inches): 29, 31, 33, 33, 37, 28, 29, 30. Find the mean, median, and mode of this data.
 Suppose the ruler we used to measure everyone's height was mislabeled: in fact, everyone is five inches taller than the ruler suggests. Take the data in question 1, adjust for this fact, and find the new mean, median, and mode.
 A new kid who is remarkably tall (60 inches) joins the class. Which statistic is more affected by this change, the median or the mean?
In this post, we'll talk about ranges, quartiles, and percentiles. These are all ways of getting a sense of what the overall distribution looks like and what the possible outcomes look like.
Let's start with the range. The range is the difference between the largest and the smallest value in your data. So suppose you have the following data:
Then, the range is . Thus, the range tells you the interval over which your data is distributed.
Quartiles are more complicated. As the name suggests, quartiles are a way of dividing up the data into four parts. So, if we have the following data:
The quartiles are the points that divide up the data in four segments, each with the same number of observations. For the above data, we would get:
You can see how in each section, we have exactly three points which is one fourth of our total data (made up of 12 points).
More formally, the second quartile is the median. Then, the first quartile is the median of all the values that are less than the median (or second quartile). The third quartile is the median of all the values that are greater than the median.
Now how can we find the quartiles?
Well, we already know how to find the second quartile since that's just the median. After finding the median, separate the data into two halves: the lower half is made up of all the values smaller than our median, whereas the higher half is all the values greater than our median.
Then, find the median of both halves. The median of the lower half will be the first quartile, and the median of the upper half will be the third quartile.
This makes some intuitive sense since 25% is exactly at the middle between 0 and 50%, which is the data that makes up our lower half, whereas 75% is the middle of 50% and 100%, which is the data that makes up our upper half. So if we were looking for a process to split the data into four parts, each containing 25% of our observations, this method (to find the quartiles) would be an excellent candidate for that task.
Finally, the interquartile range is just the value of the third quartile minus the value of the first quartile.
Example 1
Find the first, second, and third quartiles of: 21, 13, 37, 45, 5, 1, 9, 17, 33, 41, 25, 29. Then find the interquartile range.
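One way to check your work on Example 1 is the following Python sketch of the median-of-halves method described above. The function name `quartiles` and the choice to split the ordered data by position (leaving out the middle value when the number of observations is odd) are assumptions of this sketch; other conventions for quartiles exist and can give slightly different values. It also assumes there are at least two observations.

```python
from statistics import median

def quartiles(values):
    """First, second, and third quartiles via the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle position
    upper = ordered[-half:]   # values above the middle position
    return median(lower), median(ordered), median(upper)

data = [21, 13, 37, 45, 5, 1, 9, 17, 33, 41, 25, 29]
q1, q2, q3 = quartiles(data)

print(q1, q2, q3)             # 11.0 23.0 35.0
print(q3 - q1)                # interquartile range: 24.0
print(max(data) - min(data))  # range: 44
```

Note that the interquartile range (24) comes out smaller than the range (44), as it must: the quartiles always lie between the smallest and largest values.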
Practice Problems
 Find the first, second, and third quartiles of: 30, 22, 90, 43, 28, 65, 2, 12. Then, find the interquartile range. Also, find the range.
 Find the first, second, and third quartiles for: 111, 210, 291, 240, 287, 534, 323, 222, 401. Then, find the interquartile range. Also, find the range.
 For the following numbers, is the interquartile range larger than the range? 22, 12, 32, 53, 46, 23, 21, 19, 35, 42, 31, 21, 24
In the next few posts, we will introduce different kinds of graphs. These graphs allow you to visually represent data in ways that emphasize different aspects of the data. In preparation, let's look at some data represented with different graphs:
Tables of Lemonades Sold
$$\begin{center}
\begin{tabular}{ cc }
\hline
Date & Lemonades Sold\\
\hline 7/2/19 & 1 \\
\hline 7/3/19 & 2 \\
\hline 7/4/19 & 5 \\
\hline 7/5/19 & 2 \\
\hline 7/6/19 & 3\\
\hline 7/7/19 & 2\\
\hline 7/8/19 & 5\\
\hline 7/9/19 & 4 \\
\hline 7/10/19 & 3\\
\hline 7/11/19 & 3\\
\hline
\end{tabular}
\end{center}
$$
Here are some of the different kinds of graphs you may encounter:
Bar graph:
Circle graph:
Box plot:
Scatterplot:
In subsequent posts, we will talk more about how to read each of these graphs.
Talk of "data" is ubiquitous, but what does that word mean in the context of the GRE? We will think about data as observations of given variables. Now what does that mean?
Well, suppose we are trying to help our child increase her earnings from a lemonade stand. Some days, she sells a lot and some days she sells almost nothing. To figure out why, we might start keeping track of how much she sells on a given day. So we would start making a table that looks like:
Now, each of the rows in the table is an observation. And we have our variables at the top of the columns: the date and the number of lemonades sold. And more generally: variables are just the characteristics that we keep track of, while an observation is some particular occasion when we record the values of our variables.
Now, a very simple way to keep track of data is via a frequency distribution. This is a table that records, on the left-hand side, the possible values of a given variable and, on the right-hand side, how often those values appeared. Applied to the above data, we would get:
This table answers questions like "How often did my child sell three lemonades?" To find the answer, we go to the "Lemonades Sold" column and look for the row with three lemonades sold. In that row, the right-hand column (corresponding to the number of days) says one. So there was one day where the child sold three lemonades.
And, in addition to a frequency distribution, we can also create a relative frequency distribution, which records, on the left-hand side, the possible values of a given variable and, on the right-hand side, the percentage of all observations where that value occurred. So, in the above example, we would get:
since the total number of days is and 20% and 40% and so on through the table.
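Both kinds of table can be sketched in a few lines of Python. The five-day sales record below is hypothetical (the figures are illustrative, not the ones from the table above); it was chosen so that three lemonades were sold on exactly one day and so that the percentages come out to 20% and 40%.

```python
from collections import Counter

# Hypothetical five-day record of lemonades sold (illustrative values only).
lemonades_sold = [2, 3, 2, 5, 1]

frequency = Counter(lemonades_sold)  # value -> number of days
total_days = len(lemonades_sold)
relative_frequency = {
    value: 100 * count / total_days  # share of all days, in percent
    for value, count in sorted(frequency.items())
}

print(frequency[3])        # days on which three lemonades were sold: 1
print(relative_frequency)  # {1: 20.0, 2: 40.0, 3: 20.0, 5: 20.0}
```

The relative frequencies are just the counts divided by the total number of days, so they always add up to 100%.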
Now, to really get a handle on what is driving her lemonade sales, we should probably add some more variables (e.g. daily temperature, day of week) and collect some more observations:
Here are some practice problems on the above concepts:
Practice Problems:
 Using the above data, construct a frequency table that tells you how often a certain number of lemonades was sold:
 Using the above data, how many lemonades were sold on Tuesdays?
 Using the above data, on how many days were more than 3 lemonades sold?
 On days where the temperature was at least 73 degrees, how many lemonades did she sell on average?
 On days where the temperature was less than 73 degrees, how many lemonades did she sell on average?
In the next few posts, we’ll unpack some of the statistical terms that the GRE throws around. These terms can be seen as ways to answer two central questions:
 What’s Typical?
 What’s Possible?
In response to the question of "What's Typical?" we can find the median, mean, or mode of the data. These give you a sense of what the average value is, or what the most common value is; in short, they describe what a typical value might be.
And in answering "What's Possible?" we can find the range, quartiles, or percentiles of the data. These give you a sense of the possibilities and how the possibilities are distributed.
In later posts, we will give actual definitions of these concepts. But first, some overarching advice about learning these concepts.
There are four main ways you’ll get tested on these concepts.
 Given some data, find the value of one of these concepts (e.g. the mean, median, range, etc.).
 Given some data, alter the data in some way (e.g. by adding 10 to every value), and then find the value of one of these concepts.
 Given some graph/chart, estimate the value of one of these concepts.
 Answer some question about some concept's general properties (e.g. that the interquartile range is always less than or equal to the range).
Thus, you want to understand the formal definition well enough to do questions of the first and second types. But you also want to have some intuitive sense for the concepts, so that you can do questions that fall in the third and fourth categories.
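For questions of the second type, it helps to see what altering the data actually does. The Python sketch below uses hypothetical heights and a made-up helper named `summary`; it adds 10 to every value and compares the mean, median, and range before and after.

```python
from statistics import mean, median

def summary(values):
    """Mean, median, and range of a list of numbers."""
    return mean(values), median(values), max(values) - min(values)

heights = [28, 29, 30, 31, 37]       # hypothetical data
shifted = [h + 10 for h in heights]  # add 10 to every value

print(summary(heights))  # (31, 30, 9)
print(summary(shifted))  # (41, 40, 9)
```

Adding the same number to every value moves the mean and the median up by that number, but leaves the range alone, since the largest and smallest values move together.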
Around 20% of GRE questions involve "data analysis." These questions typically involve looking at data/graphs/tables in order to find certain values, like the median or percent change. Other questions ask about important statistical concepts, like the normal distribution.
All the posts in this series can be found here:
 What is Data? Variables and frequencies
 Statistical Concepts
 Reading Graphs: An Introduction
 Normal distributions
Content for other parts of the math GRE can be found here:
For reasons known only to ETS, number line problems are categorized with geometry. Such problems are basically algebra problems: they present you with some diagram or some facts about certain variables and then ask whether some equations or inequalities could be true, given certain relationships among the variables. Here is a straightforward example:
Example 1
x y
A. x is greater than y
B. y is greater than x
C. The two quantities are equal
D. It cannot be determined which, if any, is greater.
Now, a number line simply gives all the numbers from least to greatest. So numbers farther to the right on the number line are greater than those to their left. And if there are markings on the number line, then you may (provided the question says nothing to the contrary) assume that the markings are evenly spaced. What does this mean? Well, in the following diagram:
you know that the distance between and is only half of the distance between and . Using algebra, we could express this as:
Now, how do we solve number line problems? It will depend on what the question asks for. Some questions ask for which of the following could be true, whereas others ask about which of the following must be true.
If the question asks which equations could be true, then we are looking for either a set of numbers that makes the equation true and is compatible with the diagram/given information, or we want to show that the equation is somehow inconsistent with our diagram/given information.
If the question asks which equations must be true, then we are looking for either a set of numbers that is compatible with the diagram/given information but makes our equation false, or we want to show that the equation somehow follows from the diagram/given information.
Whether you should look for a specific example or some proof of the consistency/inconsistency of the given equation is a judgment call that you need to make in the moment. It's hard to give general principles about when to do which, but after doing some practice problems and just thinking about the equation at hand, you should develop some sense of how to decide if an equation is consistent or not. For example:
Example 2
Let and . Must the following be true?
After doing some practice problems, you get a sense of which numbers to consider when you come to one of these problems. And by running through those numbers quickly, you can often show that some equation either can be satisfied (which answers whether the equation could be true) or can be violated (which answers whether the equation must be true).
The only real piece of advice I have here is twofold: First, always try easy, concrete numbers. Don't just think about "one negative and one positive value." It's a lot more general to think that way, but also a lot harder to evaluate. Try, instead, or and . Nice, easy numbers to calculate. And if you want a small number, try or Second, it is easy to think about what happens if both are normal, natural numbers (e.g. 1 and 2). But also think about what happens if they are small, positive numbers (e.g. and ) or only one is positive (e.g. 1 and -1) or both are negative (e.g. -1 and -2). If the equation works (or is always violated) in all of those cases, then probably it's always true (or always false).
When it comes to proving that some equation is either always true/false given the diagram/information in the question, look for easy inferences. And if you can't see a way to prove it quickly, consider just flagging it and coming back. If you've gone through a few diverse examples and they all come out the same way, probably the equation actually is always true (or always false) and the time spent confirming that might be better spent on other problems.
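The habit of running through a few diverse, concrete numbers can be sketched in Python. The claim tested below ("if x is less than y, then x squared is less than y squared") is hypothetical and made up for this sketch; it is not one of the problems in this post.

```python
# Hypothetical claim to test: "if x < y, then x*x < y*y".
def claim(x, y):
    return x * x < y * y

# Easy, concrete pairs with x < y: natural numbers, small positive
# fractions, one negative and one positive, and both negative.
candidates = [(1, 2), (0.25, 0.5), (-1, 1), (-2, -1)]

counterexamples = [(x, y) for x, y in candidates if not claim(x, y)]
print(counterexamples)  # [(-1, 1), (-2, -1)]
```

Since the claim holds for some pairs and fails for others, it could be true but it need not be true. Keep in mind that coming up empty on counterexamples would not prove a claim; it would only make it more plausible.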
Practice Problems:
1. The markings on the below number line are evenly spaced:
Which of the following must be true (select all that apply):
A.
B.
C.
2. Suppose we know and Which of the following must be true (select all that apply):
A.
B.
C.
3. Suppose we know Which of the following must be true (select all that apply):
A.
B.
C.
D.
E.
4. The markings in the below number line are equally spaced. Which of the following could be true:
A.
B.
C.
5. The markings in the below number line are equally spaced. Which of the following must be true:
A.
B.
C.