Talk of "data" is ubiquitous -- what does that word mean in the context of the GRE? We will think about data as observations of given variables. Now what does that mean?

Well suppose we are trying to help our child increase her earnings from a lemonade stand. Some days, she sells a lot and some days she sells almost nothing. To figure out why, we might start keeping track of how much she sells on a given day. So we would start making a table that looks like:

Date Lemonades Sold
7/2/19 1
7/3/19 2
7/4/19 5
7/5/19 2
7/6/19 3

Now, each of the rows in the table is an observation. And we have our variables at the top of the columns: the date and the number of lemonades sold. And more generally: variables are just the characteristics that we keep track of, while an observation is a set of values for our variables on some particular occasion.

Now, a very simple way to keep track of data is via a frequency distribution. This is a table that records, on the left-hand side, possible values of a given variable, and on the right-hand side, it records how often those values appeared. Applied to the above table, we would get:

Lemonades Sold Number of Days (when that many lemonades were sold)
1 1
2 2
3 1
4 0
5 1

This table answers questions like "How often did my child sell three lemonades?" To find the answer, we go to the "Lemonades Sold" column and look for the row with three lemonades sold. In that row, the right hand column (corresponding to the number of days) says one. So there was one day where the child sold three lemonades.

And, in addition to a frequency distribution, we can also create a relative frequency distribution which records, on the left-hand side, possible values of a given variable, and on the right hand-side, the percentage of all observations where that value occurred. So, in the above example, we would get:

Lemonades Sold Proportion of Days
1 20%
2 40%
3 20%
4 0%
5 20%

since the total number of days is 1 + 2 + 1 + 0 + 1 = 5 and \frac{1}{5} = 20% and \frac{2}{5} = 40% and so on through the table.

Now, to really get a handle on what is driving her lemonade sales, we should probably add some more variables (e.g. daily temperature, day of week) and collect some more observations:

Date Lemonades Sold Day of Week Temperature (Fahrenheit)
7/2/19 1 Tuesday 68
7/3/19 2 Wednesday 73
7/4/19 5 Thursday 75
7/5/19 2 Friday 70
7/6/19 3 Saturday 71
7/7/19 2 Sunday 71
7/8/19 5 Monday 78
7/9/19 4 Tuesday 75
7/10/19 3 Wednesday 72
7/11/19 3 Thursday 73

Here are some practice problems on the above concepts:

Practice Problems:

  1. Using the above data, construct a frequency table that tells you how often a certain number of lemonades was sold:
    Answer

Leave a Reply