Talk of "data" is ubiquitous -- what does that word mean in the context of the GRE? We will think about **data** as observations of given variables. Now what does that mean?

Well suppose we are trying to help our child increase her earnings from a lemonade stand. Some days, she sells a lot and some days she sells almost nothing. To figure out why, we might start keeping track of how much she sells on a given day. So we would start making a table that looks like:

Date | Lemonades Sold |

7/2/19 | 1 |

7/3/19 | 2 |

7/4/19 | 5 |

7/5/19 | 2 |

7/6/19 | 3 |

Now, each of the rows in the table is an observation. And we have our variables at the top of the columns: the date and the number of lemonades sold. And more generally: **variables** are just the characteristics that we keep track of, while an **observation** is a set of values for our variables on some particular occasion.

Now, a very simple way to keep track of data is via a **frequency distribution**. This is a table that records, on the left-hand side, possible values of a given variable, and on the right-hand side, it records how often those values appeared. Applied to the above table, we would get:

Lemonades Sold | Number of Days (when that many lemonades were sold) |

1 | 1 |

2 | 2 |

3 | 1 |

4 | 0 |

5 | 1 |

This table answers questions like "How often did my child sell three lemonades?" To find the answer, we go to the "Lemonades Sold" column and look for the row with three lemonades sold. In that row, the right hand column (corresponding to the number of days) says one. So there was one day where the child sold three lemonades.

And, in addition to a frequency distribution, we can also create a **relative frequency distribution** which records, on the left-hand side, possible values of a given variable, and on the right hand-side, the percentage of all observations where that value occurred. So, in the above example, we would get:

Lemonades Sold | Proportion of Days |

1 | 20% |

2 | 40% |

3 | 20% |

4 | 0% |

5 | 20% |

since the total number of days is and 20% and 40% and so on through the table.

Now, to really get a handle on what is driving her lemonade sales, we should probably add some more variables (e.g. daily temperature, day of week) and collect some more observations:

Date | Lemonades Sold | Day of Week | Temperature (Fahrenheit) |

7/2/19 | 1 | Tuesday | 68 |

7/3/19 | 2 | Wednesday | 73 |

7/4/19 | 5 | Thursday | 75 |

7/5/19 | 2 | Friday | 70 |

7/6/19 | 3 | Saturday | 71 |

7/7/19 | 2 | Sunday | 71 |

7/8/19 | 5 | Monday | 78 |

7/9/19 | 4 | Tuesday | 75 |

7/10/19 | 3 | Wednesday | 72 |

7/11/19 | 3 | Thursday | 73 |

Here are some practice problems on the above concepts:

**Practice Problems:**

- Using the above data, construct a frequency table that tells you how often a certain number of lemonades was sold:

## Leave a Reply

You must be logged in to post a comment. You can get a free account here.