Talk of "data" is ubiquitous, but what does that word mean in the context of the GRE? We will think about data as observations of given variables. Now what does that mean?
Well, suppose we are trying to help our child increase her earnings from a lemonade stand. Some days she sells a lot, and some days she sells almost nothing. To figure out why, we might start keeping track of how much she sells on a given day. So we would start making a table that looks like:
Each row in the table is an observation, and the variables appear at the top of the columns: the date and the number of lemonades sold. More generally, variables are the characteristics that we keep track of, while an observation is a particular occasion on which we record the values of our variables.
Now, a very simple way to keep track of data is via a frequency distribution. This is a table that records, on the left-hand side, the possible values of a given variable and, on the right-hand side, how often each of those values appeared. Applied to the above data, we would get:
This table answers questions like "How often did my child sell three lemonades?" To find the answer, we go to the "Lemonades Sold" column and look for the row with three lemonades sold. In that row, the right-hand column (the number of days) says one. So there was one day on which the child sold three lemonades.
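The counting behind a frequency distribution can be sketched in a few lines of Python. This is only an illustration: the sales figures in `lemonades_sold` are made up for the example and are not the table from this lesson, and the function name `frequency_distribution` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical daily sales (lemonades sold per day).
# These numbers are invented for illustration, not taken from the lesson's table.
lemonades_sold = [3, 1, 2, 2, 5]


def frequency_distribution(values):
    """Map each observed value to the number of observations with that value."""
    counts = Counter(values)
    # Sort by value so the result reads like the left-hand column of the table.
    return dict(sorted(counts.items()))


freq = frequency_distribution(lemonades_sold)
for value, days in freq.items():
    print(f"{value} lemonades sold: {days} day(s)")
```

Looking up `freq[3]` plays the same role as finding the row for three lemonades and reading across to the number of days.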
In addition to a frequency distribution, we can also create a relative frequency distribution, which records, on the left-hand side, the possible values of a given variable and, on the right-hand side, the percentage of all observations in which each value occurred. So, in the above example, we would get:
Each percentage is the number of days on which that value occurred divided by the total number of days, which gives entries such as 20% and 40%, and so on through the table.
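As a rough sketch of that calculation, the Python below divides each count by the total number of observations and scales to a percentage. The sales figures are again invented for illustration (they are not the lesson's table), and `relative_frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical daily sales, invented for illustration only.
lemonades_sold = [3, 1, 2, 2, 5]


def relative_frequency_distribution(values):
    """Map each observed value to the percentage of observations with that value."""
    total = len(values)
    counts = Counter(values)
    # Percentage = (count for this value / total observations) * 100.
    return {value: 100 * count / total for value, count in sorted(counts.items())}


rel = relative_frequency_distribution(lemonades_sold)
for value, pct in rel.items():
    print(f"{value} lemonades sold: {pct:.0f}% of days")
```

A quick sanity check on any relative frequency table is that the percentages add up to 100%.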
Now, to really get a handle on what is driving her lemonade sales, we should probably add some more variables (e.g. daily temperature, day of week) and collect some more observations:
Here are some practice problems on the above concepts:
- Using the above data, construct a frequency table that tells you how often each number of lemonades was sold:
- Using the above data, how many lemonades were sold on Tuesdays?
There are two Tuesdays on record: 7/2 and 7/9. In total, over those two days, 5 lemonades were sold.
- Using the above data, on how many days were more than 3 lemonades sold?
We see that there were three such days: 7/4, 7/8, and 7/9.
- On days where the temperature was at least 73 degrees, how many lemonades did she sell on average?
There were 5 days when the temperature was at least 73 degrees: 7/3, 7/4, 7/8, 7/9, and 7/11. Adding up the lemonade sales over those five days, we get 19 lemonades sold. Dividing by 5, we get 3.8 lemonades sold on average.
- On days where the temperature was less than 73 degrees, how many lemonades did she sell on average?
There were 5 days when the temperature was less than 73 degrees: 7/2, 7/5, 7/6, 7/7, and 7/10. Adding up the lemonade sales over those five days, we get 11 lemonades sold. Dividing by 5, we get 2.2 lemonades sold on average.
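The practice problems above all follow the same pattern: pick out the observations that satisfy a condition, then count, total, or average the variable of interest. A minimal Python sketch of that pattern follows. The records, the field names (`day`, `temp_f`, `sold`), and the helper names are all assumptions made for illustration; the values are invented and do not reproduce the lesson's table.

```python
# Hypothetical observations: each dict is one row (one day) of a data table.
# Field names and values are invented for illustration only.
records = [
    {"day": "Mon", "temp_f": 70, "sold": 1},
    {"day": "Tue", "temp_f": 75, "sold": 4},
    {"day": "Wed", "temp_f": 68, "sold": 2},
    {"day": "Tue", "temp_f": 80, "sold": 5},
]


def select(rows, condition):
    """Keep only the observations for which condition(row) is true."""
    return [row for row in rows if condition(row)]


def total_sold(rows):
    """Add up lemonades sold across the given observations."""
    return sum(row["sold"] for row in rows)


def average_sold(rows):
    """Average lemonades sold per observation; requires at least one row."""
    return total_sold(rows) / len(rows)


tuesday_total = total_sold(select(records, lambda r: r["day"] == "Tue"))
busy_days = len(select(records, lambda r: r["sold"] > 3))
warm_average = average_sold(select(records, lambda r: r["temp_f"] >= 73))
cool_average = average_sold(select(records, lambda r: r["temp_f"] < 73))

print(tuesday_total, busy_days, warm_average, cool_average)
```

Writing the condition as a separate function keeps the filtering step distinct from the arithmetic, which mirrors how the worked answers first list the qualifying days and only then add and divide.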