[latexpage] Talk of "data" is ubiquitous -- what does that word mean in the context of the GRE? We will think about data as observations of given variables. Now what does that mean?
Well suppose we are trying to help our child increase her earnings from a lemonade stand. Some days, she sells a lot and some days she sells almost nothing. To figure out why, we might start keeping track of how much she sells on a given day. So we would start making a table that looks like:
$$\begin{center}
\begin{tabular}{ |c|c|c| }
\hline
Date & Lemonades Sold \\
\hline 7/2/19 & 1 \\
\hline 7/3/19 & 2 \\
\hline 7/4/19 & 5 \\
\hline 7/5/19 & 2 \\
\hline 7/6/19 & 3 \\
\hline
\end{tabular}
\end{center}
$$
Now, each of the rows in the table is an observation. And we have our variables at the top of the columns: the date and the number of lemonades sold. And more generally: variables are just the characteristics that we keep track of, while an observation is some particular occasion when we record the values of our variables.
Now, a very simple way to keep track of data is via a frequency distribution. This is a table that records, on the left-hand side, possible values of a given variable, and on the right-hand side, it records how often those values appeared. Applied to the above data, we would get:
$$\begin{center}
\begin{tabular}{ |c|c| }
\hline
Lemonades Sold & Number of Days\\
\hline 1 & 1 \\
\hline 2 & 2 \\
\hline 3 & 1 \\
\hline 4 & 0 \\
\hline 5 & 1 \\
\hline
\end{tabular}
\end{center}
$$
This table answers questions like "How often did my child sell three lemonades?" To find the answer, we go to the "Lemonades Sold" column and look for the row with three lemonades sold. In that row, the right hand column (corresponding to the number of days) says one. So there was one day where the child sold three lemonades.
And, in addition to a frequency distribution, we can also create a relative frequency distribution which records, on the left-hand side, possible values of a given variable, and on the right hand-side, the percentage of all observations where that value occurred. So, in the above example, we would get:
$$\begin{center}
\begin{tabular}{ |c|c| }
\hline
Lemonades Sold & Number of Days\\
\hline 1 & 20\% \\
\hline 2 & 40\% \\
\hline 3 & 20\% \\
\hline 4 & 0\% \\
\hline 5 & 20\% \\
\hline
\end{tabular}
\end{center}
$$
since the total number of days is $1 + 2 + 1 + 0 + 1 = 5$ and $\frac{1}{5} =$ 20% and $\frac{2}{5} =$ 40% and so on through the table.
Now, to really get a handle on what is driving her lemonade sales, we should probably add some more variables (e.g. daily temperature, day of week) and collect some more observations:
$$\begin{center}
\begin{tabular}{ |c|c|c|c| }
\hline
Date & Lemonades Sold & Day of Week & Temperature (Fahrenheit)\\
\hline 7/2/19 & 1 & Tuesday & 68 \\
\hline 7/3/19 & 2 & Wednesday & 73 \\
\hline 7/4/19 & 5 & Thursday & 75 \\
\hline 7/5/19 & 2 & Friday & 70 \\
\hline 7/6/19 & 3 & Saturday & 71\\
\hline 7/7/19 & 2 & Sunday & 71 \\
\hline 7/8/19 & 5 & Monday & 78\\
\hline 7/9/19 & 4 & Tuesday & 75 \\
\hline 7/10/19 & 3 & Wednesday & 72\\
\hline 7/11/19 & 3 & Thursday & 73\\
\hline
\end{tabular}
\end{center}
$$
Here are some practice problems on the above concepts:
Practice Problems:
- Using the above data, construct a frequency table that tells you how often a certain number of lemonades was sold:
- Using the above data, how many lemonades were sold on Tuesdays?
- Using the above data, how many days had more than 3 lemonades sales?
- On days where the temperature was at least 73 degrees, how many lemonades did she sell on average?
- On days where the temperature was less than 73 degrees, how many lemonades did she sell on average?