Example Lesson Scenario: Statistics

Scenario: Sam is teaching an introductory course on Statistical Inference to 150 students. It is early in the semester, and students have learned basic statistical concepts such as mean, standard deviation, sample size, normal distributions, and skewed distributions.

In today’s class, Sam will focus on the Central Limit Theorem. Students will use an online Central Limit Theorem simulator to make observations and derive their own conclusions about the theorem. Poll Everywhere will be used to help guide students through this investigation and facilitate discussions.

By the end of the 80-minute lesson, Sam would like his students to be able to do the following:

Infer the relationships between the shape of a population distribution and its mean and standard deviation.
Differentiate the characteristics of sampling distributions based on different sample sizes.
Hypothesize the definition of the Central Limit Theorem based on observations made using an online Central Limit Theorem simulator.
Estimate the lowest sufficient sample size for the Central Limit Theorem to be applied under the context of different population distributions.
Recognize that the mean of the sampling distribution approximates the population mean.

This example contains 8 poll questions and uses the following poll types:

5 x Q&A
1 x Open-ended
1 x Multiple Choice
1 x Clickable Image

Sam explains that for the majority of this lesson, everyone will work in groups of 2 or 3. Each group will have the option of adding their own group’s response and/or upvoting another group’s response to the Q&A polls. One student will need to be elected by the group to submit the responses and upvote other group responses. This is to limit the number of duplicate responses and allow for the higher-quality responses to move to the top of the list for the polls.

Investigation of Population Distribution (10 minutes)

For the start of the investigation, Sam asks students to simply play around with the population parameters on the simulation. Students are given 5 minutes to use the simulation to choose different distributions and experiment with different means and standard deviations. In that time, they must also respond, as a group, with an observation in the following poll:

Question Type: Q&A
Cognitive Process: Understand
Question: After playing with the population parameters from the simulation, what is one observation your group has made? If you see another group has already made the same observation, upvote it instead.

Rationale: While the goal of this activity is to have students familiarize themselves with the simulation, students can also determine how the shape of a distribution around its mean is related to its standard deviation. This question asks students to make inferences based on what they observed.

After closing the poll, Sam reviews the responses and finds that the following responses had the most votes:

“Whenever we changed the mean, we saw that the distribution curve had simply moved to where the mean was placed.”
“When the standard deviation increased, it spread the data out over a wider range. The apex of the distribution was also lowered.”

Sam points out a more specific response that received a high number of votes: “For a skewed population distribution, when the standard deviation increased, the end of the short tail would only spread away from the mean a very small amount, while the long tail would spread much more away from the mean.”

Sam asks the class why this was a popular answer. One student notes that it was a less obvious but still interesting observation, since it shows the link between a distribution’s standard deviation and how far it spreads around the mean. Another student says she believes the long tail spread out more because the data on that side of the mean is less concentrated in one spot, so spreading that data farther from the mean increases the standard deviation. Sam steers the discussion toward a consensus that everyone can agree with, then wraps it up to start the next part of the investigation.

Investigation of Sampling Distributions (25 minutes)

For the first part of this investigation, students will explore the shape of the sampling distribution for different sample sizes in relation to the population distribution. For the purpose of this exploration, the parameter for “Number of Samples” in the simulation must be set to 1000. The groups will be given 10 minutes to play with the parameters that aren’t fixed and to come up with observations.

Question Type: Q&A
Cognitive Process: Understand
Question: Start with a sample size of 1. Try out different population means and standard deviations for different population distributions. What are some observations your group has found? If you see another group has already made the same observation, upvote it instead.

Rationale: Similar to the previous poll question, this question asks students to start on a new investigation and asks student groups to make an inference based on their observations.

Sam reviews the responses with the students and notes that the following had the highest number of votes:

“The sampling distribution looks similar to the population distribution.”
“An increase in standard deviation of the population also increases the standard deviation of the sampling distribution.”
“The shifting of the population mean shifts the mean of the sampling distribution in the same way.”

A discussion about the responses is put off for now as Sam would like his students to continue the investigation and build on further observations based on new data found in the next poll. However, he does take a moment to clarify and reinforce to students that the sampling distribution is the distribution of sample means.

This time, students receive 5 minutes to work in their groups and respond to the next poll.

Question Type: Q&A
Cognitive Process: Analyze
Question: Now set the sample size to 5. Try out different population means and standard deviations for different population distributions. What are some differences you can see from when the sample size was set to 1? If you see another group has already made the same conclusion, upvote it instead.

Rationale: While this question is asking students to make inferences off the data, it is also asking students to differentiate them from the inferences made in the previous poll. By asking students to distinguish between the sample sizes of 1 and 5, students begin to identify what information is not relevant in the investigation and which patterns are emerging in the greater context of the investigation.

Sam notes that the following differences had the highest number of votes:

“The distribution of sample means still looks similar to the population distribution. However, for skewed distributions, the data seems to be getting more centered towards the mean.”
“The standard deviation of the sampling distribution seems to have shrunk for the same parameters for when the sample size was 1.”
“The skewed distribution at a standard deviation of 1 looks almost like a normal distribution.”

The last of these top three responses doesn’t specify which distribution the group is referring to. Sam takes a moment to clarify that the skewed distribution is the population distribution, and that the response is saying the distribution of sample means looks almost normal. To head off a possible misconception, Sam reminds students to be clear about which distribution they are referring to in their responses.
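
The shrinking spread that the groups report can be reproduced outside the simulator. The following is a minimal Python sketch, not the simulator’s actual code: it assumes a hypothetical right-skewed population (an exponential distribution with mean 1 and standard deviation 1) and keeps the lesson’s setting of 1000 samples. The names sample_means and skewed_population are illustrative only.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples=1000, seed=0):
    """Return num_samples sample means, each computed from sample_size draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def skewed_population(rng):
    # Assumed population for illustration: exponential, mean 1, SD 1.
    return rng.expovariate(1.0)


means_n1 = sample_means(skewed_population, sample_size=1)
means_n5 = sample_means(skewed_population, sample_size=5)

sd_n1 = statistics.stdev(means_n1)
sd_n5 = statistics.stdev(means_n5)

print(f"n=1: mean={statistics.fmean(means_n1):.2f}, sd={sd_n1:.2f}")
print(f"n=5: mean={statistics.fmean(means_n5):.2f}, sd={sd_n5:.2f}")
```

With a sample size of 1, each sample mean is just a single draw, so the sampling distribution mirrors the population. At a sample size of 5, the standard deviation of the sample means should come out noticeably smaller while the mean stays close to the population mean.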

Having Discussions and Forming Conclusions (25 minutes)

Sam then asks the class if they see a pattern, gathering nods from many of the students who seem to agree. He explains that for the next poll, students will be given 5 minutes to discuss and then predict what the sampling distribution will look like for a sample size of 100. They can continue to use the simulator to test out different sample sizes, except they can’t set the parameter to 100 itself.

Question Type: Q&A
Cognitive Process: Understand
Question: Without entering a sample size of 100 into the simulator, as a group, what do you predict the sampling distribution will look like when the sample size is 100? If you see that another group has already made the same response, upvote it instead.

Rationale: At this point in the investigation, enough data has been gathered and differentiated for sample sizes 1, 5, and possibly a few others to be able to understand how sample size affects the sampling distribution. With that in mind, students should be able to explain what would happen at a sample size of 100 without running that parameter in the simulation. The responses will allow Sam to see how students understand their exploration of the simulation so far.

Sam looks at the responses and notes that the following answers have the highest number of votes:

“The distribution of sample means will look like a normal distribution.”
“The standard deviation of the distribution of sample means will be very small.”
“Sample means from skewed distributions will form a normal distribution.”

On the projector, Sam brings up the simulator, and demonstrates what happens to the sampling distribution at a sample size of 100. He asks students if what they are seeing is consistent with their own predictions. Many students nod their heads in agreement.

Sam finds one of the responses, with several votes from other students, to be interesting and to be a good prompt for a discussion: “All the sample means will be located at the mean of the distribution. This means that the distribution will pretty much look like a vertical line.” While these responses have been set to be anonymous, Sam asks if the group that entered it could share their reasoning for it. One student chimes in and notes how even at a sample size of 25 and a population standard deviation of 1, the distribution of sample means gets very narrow. She continues by explaining that following the trend of the investigation, as the sample size increases, the sampling distribution would get narrower until it approaches a line.

Sam explains that this is an interesting observation and takes the opportunity to pose a theoretical question that encourages students to grapple with the idea further: “If we have an infinitely large sample size, does the distribution become a line?” The class launches into a whole-class discussion for the next several minutes, with Sam facilitating and addressing potential misconceptions. He eventually interjects and explains that, for any finite sample size, the distribution would never be a vertical line because of the randomness of the data.
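
The narrowing that the student describes can be made concrete with the standard formula for the standard deviation of the sample mean. This formula is not part of the lesson as described; it is added here as a supporting sketch, and it assumes independent random draws from a population with standard deviation σ and a sample size of n.

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},
\qquad
\sigma_{\bar{X}} > 0 \ \text{for every finite } n,
\qquad
\lim_{n \to \infty} \frac{\sigma}{\sqrt{n}} = 0
```

For the student’s example of σ = 1 and n = 25, this gives 1/√25 = 0.2, and at n = 100 it gives 0.1. The spread keeps shrinking as n grows, but it is never exactly zero at any sample size the simulator can run.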

Sam now asks students to spend 5 minutes with their groups one last time to hypothesize and enter a definition for the Central Limit Theorem.

Question Type: Q&A
Cognitive Process: Create
Question: Based on the observations and conclusions that we have made in your groups and from our whole group discussions, what does your group hypothesize that the definition of the Central Limit Theorem might be? Upvote any responses you agree with.

Rationale: At this point in the investigation, students have gained a deeper understanding of how the sample size affects the sampling distribution. This question now asks students to take all the information they have gathered and use it to generate a hypothesis for the definition of the Central Limit Theorem. The goal is to have students arrive at the definition on their own based on their investigation.

As Sam closes the poll, he notes the following as the top voted responses:

“The Central Limit Theorem shows that when the sample size increases, the sampling distribution approaches normal regardless of the population distribution.”
“The Central Limit Theorem emphasizes that if the sample size is large enough, skewed population distributions will have resulting sampling distributions that approach normal.”
“The Central Limit Theorem shows us that if you change the mean or standard deviation of the population distribution in the simulation, the mean and standard deviation of the sampling distribution will change in a similar way.”

Sam explains that these responses all have some validity under the Central Limit Theorem and reveals the textbook definition of the Central Limit Theorem as follows:

“The Central Limit Theorem states that if you have a population with mean μ and standard deviation σ, and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed. This will hold true regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large.” (LaMorte, 2016)

Sam recaps his students’ observations and conclusions from the previous activities and connects them back to the definition. Students are then given the opportunity to ask any questions that they may have at this point with regards to the definition.

Application of Theorem and Knowledge Check (15 minutes)

Now that the investigation is complete, Sam wants to find out whether students can independently apply and demonstrate their understanding of the theorem. He tells his students that for the remaining polls in this class session, they will work and respond individually. To start, Sam wants his students to think about what they consider to be a sufficiently large sample size for the Central Limit Theorem to be applied for different population distributions. He gives his students just 2 minutes to finish the following poll.

Question Type: Multiple Choice
Cognitive Process: Apply
Question: Given the following population distribution, what do you believe would be considered a sufficiently large sample size so that the distribution of sample means will be normal?

Rationale: This question narrows in on the “sufficiently large” sample size part of the definition. Students will need to use both their understanding of the definition as well as the data from the simulator to figure out the best number for the given situation.

Most of the students experiment with the simulator before choosing 1 or 5. Sam asks students to remember the results and immediately moves to the next poll, for which he also gives students about 2 minutes:

Rationale: Similar to the previous poll, this question asks students to use their understanding to estimate the number. The two polls show a stark contrast with each other: the first poll starts with a normal population distribution, which means that the sample size can be very small. However, the population distribution in this poll is heavily skewed in one direction, showing clearly how 30 (the rule of thumb) would be more appropriate for the sample size.

In this case, most students choose answers between 25 and 50. Sam then poses this question: “Why do you think that a skewed population distribution requires a larger sample size than the normal population distribution in order to get the sample means to be normally distributed?” He opens this up to a whole-class debate, with students offering responses such as:

“When the population is normally distributed then all the values are already centered around the mean so the sample means will be too. So your sample size can be quite small.”
“When you randomly choose values from a distribution, the more values you choose, the more likely that more values will be chosen near the mean. That is why the skewed population distributions need a larger sample size.”

Sam facilitates this whole-group discussion and helps students arrive at the correct answer. It is noted that if a population is already normally distributed, then the distribution of the sample means will be as well, regardless of the sample size. The rule of thumb is that a sample size of 30 or more is sufficient for most distributions; for the population distributions that the simulation offers, a sample size of 25 seemed sufficient.
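
The contrast between the two polls can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reproduction of the simulator: it uses a standard normal population and a hypothetical right-skewed population (exponential with mean 1), and it measures the shape of the sample means with a simple skewness statistic, where a value near 0 indicates a symmetric shape. The function names are made up for this example.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(draw, sample_size, num_samples=1000, seed=1):
    """Return num_samples sample means, each computed from sample_size draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


# Assumed populations for illustration only.
populations = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "skewed": lambda rng: rng.expovariate(1.0),
}

results = {}
for name, draw in populations.items():
    for n in (1, 5, 30):
        results[(name, n)] = skewness(sample_means(draw, n))
        print(f"{name:>6} population, n={n:>2}: skewness={results[(name, n)]:+.2f}")
```

For the normal population, the skewness of the sample means should stay near 0 at every sample size. For the skewed population, it should start large at a sample size of 1 and fall toward 0 as the sample size grows, which is the pattern behind the rule of thumb.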

To finish, a quick 2-minute knowledge check is given to students with the following poll:

Question Type: Clickable Image
Cognitive Process: Understand
Question: Given a left-skewed population with a mean of -3, standard deviation of 2, and the sample size being 25, where on the number line do you believe that the mean of the sampling distribution would most likely be? Respond by clicking/tapping on the location on the number line.

Rationale: In the lesson, Sam doesn’t explicitly mention that the mean of the sampling distribution will approximate the mean of the population distribution. However, the investigation should have led students to that conclusion and it was mentioned in passing in whole group discussions. In this question, Sam is doing a knowledge check to make sure students have figured this aspect out. It is not in the definition of the theorem, but is an important aspect that students should know.

Sam checks and sees that most of the students placed their answer at around -3. This is the correct answer, and Sam explains to students that the mean of the sampling distribution approximates the population mean, with the simulated value settling closer to it as the sample size increases.
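
The knowledge check can be reproduced with a short simulation. This is a sketch under an assumption the poll does not specify: the left-skewed population is modeled as a mirrored and shifted exponential distribution, chosen only because it has the required mean of -3 and standard deviation of 2. The constant and function names are hypothetical.

```python
import random
import statistics

POP_MEAN, POP_SD, SAMPLE_SIZE, NUM_SAMPLES = -3.0, 2.0, 25, 1000


def draw_left_skewed(rng):
    # Exponential with mean 2 and SD 2, mirrored so the long tail points left,
    # then shifted so the population mean lands on -3.
    return (POP_MEAN + POP_SD) - rng.expovariate(1.0 / POP_SD)


rng = random.Random(42)  # fixed seed so the run is repeatable
means = [
    statistics.fmean([draw_left_skewed(rng) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.2f}")
```

The mean of the sample means should land very close to -3, matching where most students clicked, while the spread of the sample means should be much smaller than the population standard deviation of 2.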

Reflection on Learning (5 minutes)

Sam ends the lesson with a ticket-out-the-door poll asking students to reflect on the lesson.

Question Type: Open-ended
Cognitive Process: Analyze
Question 1: What is your main takeaway from today’s class session?
Question 2: What is one question you have from today’s class session?

Rationale: These two questions require students to reflect on the class lesson to determine what their main takeaway is and what questions they still have. This is an opportunity for students to practice their metacognitive skills to understand what and how well they learned from the class session, while the responses give the instructor insights into what students are thinking.

After class, Sam reviews the open-ended responses to look for what students are taking away from his lesson, and whether these align with his learning objectives for his students. He notes the main points of the lesson that most students did not mention so he can highlight them again later. Sam also scans through the list of student questions and identifies those that multiple students bring up. He compiles these questions in an announcement on CourseWorks (Canvas), includes his own responses and resources (specific pages of textbook, link to article, etc.) for students to follow up on, and reminds students to attend his office hours if they still have questions.

References

LaMorte, W. W. (2016, July 24). Central Limit Theorem. Retrieved from https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Probability/BS704_Probability12.html