STAT 5301: Midterm 1 (40 points)
Friday, June 5, 2026

Instructions:
This is a closed-book exam; only TWO double-sided note sheets in A4 or US letter size are allowed.
You are allowed to use a calculator and a tablet to prepare your solutions, or a cellphone to compile images of your physical solution sheets into a PDF.
Show all work and justify your answers sufficiently in order to receive full credit.
You must remain in view of Honorlock during the exam.
You may not close the exam at any point.
The only website you are allowed to access during the exam is https://login.microsoftonline.com/ to access your OSU OneDrive. However, it is recommended to upload a single PDF of solutions directly to your OSU OneDrive from your tablet or cellphone for easy access through Carmen.
Your device (tablet or cellphone) should remain in full/partial view when in use for the duration of the exam.
Reach out via email IMMEDIATELY should any issues arise during the exam.
Submit ONE PDF with ALL solutions.

The 4 questions in this exam are below. Suggested time is about 20-25 minutes per question.

Question 1. (10 points)
For 32 used cars listed for sale, data is available on the number of inquiries from buyers (Inquiries), as well as on the mileage of the car (Mileage, in thousands of miles). Note the following summary statistics of these two variables:

Inquiries: mean 11.2, standard deviation 3.1
Mileage: mean 60, standard deviation 18

For planning purposes, researchers fit a model predicting the number of inquiries from mileage. Below is the data displayed together with the regression line.

A. (4 points) Describe the relationship between mileage and the number of buyer inquiries. Be sure to discuss the nature of the relationship, the direction and strength of the association, and whether there are any unusual observations.

B. (2 points) The slope of the regression line is -0.045. What is the interpretation of this value? Does it make sense to interpret it in this problem?

C. (2 points) The intercept of the line is 13.9. What is the interpretation of this value? Does it make sense to interpret it in this problem?

D. (2 points) The R² is 0.04. Calculate and interpret the correlation between mileage and the number of inquiries.

Question 2. (10 points)
Researchers surveyed one hundred students at a large midwestern university about their exercise and sleep habits. The numbers of students who exercise at least 20 minutes per day (below, “Exercise?”) and who sleep at least 7 hours per night (below, “Enough Sleep?”) are cross-classified below:

                     Exercise?
                     Yes    No
Enough Sleep?  Yes    30    30
               No     16    24

A. (3 points) If we draw a student at random from these one hundred students, what is the conditional probability that the student exercises, given that he or she does not get enough sleep?

B. (2 points) If we draw a student at random from these one hundred students, are the events “the student does not get enough sleep” and “the student exercises” independent? Justify your response.

C. (3 points) The students in the survey were a simple random sample from the entire population of 20,000 students at the university. Explain what a simple random sample is. Propose an approach that the researchers could use to create this simple random sample. What difficulties might they face in implementing your approach?

D. (2 points) Given that the students in the survey were a simple random sample, can you infer a causal relationship between sleeping and exercise? If so, describe the nature of the relationship. If not, explain why not.

Question 3. (10 points)
A group of education researchers is studying the final exam scores (out of 100) of students in two different instructional formats: In-Person Classes and Online Classes. The data for each format is visualized using both boxplots and histograms. The boxplot for in-person classes shows a symmetric distribution with the median near the center and relatively short whiskers, suggesting consistent performance.
The boxplot for online classes shows a right-skewed distribution with a longer upper whisker and the median closer to the lower quartile. The histogram for in-person classes appears approximately bell-shaped, while the histogram for online classes shows a concentration of students with lower scores and a tail extending toward higher values. The researchers collected exam scores from 30 students in each format. Summary statistics are as follows:

Format      Mean Score   Median Score   Standard Deviation   IQR   Skewness
In-Person   78           77             6                    10    symmetric
Online      72           70             12                   10    right skewed

A. (3 points) (1) Provide a quick sketch of the boxplots and histograms based on the information provided. (2) Compare the centers, variability, and skewness of exam scores between in-person and online classes. (3) What do these visualizations suggest about student performance and consistency in each format?

B. (2 points) Based on the summary statistics, discuss the differences in centers and spread between the two instructional formats.

C. (1 point) Given the skewness in the online class distribution, would you recommend using the mean or median to compare the two formats? Justify your answer.

D. (2 points) A student suggests using a pie chart to display the distribution of exam scores across all 60 students. How could the student construct a pie chart from this data?

E. (2 points) Continuing part D above: critically evaluate this suggestion. That is, explain why a pie chart is not the most suitable display for this type of data. What potential misunderstandings might arise?

Question 4. (10 points)
Suppose we want to investigate the effect of fertilizer type on the yield of tomato plants. We have 180 individual tomato plants that are to be randomly assigned to equal-sized groups with fertilizer treatments labeled as follows: F1, F2, F3, F4, and F5. We are only interested in determining which fertilizer results in the highest average yield.
All other environmental factors such as sunlight, water, and soil conditions are controlled and kept constant for all plants. Answer the following questions:

A. (3 points) How many groups should we set up in this experiment? Then describe the steps to carry out a completely randomized design.

B. (4 points) Now suppose that 2/3 of the plants are grown in a greenhouse and 1/3 are grown outdoors. These environments may influence plant growth. Describe the steps to carry out a blocked experimental design that ensures a balanced distribution of plant environments across all treatment groups. Be sure to state how many plants from each environment are in each group.

C. (3 points) Explain why blocking by environment (greenhouse vs. outdoor) is important in this experiment. What problem does blocking help to address?
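
The two randomization schemes that Question 4 asks about can be illustrated in code. The following is a minimal Python sketch and is not part of the exam. It assumes the plants are numbered 1 to 180 and, purely for illustration, that plants 1 to 120 are the greenhouse plants and 121 to 180 the outdoor plants. The function names, the plant numbering, and the seed are all hypothetical choices.

```python
import random

TREATMENTS = ["F1", "F2", "F3", "F4", "F5"]


def randomize(units, treatments, rng):
    """Shuffle the units and deal them into equal-sized treatment groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split into equal-sized groups")
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}


def randomize_within_blocks(blocks, treatments, rng):
    """Randomize separately inside each block, then pool by treatment."""
    groups = {t: [] for t in treatments}
    for block_units in blocks.values():
        for t, members in randomize(block_units, treatments, rng).items():
            groups[t].extend(members)
    return groups


rng = random.Random(5301)  # arbitrary seed, only so the sketch is reproducible
plants = list(range(1, 181))  # hypothetical plant labels 1..180

# Completely randomized design: ignore environment entirely.
crd = randomize(plants, TREATMENTS, rng)

# Blocked design: assumed split of 120 greenhouse and 60 outdoor plants.
blocks = {"greenhouse": plants[:120], "outdoor": plants[120:]}
rbd = randomize_within_blocks(blocks, TREATMENTS, rng)

for t in TREATMENTS:
    n_greenhouse = sum(1 for p in rbd[t] if p <= 120)
    n_outdoor = len(rbd[t]) - n_greenhouse
    print(t, len(crd[t]), n_greenhouse, n_outdoor)
```

Under these assumptions every treatment group has the same size in both designs. The blocked version additionally fixes the greenhouse and outdoor counts in every group, whereas in the completely randomized version those counts are left to chance.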