Module #5 - Random Variables & Probability Distributions


This week's topics covered discrete and continuous random variables and their probability distributions, including discrete, binomial, continuous, and normal distributions.




Question # 1 : Consider a population consisting of the following values, which represent the number of ice cream purchases during the academic year for each of five housemates: 8, 14, 16, 10, 11


  1. Compute the mean of this population.

  > x <- c(8, 14, 16, 10, 11)
  > mean(x)
  [1] 11.8
          
  2. Select a random sample of size 2 from the five members. See the example used in my PowerPoint presentation, slide #13:

Possible samples of size 2
Sample    Sum / 2    Mean
8, 14     22 / 2     11
8, 16     24 / 2     12
8, 10     18 / 2     9
8, 11     19 / 2     9.5
14, 16    30 / 2     15
14, 10    24 / 2     12
14, 11    25 / 2     12.5
16, 10    26 / 2     13
16, 11    27 / 2     13.5
10, 11    21 / 2     10.5

  > y <- sample(x,2)
  > y
  [1] 16  8
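The table of possible samples can also be generated in R instead of by hand. This is a minimal sketch using base R's combn(), which lists every unordered sample of size 2 drawn without replacement; it also shows that the average of all ten sample means equals the population mean.

```r
x <- c(8, 14, 16, 10, 11)

# combn() lists every unordered sample of size 2 drawn without replacement;
# each column of the result is one sample
samples <- combn(x, 2)
sample_means <- colMeans(samples)

data.frame(first = samples[1, ], second = samples[2, ], mean = sample_means)

# the average of all possible sample means equals the population mean
mean(sample_means)   # 11.8
mean(x)              # 11.8
```

The columns come out in the same order as the hand-written table above, so the two can be checked line by line.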


  3. Compute the mean and standard deviation of your sample.

  > y   # the same sample drawn in step 2
  [1] 16  8
  > mean(y)
  [1] 12
  > sd(y)
  [1] 5.656854


  4. Compare the mean and standard deviation of your sample to those of the entire population (8, 14, 16, 10, 11).

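  Sample:      x̄ = 12      s = 5.656854
  Population:  μ = 11.8    σ = 2.856571

  The sample mean (12) is close to the population mean (11.8), only 0.2 higher. The sample standard deviation (5.66) is roughly twice the population standard deviation (2.86), because this sample happens to contain the smallest and largest values in the population, 8 and 16. Note that σ divides the sum of squared deviations by N = 5; R's sd(x) divides by N − 1 = 4 and returns 3.193744, which is the sample standard deviation of the five values rather than σ.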
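The difference between the population and sample standard deviation formulas can be checked directly in R. In this sketch, σ is computed by dividing by N, while sd() divides by N − 1. The vector y is hard-coded to the sample 16, 8 from the transcript above (an assumption, since a fresh call to sample() would draw a different pair).

```r
x <- c(8, 14, 16, 10, 11)   # the population
y <- c(16, 8)               # assumed: the sample shown in the transcript

N <- length(x)
mu <- mean(x)

# population standard deviation: divide the sum of squared deviations by N
sigma <- sqrt(sum((x - mu)^2) / N)

# sd() divides by N - 1, i.e. it is the sample standard deviation formula
s_x <- sd(x)

sigma   # 2.856571
s_x     # 3.193744
sd(y)   # 5.656854, i.e. sqrt(sum((y - mean(y))^2) / (length(y) - 1))
```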




Question # 2 : Suppose that the sample size n = 100 and the population proportion p = 0.95.

  1. Does the sample proportion p̂ have approximately a normal distribution? Explain.

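The sample proportion p̂ is approximately normal when the sample is large enough that both the expected number of successes, np, and the expected number of failures, n(1 − p), are sufficiently large. A common rule of thumb requires both to be at least 10 (some textbooks use 5).
  • np = 100 × 0.95 = 95
  • n(1 − p) = 100 × 0.05 = 5
Although np = 95 is large, n(1 − p) = 5 falls short of 10, so under that rule the sampling distribution of p̂ is not approximately normal (under the looser cutoff of 5 it is only a borderline case). Because p is so close to 1, the values of p̂ pile up just below 1 and the distribution has a longer tail to the left. The fact that p̂ takes discrete values is not the obstacle: a discrete distribution can still be closely approximated by a normal curve once n is large enough.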
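One way to check this empirically is to simulate many sample proportions and look at their distribution. This minimal sketch assumes an arbitrary seed and 10,000 replications (both illustrative choices); each rbinom() draw counts the successes in one sample of size 100, and dividing by n turns it into one value of p̂. A negative skewness indicates the longer left tail.

```r
set.seed(42)       # arbitrary seed so the run is reproducible
n <- 100           # sample size
p <- 0.95          # population proportion
reps <- 10000      # number of simulated samples (an arbitrary choice)

# each rbinom() draw is the number of successes in one sample of size n;
# dividing by n turns it into one simulated sample proportion p-hat
p_hat <- rbinom(reps, size = n, prob = p) / n

# centre and spread: compare with p and sqrt(p * (1 - p) / n)
c(mean = mean(p_hat), sd = sd(p_hat), theory_sd = sqrt(p * (1 - p) / n))

# sample skewness: a negative value indicates a longer left tail
skew <- mean((p_hat - mean(p_hat))^3) / sd(p_hat)^3
# theoretical skewness of a binomial count: (1 - 2p) / sqrt(np(1 - p))
theory_skew <- (1 - 2 * p) / sqrt(n * p * (1 - p))
c(simulated = skew, theoretical = theory_skew)

hist(p_hat, main = "Simulated sampling distribution of p-hat", xlab = "p-hat")
```

Rerunning the sketch with a larger n (for example 200 or 1000) should show the histogram becoming more symmetric and the skewness moving toward 0.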


  2. What is the smallest value of n for which the sampling distribution of p̂ is approximately normal?

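The condition that fails is n(1 − p) = 0.05n. Requiring 0.05n ≥ 10 gives n ≥ 10 / 0.05 = 200, so the smallest sample size for which the sampling distribution of p̂ is approximately normal is n = 200 (then np = 190 and n(1 − p) = 10). Under the looser rule of 5, the bound would be n ≥ 5 / 0.05 = 100. In other words, increasing n does make the distribution of p̂ approach a normal shape, even though p̂ is discrete.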
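The rule-of-thumb calculation can be expressed as a small R search. The helper names normal_ok() and smallest_n() are hypothetical, introduced only for this sketch, and the threshold k (10 or 5) is the convention discussed above rather than a fixed law. A small tolerance is included because 1 - 0.95 is not stored exactly in floating point.

```r
# Rule-of-thumb check (hypothetical helper): p-hat is treated as approximately
# normal when n * p >= k and n * (1 - p) >= k; k = 10 or 5 is a convention
normal_ok <- function(n, p, k = 10) {
  tol <- 1e-9  # guards against floating-point error in n * (1 - p)
  n * p >= k - tol & n * (1 - p) >= k - tol
}

# smallest n (searched up to n_max) that satisfies the rule (hypothetical helper)
smallest_n <- function(p, k = 10, n_max = 100000) {
  n <- seq_len(n_max)
  min(n[normal_ok(n, p, k)])
}

normal_ok(100, 0.95)          # FALSE: n * (1 - p) is only 5
smallest_n(0.95, k = 10)      # 200
smallest_n(0.95, k = 5)       # 100
```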