We wish to test the hypotheses of getting the job done. Does Bob the builder get the job done 2 times per day or not?
\[H_0: \mu = 2\] \[H_1: \mu \neq 2\]
mu_0 <- 2
We have a random sample of independent observations.
bob_data <- c(4.75, 4.4, 3.8, 5.2, 4.2, 4.7, 5.12, 4.9, 6, 2, 2.3, 1.5, 2.2, 3.8, 3.7, 6.5, 6.2)
The histogram doesn’t look too bad? We have enough data (n=17).
library(ggplot2)
ggplot(data.frame(bob_data=bob_data), aes(x=bob_data)) +
geom_histogram(binwidth=1.1, col="white", lwd=0.5) +
theme_minimal()
There is one outlier.
ggplot(data.frame(bob_data=bob_data), aes(y=bob_data)) +
geom_boxplot() +
theme_minimal()
A t-test is robust, so with caution from above, we’ll proceed.
Meaning: Use R
like it is a simple calculator.
# * THIS IS BY HAND
n <- length(bob_data)
x_bar <- mean(bob_data)
sample_sd <- sd(bob_data)
c(n=n, x_bar=x_bar, sample_sd=sample_sd)
## n x_bar sample_sd
## 17.000000 4.192353 1.495157
We are using \(t= \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\). Check out what \(t\) equals.
# * THIS IS CONTINUING THE BY HAND CALCULATION
t <- (x_bar-mu_0) / (sample_sd/sqrt(n))
t
## [1] 6.045722
By definition, we have degrees of freedom as 1 minus the number of observations.
# * THIS IS CONTINUING THE BY HAND CALCULATION
df <- n-1
df
## [1] 16
Let’s see where \(t\) lands on our distribution. I am plotting a t-distribution with \(df=n-1=16\).
# * THIS IS THE T-DISTRIBUTION WE ARE COMPARING AGAINST
x <- seq(-6.5, 6.5, length=100)
hx <- dt(x, df=n-1)
t_dist <- data.frame(cbind(x,hx))
ggplot(t_dist, aes(x=x, y=hx)) +
geom_line() +
geom_vline(xintercept=t, col="cornflowerblue", lwd=2, alpha=0.7) +
xlab("All the possible t values we can calculate") +
ylab("Probability density") +
theme_minimal()
Can you guess what our p-value will be? (Big? Small?) We’re going to take the area of being above the blue line on the above distribution as our p-value. (The probability of rejecting \(H_0\) given thatn \(H_0\) is actually the truth.)
# * THIS IS CONTINUING THE BY HAND CALCULATION
p_val <- 2*(1 - pt(q=t, df=df))
p_val
## [1] 1.699117e-05
Look at slides to see interpretation of p-value!
Also, question: Would the corresponding confidence interval include or not include \(\mu_0=2\)?
Meaning: Use more than just simple R
functions.
# * THIS IS "USING R"
test <- t.test(x=bob_data, alternative="two.sided", mu=2)
test$p.value
## [1] 1.699117e-05