This test is used in place of paired t-tests to test the null hypothesis of whether the pairwise differences are 0 or not between two paired groups. Imagine now that we have two tests and we want to see which test was more difficult. That is, we want to test

\(H_0: D_i=0 \,\) for all observations \(i\). (\(D_i\) are pairwise differences.)

The alternative would be that at least one pair is nonzero.

We are working with the following data. (This is the head() of the data.)

trials <- data.frame(cbind(trial_1, trial_2))
head(trials, 10)
##      trial_1   trial_2
## 1  1.0000000 0.7272727
## 2  0.9833333 0.9696970
## 3  0.9500000 0.8787879
## 4  0.9333333 0.9090909
## 5  0.9333333 0.8484848
## 6  0.9000000 0.8484848
## 7  0.9000000 0.9090909
## 8  0.8833333 0.9090909
## 9  0.8500000 0.9090909
## 10 0.8500000 0.7878788

Recall that Wilcoxon Rank Sign tests for whether pairwise differences are 0 or not. Therefore, like how we did in a paired t-test, we will need to take a difference between the two trials.

library(dplyr)
trials <- trials %>% mutate(d=trial_2-trial_1)
head(trials, 10)
##      trial_1   trial_2            d
## 1  1.0000000 0.7272727 -0.272727273
## 2  0.9833333 0.9696970 -0.013636364
## 3  0.9500000 0.8787879 -0.071212121
## 4  0.9333333 0.9090909 -0.024242424
## 5  0.9333333 0.8484848 -0.084848485
## 6  0.9000000 0.8484848 -0.051515152
## 7  0.9000000 0.9090909  0.009090909
## 8  0.8833333 0.9090909  0.025757576
## 9  0.8500000 0.9090909  0.059090909
## 10 0.8500000 0.7878788 -0.062121212

Visuals

Recall that we need not focus on the shape of our underlying distribution since we are using a nonparametric test. But out of interest, let’s look at the histograms and boxplots.

library(ggplot2)
library(reshape2)
ggplot(data=melt(trials) %>% filter(variable!="d"), aes(x=value, fill=variable)) + 
  geom_histogram(col="black", lwd=0.2, binwidth=0.05) +
  ggtitle("Stacked Histogram of Trials 1 and 2")

ggplot(data=melt(trials) %>% filter(variable!="d"), aes(y=value, x=variable, fill=variable)) + 
  geom_boxplot() +
  ggtitle("Boxplot of Trials 1 and 2")

Here is the histogram of the differences. I put a vertical line at \(x=0\) to show what we are testing. Are there significantly more data above or below the line? Is this statistically important?

ggplot(data=trials, aes(x=d)) + 
  geom_histogram(binwidth=0.05, col="black", lwd=0.2) +
  geom_vline(xintercept=0, col="salmon", lwd=2, alpha=0.5) +
  ggtitle("Histogram of differences")

We see that our data for the most part are wound around a center near 0.

1. The Long, Step-By-Step Way

Put differences in order of absolute value

I am using the arrange() function.

trials <- trials %>% arrange(abs(d))
head(trials, 10)
##      trial_1   trial_2            d
## 1  0.7000000 0.6969697 -0.003030303
## 2  0.9000000 0.9090909  0.009090909
## 3  0.9833333 0.9696970 -0.013636364
## 4  0.4000000 0.4242424  0.024242424
## 5  0.9333333 0.9090909 -0.024242424
## 6  0.8833333 0.9090909  0.025757576
## 7  0.7000000 0.7272727  0.027272727
## 8  0.5166667 0.5454545  0.028787879
## 9  0.8500000 0.8181818 -0.031818182
## 10 0.5166667 0.4848485 -0.031818182

Be sure to remove rows that have a difference of 0. The code to do this is:

trials <- trials %>% filter(d!=0)

Since we have no rows that have a difference of 0, the above code does nothing to our dataset.

Assign ranks with the below code

assign_wrs <- function(diffs) {
  ranks <- c()            # * CREATE EMPTY VECTOR TO SAVE RANKS
  this_rank <- 1          # * START WITH RANK 1
  
  # * FOR EACH OF THE UNIQUE VALUES OF DIFFERENCES
  for(ix in 1:length(unique(diffs))) {
    
    # * CHECK IF THERE ARE DUPLICATED DIFFERENCES
    these_ix <- which(diffs == unique(diffs)[ix])
    
      # * IF THERE ARE NO DUPLICATES, ASSIGN STANDARD RANK
      if (length(these_ix)==1) {
        ranks <- c(ranks, this_rank)
        this_rank <- length(ranks) + 1
      
      # * IF THERE ARE DUPLICATES, ASSIGN AN AVERAGED RANK
      } else {
          these_ranks <- rep(mean(c(this_rank, this_rank+length(these_ix)-1)), length(these_ix))
          ranks <- c(ranks, these_ranks)
          this_rank <- length(ranks) + 1
      }
  }
  
  # * OUTPUT RANKS
  ranks
}

# * APPLY FUNCTION USING MUTATE
trials <- trials %>% mutate(rank=assign_wrs(d))
head(trials, 10)
##      trial_1   trial_2            d rank
## 1  0.7000000 0.6969697 -0.003030303    1
## 2  0.9000000 0.9090909  0.009090909    2
## 3  0.9833333 0.9696970 -0.013636364    3
## 4  0.4000000 0.4242424  0.024242424    4
## 5  0.9333333 0.9090909 -0.024242424    5
## 6  0.8833333 0.9090909  0.025757576    6
## 7  0.7000000 0.7272727  0.027272727    7
## 8  0.5166667 0.5454545  0.028787879    8
## 9  0.8500000 0.8181818 -0.031818182    9
## 10 0.5166667 0.4848485 -0.031818182   10

Sum for negatives

trials %>% filter(d<0) %>% summarize(sum(rank))
##   sum(rank)
## 1       283

Sum for positives

trials %>% filter(d>0) %>% summarize(sum(rank))
##   sum(rank)
## 1       278

Use formulas to calculate \(\mu_T\) and \(\sigma_T\)

Calculating the average

\[ \mu_T = \frac{n(n+1)}{4} = \]

nrow(trials)
## [1] 33

\[ \mu_T = \frac{n(n+1)}{4} = \frac{33(34)}{4} \]

Calculating the standard deviation

\[ \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{33(34)(2*33+1)}{24}} \]

Calculating \(z\)

\[ Z_T = \frac{T-\mu_T}{\sigma_T} \]

where \(T=\text{min}\{283,278\}\), the two signed sums.

trials %>% summarize(n=n(),
                     mu_t=(n*(n+1))/4,
                     sigma_t=sqrt((n*(n+1)*(2*n+1))/24),
                     z_t=(278-mu_t)/sigma_t)
##    n  mu_t  sigma_t         z_t
## 1 33 280.5 55.96651 -0.04466957

Calculate p-value

Because our \(Z_t=-0.04457\) is negative, we can use pnorm with lower.tail=TRUE

2*pnorm(q=-0.04466957, lower.tail=TRUE)
## [1] 0.9643707

or do other equivalent calculations based on the symmetry of the normal distribution like this

pnorm(q=-0.04466957, lower.tail=TRUE) + pnorm(q=0.04466957, lower.tail=FALSE)
## [1] 0.9643707

or this,

2*(1-pnorm(q=0.04466957, lower.tail=FALSE))
## [1] 1.035629

although the first calculation is the most straightfoward.

Interpretations and conclusion

Our p-value is ginormous. Given that all the pairwise differences are 0, we would expect to observe our data or more extreme about 96.44% of the time. That is, we would expect to see data like ours 96.44% of the time if there were truly no pairwise differences between trial 1 and trial 2.

We conclude that \(H_0\) cannot be rejected. We have evidence in favor of \(H_0\) which states that the pairwise differences are centered around 0. Neither of the tests seem more difficult than the other based on our sample.

This matches our original intuition from the histograms we looked at where the data look like they fall mostly around 0.

2. The R Shortcut

Instead of doing the above, we can use this. The correct=TRUE means that we are using a normal distribution to calculate our p-value like we did above in the step-by-step (using rnorm).

wilcox.test(x=trial_1, y=trial_2, data=trials, paired=TRUE, correct=TRUE)
## Warning in wilcox.test.default(x = trial_1, y = trial_2, data = trials, :
## cannot compute exact p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  trial_1 and trial_2
## V = 284, p-value = 0.9572
## alternative hypothesis: true location shift is not equal to 0

The interpretation and conclusions above still stand.