I first learned of Project HAL from “The Color of Lynching” (2011) by Lisa D. Cook. In the paper, Cook discusses a particular data collection bias that sent chills through me. Immediately after reading it, I searched Google for one of the contemporary datasets she cited, compiled by Tolnay and Beck. The curiosity I had before downloading the HAL data was certainly sensitive to the subject matter, but not fully thought out. Not until I opened the spreadsheet did I realize what I was looking at. The rows were not about subjects and their cell phone usage or flowers and their petal lengths. These data were a murder list, each victim caught in a deep, complex, and painfully twisted history.

What Cook emphasizes is that whoever analyzes data on lynchings must handle it with care. The HAL dataset is not just a public record of systematic, illegal murder. It is also oversimplified, a result of clear apathy toward the people who were murdered.

The HAL Dataset

The HAL dataset is the work of Stewart Tolnay and E.M. Beck, made available online by Project HAL. The data originate from the National Association for the Advancement of Colored People (NAACP) Lynching Records at Tuskegee Institute. The dataset continues to grow through contributions from the public [2].

“Lynching” is defined by the standards of the NAACP [1]. The HAL dataset contains 2,806 records of lynchings that took place in the South. According to the NAACP, 4,743 lynchings took place in the United States between 1882 and 1968, and many more went unrecorded [3]. Project HAL currently covers some of the deaths between 1882 and 1930.

Below is a list of 10 of the 2,806 people murdered. For each victim, it gives the date and location of death, race, sex, the kind of mob that targeted them, and the alleged offense for which they were murdered. Notably, race is binary: a subject is either “black” or “white” (all those who were non-black or mixed race) [1].

State  Year  Month  Day  Victim             County        Race  Sex   Mob  Offense
LA     1902  3      19   John Woodward      Concordia     Blk   Male  NA   Murder
LA     1887  8      9    Thomas Scott       Morehouse     Wht   Male  NA   Murder
SC     1886  5      5    Wesley Williams    Kershaw       Blk   Male  NA   Attempted assault (rape)
AR     1896  4      18   Jefferson Gardner  Bradley       Blk   Male  NA   Rape
AR     1899  3      22   Joseph King        Little River  Blk   Male  NA   Threats against whites
AR     1896  7      30   Godfrey Gould      Monroe        Wht   Male  NA   Rape
LA     1887  11     7    Unnamed Negro      Caddo         Blk   Male  NA   Miscegenation
MS     1893  7      2    Unnamed Negro      Harrison      Blk   Male  NA   Criminal assault (rape)
LA     1903  1      26   John Thomas        St. Charles   Blk   Male  NA   Murder
FL     1887  12     12   George Green       Marion        Blk   Male  NA   Theft

Per Offense

Lynching began in the late 19th century, carried out by whites who claimed to “protect white women” [3].

The top 10 offenses that “motivated” the lynchings are displayed below. Murder is the most frequent. Rape recurs under several labels of assault, and robbery and theft also recur under different labels. As far as data collection goes here, I think we can do better.

To make a more informative bar plot, I merged offenses of the same nature under a single label so that they are counted together. The following code block shows how I relabeled the data. I want to be clear that I am not trying to erase any original causes or content. This relabeling is for data uniformity in the visualizations alone.

# Map offenses of the same nature to one label (for plotting only).
# Rules run top to bottom, and each one tests the current value. Once an
# offense has been relabeled, later rules see only the label, so in practice
# the earliest matching rule decides the final label.
# Note: "att" also matches words such as "attack".
offense_rules <- list(
  "Attempt"       = "plot|attempt|att|attempted",
  "Girl's room"   = "girl's bedroom|girl's room|woman's room|woman's bedroom|chamber|lady's room",
  "White woman"   = "white woman|white girl|women|woman",
  "Rape"          = "rape",
  "Threat"        = "threat",
  "Implicated"    = "implicated|complicity|accomplice|aided murderer|father of|brother of",
  "Arson"         = "arson|incendiarism|dynamiting|barn|incendiary",
  "Assault"       = "assault",
  "Robbery"       = "robbery|burglary|theft|stealing",
  "Bad character" = "bad character",
  "Political"     = "political",
  "Murder"        = "murder",
  "Insult"        = "insult",
  "Race"          = "race",
  "Shooting"      = "shot|shooting|shoot",
  "White man"     = "white man|white men"
)

hal_data_filter <- hal_data
# as.character() guards against Offense having been read in as a factor
hal_data_filter$Offense <- as.character(hal_data_filter$Offense)
for (label in names(offense_rules)) {
  matches <- grepl(offense_rules[[label]], tolower(hal_data_filter$Offense))
  hal_data_filter$Offense[matches] <- label
}
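The relabeling rules above run in sequence, and each rule tests the current value of an offense. An offense that matches several patterns therefore keeps the label of the first rule it matches, because the new label no longer contains the text that later patterns look for. The sketch below illustrates this. relabel() is a hypothetical helper, the rules are a subset of the ones above kept in the same order, and the offense strings are made up for illustration rather than taken from HAL.

```r
# Hypothetical helper mirroring the sequential relabeling above:
# each rule tests the current value, which may already be a label.
relabel <- function(offense, rules) {
  for (label in names(rules)) {
    hit <- grepl(rules[[label]], tolower(offense))
    offense <- ifelse(hit, label, offense)
  }
  offense
}

# A subset of the rules above, kept in the same order.
rules <- list(
  "Attempt"     = "plot|attempt|att|attempted",
  "White woman" = "white woman|white girl|women|woman",
  "Rape"        = "rape",
  "Assault"     = "assault",
  "Murder"      = "murder"
)

# Made-up offense strings for illustration only (not rows from HAL).
offenses <- c("Attempted assault (rape)", "Criminal assault (rape)",
              "Assault on white woman", "Murder", "Attacked officer")

relabeled <- relabel(offenses, rules)
print(relabeled)
# → "Attempt" "Rape" "White woman" "Murder" "Attempt"
```

"Assault on white woman" ends up as "White woman" rather than "Assault", and "Attacked officer" lands in "Attempt" only because the broad "att" pattern matches "attacked". Reordering the rules or tightening the patterns would change the counts in the tables below.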

White Offenses

Here are the top 10 offenses for which “white” victims (all non-black or mixed-race victims) were lynched.

Offense Frequency
Murder 144
Rape 26
Robbery 18
Attempt 11
Outlaw 11
Unknown 11
Arson 8
Implicated 7
Assault 5
Shooting 5

Black Offenses

Here are the top 10 offenses among black victims. I will also visualize the offenses for black victims below.

Offense Frequency
Murder 764
Rape 418
Attempt 383
Assault 162
Arson 113
Robbery 102
White woman 88
Unknown 75
Implicated 67
Race 49
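Top-10 tables like the two above can be built with base R's table() and sort(). The sketch below assumes that the relabeled data frame has Race and Offense columns and uses race codes such as "Wht" and "Blk", as in the tables in this post. top_offenses() is a hypothetical helper. The toy input reuses three counts from the white-victims table.

```r
# Hypothetical helper: the n most frequent offenses for one race code.
top_offenses <- function(df, race, n = 10) {
  counts <- table(df$Offense[df$Race == race])  # NA offenses are dropped
  head(sort(counts, decreasing = TRUE), n)
}

# Toy input rebuilt from three counts in the "white victims" table above.
toy <- data.frame(
  Race    = "Wht",
  Offense = rep(c("Robbery", "Murder", "Rape"), times = c(18, 144, 26))
)

top3 <- top_offenses(toy, "Wht", n = 3)
print(top3)
```

On the full data, a call like top_offenses(hal_data_filter, "Blk") would give the black-victims list, assuming the column names and codes above.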

In both tables, murder is the most frequently cited offense and rape is the second. Among black victims, the third category is “Attempt,” under which I counted all the victims murdered for allegedly “attempting to commit a crime.” There is also an “Implicated” category for victims accused of aiding criminal activity. None of these murders were tried in court, and none of the offenses were “proven” to be fact (I say this with caution because I am not sure how much a trial would have helped these victims). Many more questionable offenses appear throughout this database. Below are 10 more randomly picked offenses for which black people were lynched.

Offense Frequency
Obscene language 1
Inciting to riot 1
Dangerous character 1
Dangerous character 1
Wounded deputy 1
Obscene language 1
Flirting with wh. girl 1
Indecent proposals to girls 1
Miscegenation 4
Bad reputation 1
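A random sample of offenses like the one above can be drawn from a frequency table. The sketch below is in base R. Its input is a short vector of offense labels copied from the tables in this post, not the full dataset. sample() draws without replacement by default, so each offense can appear at most once in the draw.

```r
set.seed(1)  # fixed seed only so the draw is reproducible

# A few offense labels copied from the tables above (not the full dataset).
offense <- c(rep("Miscegenation", 4), "Obscene language", "Inciting to riot",
             "Dangerous character", "Wounded deputy", "Flirting with wh. girl",
             "Indecent proposals to girls", "Bad reputation",
             "Threats against whites", "Theft", "Race prejudice")

# One row per distinct offense, with its frequency in column Freq.
freq <- as.data.frame(table(offense), stringsAsFactors = FALSE)

# Draw 10 distinct rows (sampling without replacement).
picked <- freq[sample(nrow(freq), 10), ]
print(picked, row.names = FALSE)
```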

Miscegenation (interracial relations) was a repeated offense. In 1897, Andy Beard was lynched for eloping with a white woman. In 1888 and 1905, Jim Torney and Joe Woodman were also murdered under the same offense. It would not surprise me if this were the true motivation behind more lynchings. Furthermore, in this dataset 13 people were murdered because of mistaken identity, and 88 offenses were unknown.

Per State

Although lynching occurred in other states, most of the nation’s lynchings happened in the South [3]. The HAL dataset covers 10 southern states. Using the dplyr package, I aggregated the data by year and state of lynching. Here are the data for 1882.

Year State n
1882 AL 6
1882 AR 4
1882 FL 3
1882 GA 3
1882 KY 8
1882 LA 9
1882 MS 3
1882 SC 6
1882 TN 2
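Per-year, per-state counts like these can be produced with dplyr's count(). The sketch below assumes the data have Year and State columns, as in the table above. The toy data frame is rebuilt from the 1882 counts rather than read from HAL, with one row per lynching record.

```r
library(dplyr)

# Toy data rebuilt from the 1882 counts above: one row per record.
toy_1882 <- data.frame(
  Year  = 1882,
  State = rep(c("AL", "AR", "FL", "GA", "KY", "LA", "MS", "SC", "TN"),
              times = c(6, 4, 3, 3, 8, 9, 3, 6, 2))
)

# count() groups by Year and State and adds a column n with the row counts.
per_year_state <- toy_1882 %>% count(Year, State)
print(per_year_state)
```

On the full data, something like hal_data %>% count(Year, State) %>% filter(Year == 1882) should give the table above, assuming hal_data is already loaded with those column names.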

The following barplot shows the number of records for all of the years per state.

Across the states, the years around 1890 brought more lynchings than other periods, and some states curbed their lynching activity in later years. North Carolina had the fewest lynchings overall among the 10 states.

Per Race

In this dataset, 84.25% of lynching victims were black males. The table below shows the number of victims in each race and sex group (column n) and that group's share of all victims of the same sex (column freq).

Race Sex n freq
Blk Fe 74 0.9367089
Blk Male 2364 0.8749075
Blk Unk 24 0.9600000
Other Male 5 0.0018505
Unk Male 49 0.0181347
Unk Unk 1 0.0400000
Wht Fe 5 0.0632911
Wht Male 284 0.1051073
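The freq column is each group's share of all victims of the same sex. For black women, for example, it is 74 / (74 + 5) ≈ 0.937. The base-R sketch below recomputes freq, and the 84.25% figure, from the counts in the table using ave(). The data frame simply copies the table's race, sex, and n columns.

```r
# Counts per race and sex, copied from the table above.
race_sex <- data.frame(
  Race = c("Blk", "Blk", "Blk", "Other", "Unk", "Unk", "Wht", "Wht"),
  Sex  = c("Fe", "Male", "Unk", "Male", "Male", "Unk", "Fe", "Male"),
  n    = c(74, 2364, 24, 5, 49, 1, 5, 284)
)

# freq: each group's count divided by the total count for its sex.
race_sex$freq <- ave(race_sex$n, race_sex$Sex, FUN = function(x) x / sum(x))

# Black males as a share of all 2,806 records, in percent.
black_male_share <- with(race_sex, n[Race == "Blk" & Sex == "Male"] / sum(n))
round(100 * black_male_share, 2)  # → 84.25
```

With dplyr, group_by(Sex) followed by mutate(freq = n / sum(n)) would produce the same column.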

Here, we see the obvious skew of lynchings toward black people.

The following two plots hold similar information. One displays the absolute number of murders, and the other displays the numbers relative to sex.

Absolute Counts and Relative to Sex

According to Cook, the “fact that non-black victims of Chinese, Hispanic, Italian, Native American, and others of distinct ancestry are identified as ‘white’ is especially problematic” (2011). In this dataset, 344 victims were reported to be white, another race, or unknown. If victims of other ancestries were classified correctly, the white bars would probably shrink even further.

By Victim

Of these victims, 269 died with no name tied to their murder. Here are 15 of their records.

State  Year  Month  Day  Victim         County       Race  Sex   Mob  Offense
LA     1884  10     24   Unnamed Negro  St. Tammany  Blk   Male  NA   Murder
MS     1904  3      13   Unnamed Negro  Harrison     Blk   Male  NA   Murder
AL     1892  2      10   Unnamed Negro  Tuscaloosa   Blk   Male  NA   Robbery & arson
AR     1892  6      29   Unnamed Negro  Cross        Blk   Male  Blk  Criminal assault (rape)
MS     1923  6      10   Unnamed Negro  Benton       Blk   Male  NA   Murder
MS     1923  9      31   Unnamed Negro  Holmes       Blk   Fe    NA   Race prejudice
GA     1920  11     30   Unnamed Negro  Thomas       Blk   Male  NA   Assault (rape)
MS     1906  1      18   Unnamed Negro  Simpson      Blk   Male  NA   Attempted assault (rape)
MS     1912  5      5    Unnamed Negro  Washington   Blk   Male  NA   Attempted assault (rape)
MS     1903  6      8    Unnamed Negro  Smith        Blk   Fe    NA   Murder
NC     1910  10     8    Unnamed Negro  Rockingham   Blk   Male  NA   Robbery
MS     1909  10     28   Unnamed Negro  Kemper       Blk   Male  NA   Murder
FL     1911  5      21   Unnamed Negro  Columbia     Blk   Unk   NA   Murder
AL     1893  9      5    Unnamed Negro  Bibb         Blk   Male  NA   Rape
GA     1919  10     16   Unnamed Negro  Marion       Blk   Unk   NA   Unknown
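Counting and sampling unnamed victims takes two steps: flag the rows, then draw from them. The sketch below assumes that unnamed victims are recorded with a Victim value beginning with "Unnamed", as in the rows above. The dataset may use other labels as well. The toy data reuse a few columns of the first table of ten records in this post.

```r
set.seed(1)  # fixed seed only so the draw is reproducible

# A few columns of the first table of ten records above.
records <- data.frame(
  State  = c("LA", "LA", "SC", "AR", "AR", "AR", "LA", "MS", "LA", "FL"),
  Year   = c(1902, 1887, 1886, 1896, 1899, 1896, 1887, 1893, 1903, 1887),
  Victim = c("John Woodward", "Thomas Scott", "Wesley Williams",
             "Jefferson Gardner", "Joseph King", "Godfrey Gould",
             "Unnamed Negro", "Unnamed Negro", "John Thomas", "George Green")
)

# Assumption: unnamed victims are recorded with a Victim value starting "Unnamed".
is_unnamed <- grepl("^Unnamed", records$Victim)
sum(is_unnamed)

# Draw up to 15 unnamed records without replacement.
unnamed <- records[is_unnamed, , drop = FALSE]
drawn <- unnamed[sample(nrow(unnamed), min(15, nrow(unnamed))), , drop = FALSE]
print(drawn, row.names = FALSE)
```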

More than 2,806 people were murdered for offenses never tried by law, most of them black men.

The murders I visualized here are only some layers of a multifaceted and tragically racist history in the United States. I do not know how much systematic bias shaped these data, or how many more lynchings went unrecorded for the same reasons. In her paper, Cook urges further work to correct the data. Data tell stories, and this story is still incomplete.

References

I apologize for my poor referencing skills.

[1] Cook, Lisa D. “The Color of Lynching”. 2011.

[2] Project HAL.

[3] NAACP. https://www.naacp.org/history-of-lynchings/.