Project HAL was brought to my attention by Lisa D. Cook’s paper “The Color of Lynching” (2011) [1]. In it, Cook discusses a particular data collection bias that sent chills through me. Right after reading the paper, I searched Google for one of the contemporary datasets she cites, compiled by Tolnay and Beck. My curiosity before downloading the HAL data was certainly sensitive to the subject matter, but not fully thought out. Not until I opened the spreadsheet did I realize what I was looking at. The rows were not about subjects and their cell phone usage or flowers and their petal lengths. These data were a murder list, each victim caught in a deep, complex, and painfully twisted history.
What Cook emphasizes is that whoever analyzes data on lynchings must handle them with care. The HAL dataset is not just a public record of systematic, illegal murder; it is also oversimplified, a result of clear apathy toward the people who were murdered.
The HAL dataset is a work by Stewart Tolnay and E.M. Beck, made available online by Project HAL. The data originate from the National Association for the Advancement of Colored People (NAACP) Lynching Records at Tuskegee Institute. The dataset grows with contributions from the public [2].
“Lynching” is defined by the standards of the NAACP [1]. The HAL dataset contains 2806 records of lynchings that took place in the South. According to the NAACP, 4,743 lynchings took place in the United States between 1882 and 1968, with many more left unrecorded [3]. Project HAL currently covers some of the deaths between 1882 and 1930.
Below is a list of 10 of the 2806 people murdered. Each record gives the victim’s date and location of death, race, sex, the kind of mob that targeted them, and the offense under which they were murdered. Notably, race is binary: a subject is either “black” or “white” (all those who were non-black or mixed race) [1].
| State | Year | Mo | Day | Victim | County | Race | Sex | Mob | Offense | 
|---|---|---|---|---|---|---|---|---|---|
| LA | 1902 | 3 | 19 | John Woodward | Concordia | Blk | Male | NA | Murder | 
| LA | 1887 | 8 | 9 | Thomas Scott | Morehouse | Wht | Male | NA | Murder | 
| SC | 1886 | 5 | 5 | Wesley Williams | Kershaw | Blk | Male | NA | Attempted assault (rape) | 
| AR | 1896 | 4 | 18 | Jefferson Gardner | Bradley | Blk | Male | NA | Rape | 
| AR | 1899 | 3 | 22 | Joseph King | Little River | Blk | Male | NA | Threats against whites | 
| AR | 1896 | 7 | 30 | Godfrey Gould | Monroe | Wht | Male | NA | Rape | 
| LA | 1887 | 11 | 7 | Unnamed Negro | Caddo | Blk | Male | NA | Miscegenation | 
| MS | 1893 | 7 | 2 | Unnamed Negro | Harrison | Blk | Male | NA | Criminal assault (rape) | 
| LA | 1903 | 1 | 26 | John Thomas | St. Charles | Blk | Male | NA | Murder | 
| FL | 1887 | 12 | 12 | George Green | Marion | Blk | Male | NA | Theft | 
Whites began lynching in the late 19th century, ostensibly in order to “protect white women” [3].
The top 10 offenses that “motivated” the lynchings are displayed below. The first offense is murder. Rape appears repeatedly, filed under several labels of assault. Robbery and theft are also repeated under different labels. As far as data collection goes here, I think we can do better.
To make a more informative bar plot, I can group the offenses within the dataset, merging offenses of the same nature under a single label so that they share one count. The following code block shows how I filtered the data. I want to be clear that I am not trying to erase any original causes or content. This relabeling is for data uniformity for visualization purposes alone.
hal_data_filter <- hal_data
hal_data_filter$Offense <- ifelse(grepl("plot|attempt|att|attempted", tolower(hal_data_filter$Offense)), "Attempt", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("girl's bedroom|girl's room|woman's room|woman's bedroom|chamber|lady's room", tolower(hal_data_filter$Offense)), "Girl's room", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("white woman|white girl|women|woman", tolower(hal_data_filter$Offense)), "White woman", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("rape", tolower(hal_data_filter$Offense)), "Rape", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("threat", tolower(hal_data_filter$Offense)), "Threat", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("implicated|complicity|accomplice|aided murderer|father of|brother of", tolower(hal_data_filter$Offense)), "Implicated", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("arson|incendiarism|dynamiting|barn|incendiary", tolower(hal_data_filter$Offense)), "Arson", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("assault", tolower(hal_data_filter$Offense)), "Assault", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("robbery|burglary|theft|stealing", tolower(hal_data_filter$Offense)), "Robbery", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("bad character", tolower(hal_data_filter$Offense)), "Bad character", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("political", tolower(hal_data_filter$Offense)), "Political", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("murder", tolower(hal_data_filter$Offense)), "Murder", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("insult", tolower(hal_data_filter$Offense)), "Insult", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("race", tolower(hal_data_filter$Offense)), "Race", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("shot|shooting|shoot", tolower(hal_data_filter$Offense)), "Shooting", hal_data_filter$Offense)
hal_data_filter$Offense <- ifelse(grepl("white man|white men", tolower(hal_data_filter$Offense)), "White man", hal_data_filter$Offense)

Here are the top 10 offenses for which “white victims” (all non-black or mixed-race victims) were lynched.
| Offense | Frequency | 
|---|---|
| Murder | 144 | 
| Rape | 26 | 
| Robbery | 18 | 
| Attempt | 11 | 
| Outlaw | 11 | 
| Unknown | 11 | 
| Arson | 8 | 
| Implicated | 7 | 
| Assault | 5 | 
| Shooting | 5 | 
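
A frequency table like the one above can be built in a few lines of base R. This is a minimal sketch, not necessarily the code used for this post: the `top_offenses` helper and the toy data frame are hypothetical, and only the `Race` and `Offense` column names come from the dataset.

```r
# Hypothetical helper: the n most frequent offenses for one race label.
top_offenses <- function(data, race, n = 10) {
  # Keep only the offenses recorded for the requested race label.
  offenses <- data$Offense[!is.na(data$Race) & data$Race == race]
  # Tabulate, sort from most to least frequent, and keep the first n.
  counts <- head(sort(table(offenses), decreasing = TRUE), n)
  data.frame(
    Offense = names(counts),
    Frequency = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for hal_data_filter (made-up rows, not real records).
toy <- data.frame(
  Race = c("Wht", "Wht", "Wht", "Blk", "Blk", "Wht"),
  Offense = c("Murder", "Murder", "Rape", "Murder", "Arson", "Murder"),
  stringsAsFactors = FALSE
)

print(top_offenses(toy, "Wht", n = 2))
```

With the real data, the call would be something like `top_offenses(hal_data_filter, "Wht")`, assuming the relabeled data frame from the previous step.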
Here are the top 10 offenses among black victims. I will also visualize the offenses for black victims below.
| Offense | Frequency | 
|---|---|
| Murder | 764 | 
| Rape | 418 | 
| Attempt | 383 | 
| Assault | 162 | 
| Arson | 113 | 
| Robbery | 102 | 
| White woman | 88 | 
| Unknown | 75 | 
| Implicated | 67 | 
| Race | 49 | 
Above, murder is the most cited offense across all of the lynchings, and rape is second. The third category, “Attempt”, counts all the victims who were murdered for “attempting to commit a crime”. I also have an “Implicated” category for victims accused of aiding in criminal activity. None of these alleged offenses were tried in court or “proven” to be fact (I say this with caution because I am not sure how much a trial would have helped these victims). There are many more questionable offenses throughout this database. Below are 10 more randomly picked offenses for which black people were lynched.
| Row | Offense | Frequency |
|---|---|---|
| 87 | Obscene language | 1 | 
| 73 | Inciting to riot | 1 | 
| 61 | Dangerous character | 1 | 
| 61.1 | Dangerous character | 1 | 
| 109 | Wounded deputy | 1 | 
| 87.1 | Obscene language | 1 | 
| 67 | Flirting with wh. girl | 1 | 
| 75 | Indecent proposals to girls | 1 | 
| 18 | Miscegenation | 4 | 
| 53 | Bad reputation | 1 | 
Miscegenation (interracial relations) was a repeated offense. In 1897, Andy Beard was lynched for eloping with a white woman. In 1888 and 1905, Jim Torney and Joe Woodman were murdered under the same offense. It would not surprise me if this were the true motivation for more lynchings. Furthermore, in this dataset, 13 people were murdered under a mistaken identity, and 88 offenses were unknown.
Although lynching occurred in other states, most of the nation’s lynching happened in the South [3]. The HAL dataset covers 10 southern states. Using the dplyr package, I aggregated the data by year and state of lynching. Here are the counts for 1882; North Carolina does not appear in that year's counts.
| Year | State | n | 
|---|---|---|
| 1882 | AL | 6 | 
| 1882 | AR | 4 | 
| 1882 | FL | 3 | 
| 1882 | GA | 3 | 
| 1882 | KY | 8 | 
| 1882 | LA | 9 | 
| 1882 | MS | 3 | 
| 1882 | SC | 6 | 
| 1882 | TN | 2 | 
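
The post used dplyr for this aggregation, where `count(hal_data, Year, State)` would give a table of this shape. As a dependency-free illustration, here is a base R sketch of the same idea. The `count_by_year_state` helper and the toy rows are hypothetical; only the `Year` and `State` column names are taken from the dataset.

```r
# Hypothetical helper: number of records per (Year, State) pair.
count_by_year_state <- function(data) {
  # Cross-tabulate and flatten to one row per Year/State combination.
  counts <- as.data.frame(
    table(Year = data$Year, State = data$State),
    responseName = "n",
    stringsAsFactors = FALSE
  )
  counts$Year <- as.integer(counts$Year)
  # table() also lists combinations that never occur; drop them.
  counts <- counts[counts$n > 0, ]
  counts <- counts[order(counts$Year, counts$State), ]
  rownames(counts) <- NULL
  counts
}

# Toy rows (made up for illustration, not real records).
toy <- data.frame(
  Year = c(1882, 1882, 1882, 1883),
  State = c("LA", "LA", "KY", "LA"),
  stringsAsFactors = FALSE
)

print(count_by_year_state(toy))
```

Dropping the zero counts is what makes a state vanish from a given year's output when it has no records for that year.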
The following barplot shows the number of records for all of the years per state.
Looking at each state over time, we see that 1890 brought about more lynchings than other time periods, and some states curbed their lynching activity in later years. North Carolina had the fewest lynchings overall among the 10 states.
In this dataset, 84.25% of the victims were black males. Below is a table that shows the absolute number of lynchings (column n) for each race and sex, and the relative frequency (column freq) of each race within the sex category the victims were classified under.
| Race | Sex | n | freq | 
|---|---|---|---|
| Blk | Fe | 74 | 0.9367089 | 
| Blk | Male | 2364 | 0.8749075 | 
| Blk | Unk | 24 | 0.9600000 | 
| Other | Male | 5 | 0.0018505 | 
| Unk | Male | 49 | 0.0181347 | 
| Unk | Unk | 1 | 0.0400000 | 
| Wht | Fe | 5 | 0.0632911 | 
| Wht | Male | 284 | 0.1051073 | 
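
The `freq` column can be checked directly from the `n` column. The sketch below copies the counts from the table above and assumes that `freq` is each count divided by the total for its sex category, which is consistent with the published values. The variable names are my own.

```r
# Counts copied from the race-by-sex table above.
counts <- data.frame(
  Race = c("Blk", "Blk", "Blk", "Other", "Unk", "Unk", "Wht", "Wht"),
  Sex  = c("Fe", "Male", "Unk", "Male", "Male", "Unk", "Fe", "Male"),
  n    = c(74, 2364, 24, 5, 49, 1, 5, 284),
  stringsAsFactors = FALSE
)

# Assumption: freq is each count divided by the total for its Sex group.
# ave() returns, for every row, the sum of n over rows with the same Sex.
counts$freq <- counts$n / ave(counts$n, counts$Sex, FUN = sum)

# Share of all 2806 victims who were black males.
is_black_male <- counts$Race == "Blk" & counts$Sex == "Male"
black_male_share <- counts$n[is_black_male] / sum(counts$n)

print(counts)
print(round(100 * black_male_share, 2))
```

For example, 74 of the 79 female victims were black, which gives 74 / 79 ≈ 0.9367, and 2364 of the 2806 victims overall were black males, which gives the 84.25% quoted above.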
Here, we see the obvious skew of lynchings toward black people.
The following two plots hold similar information. One displays the absolute number of murders, the other the number relative to each sex.
According to Cook, the “fact that non-black victims of Chinese, Hispanic, Italian, Native American, and others of distinct ancestry are identified as ‘white’ is especially problematic” (2011). In this dataset, 344 victims were reported to be white, another race, or unknown. Thus, the bars for the white category would probably be even smaller if these victims had been classified accurately.
In this dataset, 269 victims died with no name tied to their murder. Here are 15 of their records.
| State | Year | Mo | Day | Victim | County | Race | Sex | Mob | Offense | 
|---|---|---|---|---|---|---|---|---|---|
| LA | 1884 | 10 | 24 | Unnamed Negro | St. Tammany | Blk | Male | NA | Murder | 
| MS | 1904 | 3 | 13 | Unnamed Negro | Harrison | Blk | Male | NA | Murder | 
| AL | 1892 | 2 | 10 | Unnamed Negro | Tuscaloosa | Blk | Male | NA | Robbery & arson | 
| AR | 1892 | 6 | 29 | Unnamed Negro | Cross | Blk | Male | Blk | Criminal assault (rape) | 
| MS | 1923 | 6 | 10 | Unnamed Negro | Benton | Blk | Male | NA | Murder | 
| MS | 1923 | 9 | 31 | Unnamed Negro | Holmes | Blk | Fe | NA | Race prejudice | 
| GA | 1920 | 11 | 30 | Unnamed Negro | Thomas | Blk | Male | NA | Assault (rape) | 
| MS | 1906 | 1 | 18 | Unnamed Negro | Simpson | Blk | Male | NA | Attempted assault (rape) | 
| MS | 1912 | 5 | 5 | Unnamed Negro | Washington | Blk | Male | NA | Attempted assault (rape) | 
| MS | 1903 | 6 | 8 | Unnamed Negro | Smith | Blk | Fe | NA | Murder | 
| NC | 1910 | 10 | 8 | Unnamed Negro | Rockingham | Blk | Male | NA | Robbery | 
| MS | 1909 | 10 | 28 | Unnamed Negro | Kemper | Blk | Male | NA | Murder | 
| FL | 1911 | 5 | 21 | Unnamed Negro | Columbia | Blk | Unk | NA | Murder | 
| AL | 1893 | 9 | 5 | Unnamed Negro | Bibb | Blk | Male | NA | Rape | 
| GA | 1919 | 10 | 16 | Unnamed Negro | Marion | Blk | Unk | NA | Unknown | 
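
One way to count records like these is to match on the `Victim` field. The sketch below assumes that unnamed victims are always recorded with the word "Unnamed" in that field, as in the rows above. The post does not say how the figure of 269 was obtained, so the `is_unnamed` helper is hypothetical and the rule is a guess.

```r
# Hypothetical helper: TRUE where the victim field contains "unnamed"
# (case-insensitive). Missing values count as FALSE.
is_unnamed <- function(victim) {
  grepl("unnamed", victim, ignore.case = TRUE)
}

# Toy vector mixing names and labels that appear in the tables above.
toy_victims <- c("John Woodward", "Unnamed Negro", "Thomas Scott",
                 "Unnamed Negro", NA)

print(sum(is_unnamed(toy_victims)))
```

On the real data this would be `sum(is_unnamed(hal_data$Victim))`. Any other convention for missing names, such as an empty field, would need its own rule.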
More than 2806 people were murdered for alleged offenses never tried by law, most of them black men.
The murders I have listed and visualized here are only some layers of a multifaceted and tragically racist history in the United States. I do not know how much these data were distorted by systematic bias, or how many more lynchings went unrecorded for the same reason. In her paper, Cook urges further work to correct the data. Data tell stories, and this story is still incomplete.
I apologize for my poor referencing skills.
[1] Cook, Lisa D. “The Color of Lynching”. 2011.
[2] Project HAL.
[3] NAACP. https://www.naacp.org/history-of-lynchings/.