We have been using dplyr and ggplot2, loading them with library(dplyr) and library(ggplot2). Any package that we’re using for the first time has to be installed first with the function install.packages(). For our lab this week, you’re asked to install the packages tidyverse and RCurl.
Think of R libraries like apps on your phone. You download an app once, and after that you can use it whenever you open it. In our case, install the packages once using the following lines, then load them with library() whenever you need them.
Run these only once within the console (not your markdown!):
# * RUN THESE ONCE
install.packages("tidyverse")
install.packages("RCurl")
And run these always within your markdown:
# * RUN THESE ALWAYS
library(tidyverse)
library(RCurl)
Notice the usage of quotes. When you install a package, use quotes! When you load a package, you don’t need them. And lastly, make sure they’re plain, straight quotes (" "). Some text editors substitute a fancier-looking curly version (“ ”), and R does not work with stylized text. Work with plain text editors only.
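If you are not sure whether a package is already installed, a small check like the one below can tell you before you reinstall anything. This is only a sketch: the names pkgs, is_installed and missing_pkgs are made up for illustration, and the install line is left commented out so that nothing gets downloaded by accident.

```r
# Packages this lab asks for
pkgs <- c("tidyverse", "RCurl")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that are NOT installed yet
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# If anything is listed, run this once in the console (not your markdown!):
# install.packages(missing_pkgs)
```

An empty result, character(0), means both packages are already installed and you can go straight to library().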
Electronic Dance Music (EDM) has become super popular! Imagine that you wanted to explore some information about some of the most popular EDM DJs through data. Here’s your process:
Load the data into R as a dataframe called edm_df.
We are going to load dplyr. We need to make sure that we’ve run install.packages("dplyr") at some point before we run library() on it.
library(dplyr)
The dataframe edm_df has 17 rows and 11 columns. Each row corresponds to a DJ and each column corresponds to something interesting about them.
head(edm_df)
## day month year artist legal ethnicity
## 1 7 July 1991 Alesso Alessandro Lindblad Swedish
## 2 2 September 1989 Zedd Anton Zaslavski Russian-German
## 3 14 May 1996 Martin Garrix Martijn Gerard Garritsen Dutch
## 4 26 December 1990 Illenium Nicholas D. Miller American
## 5 8 September 1989 Avicii Tim Bergling Swedish
## 6 31 March 1987 Seven Lions Jeff Montalvo American
## instruments
## 1 Progressive house, electro house, pop
## 2 EDM, house, electro house, dubstep, complextro, progressive house, brostep
## 3 Progressive house, big room house, Dutch houseelectro house, future bass
## 4 Future Bass, Trap, Dubstep
## 5 Guitar, piano, keyboards, synthesizers
## 6 Drums, D.A.W. (FL Studio), turntables, synthesizer, Guitar
## genres
## 1 piano, keyboards, mixset, synthesizers
## 2 Keyboards, synthesizers, piano, guitar, drums, percussion, digital audio workstation
## 3 Digital audio workstation, guitar
## 4 Digital audio workstation, guitar, Ableton Live
## 5 EDM, progressive house, electro house
## 6 Melodic dubstep, electro house, progressive house, trance
## net_worth start active
## 1 30.00 2010 TRUE
## 2 35.00 2002 TRUE
## 3 22.00 2012 TRUE
## 4 0.45 2014 TRUE
## 5 85.00 2006 FALSE
## 6 5.00 2010 TRUE
If we take a closer look at the columns, we can see the class (type) of R object each column contains. This dataset is not perfect for analysis. Take a look at day and year. We would probably prefer those to be in numeric format.
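Here is one way to take that closer look. The sketch below uses a tiny made-up dataframe (toy_df is a hypothetical name, not the lab data) and assumes, as the later conversion step suggests, that day and year were read in as factors.

```r
# A three-row stand-in for edm_df, with day and year stored as factors
toy_df <- data.frame(
  day = factor(c("7", "2", "14")),
  year = factor(c("1991", "1989", "1996")),
  artist = c("Alesso", "Zedd", "Martin Garrix"),
  stringsAsFactors = FALSE
)

# class() applied to every column gives one class name per column
col_classes <- sapply(toy_df, class)
print(col_classes)

# str() shows the same information along with a preview of the values
str(toy_df)
```

Running the same two calls on edm_df should show which columns need converting before analysis.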
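I’m going to use the mutate() function, so we can go ahead and make these two variables numeric using the function as.numeric(). Each column is wrapped in as.character() first because calling as.numeric() directly on a factor returns its internal level codes rather than the values we see printed. These are both useful functions, but they’re off the table for testing for now.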
edm_df %>% mutate(day=as.numeric(as.character(day)),
year=as.numeric(as.character(year)))
## day month year artist legal
## 1 7 July 1991 Alesso Alessandro Lindblad
## 2 2 September 1989 Zedd Anton Zaslavski
## 3 14 May 1996 Martin Garrix Martijn Gerard Garritsen
## 4 26 December 1990 Illenium Nicholas D. Miller
## 5 8 September 1989 Avicii Tim Bergling
## 6 31 March 1987 Seven Lions Jeff Montalvo
## 7 17 January 1969 Tiesto Tijs Michiel Verwest
## 8 15 January 1988 Skrillex Sonny John Moore
## 9 7 November 1967 David Guetta Pierre David Guetta
## 10 18 December 1977 Axwell Axel Christofer Hedfors
## 11 17 January 1984 Calvin Harris Adam Richard Wiles
## 12 7 January 1988 Hardwell Robbert van de Corput
## 13 9 September 1987 Afrojack Nick van de Wall
## 14 19 May 1992 Marshmello Christopher Comstock
## 15 1 May 1997 Slushii Julian Scanlan
## 16 25 December 1976 Armin Van Buuren Armin Van Buuren
## 17 6 October 1988 KSHMR Niles Hollowell-Dhar
## ethnicity
## 1 Swedish
## 2 Russian-German
## 3 Dutch
## 4 American
## 5 Swedish
## 6 American
## 7 Dutch
## 8 American
## 9 French
## 10 Swedish
## 11 Scottish
## 12 Dutch
## 13 Dutch
## 14 American
## 15 American
## 16 Dutch
## 17 American
## instruments
## 1 Progressive house, electro house, pop
## 2 EDM, house, electro house, dubstep, complextro, progressive house, brostep
## 3 Progressive house, big room house, Dutch houseelectro house, future bass
## 4 Future Bass, Trap, Dubstep
## 5 Guitar, piano, keyboards, synthesizers
## 6 Drums, D.A.W. (FL Studio), turntables, synthesizer, Guitar
## 7 NA
## 8 Vocals, guitar, bass, CDJs, drum machine, synthesizer, sampler, sequencer, Ableton live
## 9 Piano, digital audio workstation
## 10 Music sequencer, synthesizers, drum machine
## 11 Vocals, piano, keyboards, synthesiser, guitar, bass guitar, sampler
## 12 Digital audio workstation, Keyboards, mixer, piano
## 13 NA
## 14 Digital audio workstation, synthesizer, guitar
## 15 Digital audio workstation, guitar, piano, drums, vocals
## 16 Synthesizer, drum machine
## 17 Synthesizer, digital audio workstation
## genres
## 1 piano, keyboards, mixset, synthesizers
## 2 Keyboards, synthesizers, piano, guitar, drums, percussion, digital audio workstation
## 3 Digital audio workstation, guitar
## 4 Digital audio workstation, guitar, Ableton Live
## 5 EDM, progressive house, electro house
## 6 Melodic dubstep, electro house, progressive house, trance
## 7 Progressive house, future house, electro house, big room house, trance, deep house
## 8 EDM, dubstep, electro house, trap, moombahton, post-hardcore
## 9 EDM, house, progressive house, electro house, dance-pop
## 10 Progressive house, electro house, funky house
## 11 EDM, electro house, electropop, Eurodance, dance-pop, nu-disco
## 12 Progressive house, big room house, electro house, Dutch house, hardstyle, future bass, trap, tech house
## 13 Dutch house, Minimal house, Electro house, trap, Future bass, Moombahton
## 14 Future bass, electronic, progressive house, trap
## 15 Dubstep, future bass, electro house, progressive housetrap
## 16 Uplifting trance, progressive trance, house, progressive house, electro house, psytrance
## 17 Electro house, big room house, psytrance
## net_worth start active
## 1 30.00 2010 TRUE
## 2 35.00 2002 TRUE
## 3 22.00 2012 TRUE
## 4 0.45 2014 TRUE
## 5 85.00 2006 FALSE
## 6 5.00 2010 TRUE
## 7 150.00 1994 TRUE
## 8 45.00 2004 TRUE
## 9 75.00 1984 TRUE
## 10 30.00 1995 TRUE
## 11 220.00 2002 TRUE
## 12 23.00 2005 TRUE
## 13 60.00 2003 TRUE
## 14 21.00 2013 TRUE
## 15 0.50 2016 TRUE
## 16 50.00 1996 TRUE
## 17 2.00 2006 TRUE
Now, we are going to overwrite our existing dataframe edm_df with the one we just produced above. We do this because we want to keep the changes we made to the columns (changing factors into numbers).
edm_df <- edm_df %>% mutate(day=as.numeric(as.character(day)),
year=as.numeric(as.character(year)))
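To see why the conversion goes through as.character() first, here is a small sketch on a made-up factor (day_factor is a hypothetical name and the values are only for illustration).

```r
# A factor that looks like numbers when printed
day_factor <- factor(c("7", "2", "14", "26"))

# Converting directly gives the internal level codes, not the printed values
codes_only <- as.numeric(day_factor)

# Converting to character first recovers the values we actually want
real_values <- as.numeric(as.character(day_factor))

print(codes_only)
print(real_values)
```

The second result matches the original days, while the first does not, which is the reason the mutate() call above uses both functions together.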
Note: You can name your dataframes and variables however you like so long as you stay consistent. If you are going to create a new dataframe, have a reason why. For example, I may want to create a new dataframe because I don’t want to lose my copy of the original for future purposes. However, if I am confident in my work and not going to use the original version of the dataframe, then I can overwrite it just as I did in the cell block above.
Now that you’re a capable coder, you’re going to hear often about “binary”, “logical”, or “boolean” data. What this means is that your data is just a whole bunch of TRUEs and FALSEs!
We have an example of this in our dataframe. Take a look at the column called active.
# * ACTIVE COLUMN
edm_df %>% pull(active)
## [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE TRUE TRUE
In the R language, and also in mathematics, we can read these TRUE/FALSE values as a bunch of 0’s for FALSE and 1’s for TRUE.
# * CONVERTING ACTIVE COLUMN TO NUMERIC FOR UNDERSTANDING
as.numeric(edm_df %>% pull(active))
## [1] 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
We can use the mean() function to calculate the proportion of DJs within our dataset that are still actively working as DJs. Remember that the formula for the mean is \(\frac{\sum_{i=1}^{n} x_i}{n}\). Because each active DJ counts as 1 and each inactive DJ counts as 0, the numerator is the count of how many DJs are active and the denominator is the total number of DJs in our dataset. We can use a dplyr function to calculate this for us.
# * PROPORTION OF ACTIVE DJ's
edm_df %>% summarize(mean=mean(active))
## mean
## 1 0.9411765
94.12% of the DJs in our dataset are still active. The one DJ in our dataset who isn’t active anymore is Avicii, who sadly passed away in 2018.
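The same proportion can be worked out by hand, which is a nice way to convince yourself that mean() on TRUE/FALSE values really is a count divided by a total. This sketch rebuilds the active column as a plain vector (the variable names are made up for the example).

```r
# The 17 values of the active column: only the 5th DJ is inactive
active <- c(TRUE, TRUE, TRUE, TRUE, FALSE, rep(TRUE, 12))

# sum() treats TRUE as 1 and FALSE as 0, so this counts the active DJs
n_active <- sum(active)

# length() gives the total number of DJs
n_total <- length(active)

# Numerator over denominator, exactly as in the formula for the mean
prop_by_hand <- n_active / n_total

# mean() does the same calculation in one step
prop_by_mean <- mean(active)

print(c(n_active, n_total))
print(prop_by_hand)
print(prop_by_mean)
```

Both approaches give 16 out of 17, the same 0.9411765 reported by summarize() above.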
Making a logical vector or column in our dataframe goes like this: we pick a column and define a condition that each value in that column either satisfies or does not. We will look at the instruments column. We are using the lengths() function to calculate how many instruments a given DJ plays. Keep in mind that lengths() counts a missing entry (NA) as length 1, which is why Tiesto and Afrojack show 1 instrument below.
edm_df <- edm_df %>%
mutate(num_instruments=lengths(instruments))
edm_df
## day month year artist legal
## 1 7 July 1991 Alesso Alessandro Lindblad
## 2 2 September 1989 Zedd Anton Zaslavski
## 3 14 May 1996 Martin Garrix Martijn Gerard Garritsen
## 4 26 December 1990 Illenium Nicholas D. Miller
## 5 8 September 1989 Avicii Tim Bergling
## 6 31 March 1987 Seven Lions Jeff Montalvo
## 7 17 January 1969 Tiesto Tijs Michiel Verwest
## 8 15 January 1988 Skrillex Sonny John Moore
## 9 7 November 1967 David Guetta Pierre David Guetta
## 10 18 December 1977 Axwell Axel Christofer Hedfors
## 11 17 January 1984 Calvin Harris Adam Richard Wiles
## 12 7 January 1988 Hardwell Robbert van de Corput
## 13 9 September 1987 Afrojack Nick van de Wall
## 14 19 May 1992 Marshmello Christopher Comstock
## 15 1 May 1997 Slushii Julian Scanlan
## 16 25 December 1976 Armin Van Buuren Armin Van Buuren
## 17 6 October 1988 KSHMR Niles Hollowell-Dhar
## ethnicity
## 1 Swedish
## 2 Russian-German
## 3 Dutch
## 4 American
## 5 Swedish
## 6 American
## 7 Dutch
## 8 American
## 9 French
## 10 Swedish
## 11 Scottish
## 12 Dutch
## 13 Dutch
## 14 American
## 15 American
## 16 Dutch
## 17 American
## instruments
## 1 Progressive house, electro house, pop
## 2 EDM, house, electro house, dubstep, complextro, progressive house, brostep
## 3 Progressive house, big room house, Dutch houseelectro house, future bass
## 4 Future Bass, Trap, Dubstep
## 5 Guitar, piano, keyboards, synthesizers
## 6 Drums, D.A.W. (FL Studio), turntables, synthesizer, Guitar
## 7 NA
## 8 Vocals, guitar, bass, CDJs, drum machine, synthesizer, sampler, sequencer, Ableton live
## 9 Piano, digital audio workstation
## 10 Music sequencer, synthesizers, drum machine
## 11 Vocals, piano, keyboards, synthesiser, guitar, bass guitar, sampler
## 12 Digital audio workstation, Keyboards, mixer, piano
## 13 NA
## 14 Digital audio workstation, synthesizer, guitar
## 15 Digital audio workstation, guitar, piano, drums, vocals
## 16 Synthesizer, drum machine
## 17 Synthesizer, digital audio workstation
## genres
## 1 piano, keyboards, mixset, synthesizers
## 2 Keyboards, synthesizers, piano, guitar, drums, percussion, digital audio workstation
## 3 Digital audio workstation, guitar
## 4 Digital audio workstation, guitar, Ableton Live
## 5 EDM, progressive house, electro house
## 6 Melodic dubstep, electro house, progressive house, trance
## 7 Progressive house, future house, electro house, big room house, trance, deep house
## 8 EDM, dubstep, electro house, trap, moombahton, post-hardcore
## 9 EDM, house, progressive house, electro house, dance-pop
## 10 Progressive house, electro house, funky house
## 11 EDM, electro house, electropop, Eurodance, dance-pop, nu-disco
## 12 Progressive house, big room house, electro house, Dutch house, hardstyle, future bass, trap, tech house
## 13 Dutch house, Minimal house, Electro house, trap, Future bass, Moombahton
## 14 Future bass, electronic, progressive house, trap
## 15 Dubstep, future bass, electro house, progressive housetrap
## 16 Uplifting trance, progressive trance, house, progressive house, electro house, psytrance
## 17 Electro house, big room house, psytrance
## net_worth start active num_instruments
## 1 30.00 2010 TRUE 3
## 2 35.00 2002 TRUE 7
## 3 22.00 2012 TRUE 4
## 4 0.45 2014 TRUE 3
## 5 85.00 2006 FALSE 4
## 6 5.00 2010 TRUE 5
## 7 150.00 1994 TRUE 1
## 8 45.00 2004 TRUE 9
## 9 75.00 1984 TRUE 2
## 10 30.00 1995 TRUE 3
## 11 220.00 2002 TRUE 7
## 12 23.00 2005 TRUE 4
## 13 60.00 2003 TRUE 1
## 14 21.00 2013 TRUE 3
## 15 0.50 2016 TRUE 5
## 16 50.00 1996 TRUE 2
## 17 2.00 2006 TRUE 2
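The counts above rely on instruments being a list column, where each entry holds a vector of instrument names. The sketch below shows how lengths() behaves on such a list and one possible way to keep a missing entry from being counted as 1. The three entries are copied from rows 5, 7 and 9 of the table; the names is_missing and counts_fixed are hypothetical.

```r
# Each list element holds one DJ's instruments; NA marks a missing entry
instruments <- list(
  c("Guitar", "piano", "keyboards", "synthesizers"),
  NA,
  c("Piano", "digital audio workstation")
)

# lengths() returns the length of every element, so NA counts as 1
counts <- lengths(instruments)
print(counts)

# Flag the entries that are entirely missing
is_missing <- vapply(instruments, function(x) all(is.na(x)), logical(1))

# Replace the count with NA wherever the entry was missing
counts_fixed <- ifelse(is_missing, NA_integer_, counts)
print(counts_fixed)
```

Whether a missing entry should be NA or 0 is a judgment call; the point is only that a count of 1 for a missing value can be misleading.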
To make a Boolean column based on whether or not an EDM DJ plays more than 5 instruments, we’ll use the condition num_instruments>5 inside mutate(). Let’s just say you have the opinion that if a DJ plays more than 5 instruments, then they’re insanely talented.
edm_df <- edm_df %>% mutate(insanely_talented=num_instruments>5)
edm_df %>% select(artist, num_instruments, insanely_talented)
## artist num_instruments insanely_talented
## 1 Alesso 3 FALSE
## 2 Zedd 7 TRUE
## 3 Martin Garrix 4 FALSE
## 4 Illenium 3 FALSE
## 5 Avicii 4 FALSE
## 6 Seven Lions 5 FALSE
## 7 Tiesto 1 FALSE
## 8 Skrillex 9 TRUE
## 9 David Guetta 2 FALSE
## 10 Axwell 3 FALSE
## 11 Calvin Harris 7 TRUE
## 12 Hardwell 4 FALSE
## 13 Afrojack 1 FALSE
## 14 Marshmello 3 FALSE
## 15 Slushii 5 FALSE
## 16 Armin Van Buuren 2 FALSE
## 17 KSHMR 2 FALSE
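Once a logical column exists, it can be summarized the same way as active. The sketch below repeats the idea on four rows copied from the table above (talent_df and talent_summary are made-up names) and assumes dplyr is installed.

```r
library(dplyr)

# Four DJs and their instrument counts, taken from the output above
talent_df <- data.frame(
  artist = c("Zedd", "Seven Lions", "Skrillex", "KSHMR"),
  num_instruments = c(7, 5, 9, 2)
)

# The condition is strict: exactly 5 instruments does NOT count as more than 5
talent_df <- talent_df %>%
  mutate(insanely_talented = num_instruments > 5)

# sum() counts the TRUEs and mean() gives their proportion
talent_summary <- talent_df %>%
  summarize(n_talented = sum(insanely_talented),
            prop_talented = mean(insanely_talented))

print(talent_df)
print(talent_summary)
```

Notice that Seven Lions, with exactly 5 instruments, comes out FALSE. Using >= instead of > would change that, so it is worth deciding which one matches your definition.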
Now, no matter how interesting this dataset is to explore, it is probably not a good sample, nor has the data been collected in the best way. We introduced a lot of bias by choosing DJs we saw straight off of Google and Spotify. We didn’t check Billboard or any other music charts authority.
If we want to get a good idea of EDM DJs in general, we would need way more data. The more data we have, the better our estimates can become for whatever we wish to classify or predict.
Think about it! We could have done these.
From this class:
- Plotting
  - Scatterplots
  - Histograms
  - Boxplots
  - Faceted scatterplots
  - Barplots of genres or instruments played
- Regression
  - On age and net worth
  - On number of instruments and net worth
- Two-way tables
  - To see how many DJs share different categorical properties (e.g. month and active)

And beyond:
- Machine learning
- Maps (Yes, like GIS with coordinates!)
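As a taste of the two-way table idea, here is a rough sketch using the base R function table(). The month and active values are copied from the 17 rows printed earlier; the names dj_month, dj_active and month_by_active are made up for this example.

```r
# Birth month of each of the 17 DJs, in the same order as the dataframe
dj_month <- c("July", "September", "May", "December", "September", "March",
              "January", "January", "November", "December", "January",
              "January", "September", "May", "May", "December", "October")

# Whether each DJ is still active: only the 5th DJ is not
dj_active <- c(TRUE, TRUE, TRUE, TRUE, FALSE, rep(TRUE, 12))

# Cross-tabulate the two variables: one row per month, one column per status
month_by_active <- table(month = dj_month, active = dj_active)
print(month_by_active)
```

Each cell counts the DJs who share both properties, for example how many September-born DJs are still active.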