Note: Packages in R

We have been using dplyr and ggplot2, loading them with library(dplyr) and library(ggplot2). Before we can load a package for the first time, we have to install it with the function install.packages(). For our lab this week, you’re asked to install the packages tidyverse and RCurl.

Think of R packages like apps on your phone. You download an app once, and after that you just open it whenever you want to use it. In our case, install the packages once using the lines below, then load them with library() each time you need them.

Run these only once within the console (not your markdown!):

# * RUN THESE ONCE
install.packages("tidyverse")
install.packages("RCurl")

And run these always within your markdown:

# * RUN THESE ALWAYS
library(tidyverse)
library(RCurl)

Notice the usage of quotes. When you install a package, use quotes! When you load a package, you don’t need them. And lastly, make sure they’re really straight " " quotes. Some text editors will give you a fancier-looking curly version (“ ”), and R does not work with stylized text. Work with plain text editors only.
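
If you are not sure whether a package is already installed, one option is to check before installing. The sketch below uses base R’s requireNamespace(); the helper name install_if_missing is made up for illustration and is not part of the lab.

```r
# Hypothetical helper (not part of the lab): install a package only if
# it is not already available on this machine.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing gets installed here.
install_if_missing("stats")
```

Note that the package name is still passed in quotes, just as with install.packages().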

A little EDM in R

Electronic Dance Music has become super popular in today’s music! Imagine that you wanted to explore some information about some of the most popular EDM DJ’s through data. Here’s your process:

  1. You (a) go onto Google to search up “Top EDM DJ’s”, (b) look at Spotify and see recommendations, and (c) then remember that information
  2. You check the Wikipedia page for each of the DJ’s and keep variables of interest
  3. You do a separate Google search for estimates of their net worth
  4. You store all of your findings in R into a dataframe called edm_df
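
As a rough sketch of step 4, findings like these can be typed into a data frame by hand with data.frame(). The values below are the first three rows of a few columns from the output shown later in this note; the name edm_small is only for illustration so that it does not clash with the real edm_df.

```r
# Illustration only: a three-row, four-column slice of the data,
# entered by hand. The real edm_df has 17 rows and 11 columns.
edm_small <- data.frame(
  artist    = c("Alesso", "Zedd", "Martin Garrix"),
  net_worth = c(30, 35, 22),        # same values as the net_worth column
  start     = c(2010, 2002, 2012),  # same values as the start column
  active    = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE          # keep artist as plain text
)

dim(edm_small)  # number of rows, then number of columns
```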

Loading in Libraries

We are going to load dplyr. We need to make sure that we’ve run install.packages("dplyr") at some point before we run library() on it.

library(dplyr)

What does the data look like?

It has 17 rows and 11 columns. Each row corresponds to a DJ and each column corresponds to something interesting about them.

head(edm_df)
##   day     month year        artist                    legal      ethnicity
## 1   7      July 1991        Alesso      Alessandro Lindblad        Swedish
## 2   2 September 1989          Zedd          Anton Zaslavski Russian-German
## 3  14       May 1996 Martin Garrix Martijn Gerard Garritsen          Dutch
## 4  26  December 1990      Illenium       Nicholas D. Miller       American
## 5   8 September 1989        Avicii             Tim Bergling        Swedish
## 6  31     March 1987   Seven Lions            Jeff Montalvo       American
##                                                                        instruments
## 1                                            Progressive house, electro house, pop
## 2 EDM,  house,  electro house,  dubstep,  complextro,  progressive house,  brostep
## 3         Progressive house, big room house, Dutch houseelectro house, future bass
## 4                                                       Future Bass, Trap, Dubstep
## 5                                        Guitar,  piano,  keyboards,  synthesizers
## 6                   Drums,  D.A.W. (FL Studio),  turntables,  synthesizer,  Guitar
##                                                                                       genres
## 1                                                     piano, keyboards, mixset, synthesizers
## 2 Keyboards,  synthesizers,  piano,  guitar,  drums,  percussion,  digital audio workstation
## 3                                                          Digital audio workstation, guitar
## 4                                            Digital audio workstation, guitar, Ableton Live
## 5                                                    EDM,  progressive house,  electro house
## 6                               Melodic dubstep,  electro house,  progressive house,  trance
##   net_worth start active
## 1     30.00  2010   TRUE
## 2     35.00  2002   TRUE
## 3     22.00  2012   TRUE
## 4      0.45  2014   TRUE
## 5     85.00  2006  FALSE
## 6      5.00  2010   TRUE

Data Types

If we take a closer look at the columns, we can see the class/type of R object the columns contain. This dataset is not perfect for analysis. Take a look at day and year. We would probably prefer those to be in numeric format.
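
One way to see the class of every column at once is sapply() with class(), or str() for a compact summary. The toy data frame below is an assumption standing in for edm_df; it stores day and year as factors, which is how this note describes those columns.

```r
# Toy stand-in for edm_df (assumption): day and year stored as factors.
toy_df <- data.frame(
  day   = factor(c("7", "2", "14")),
  year  = factor(c("1991", "1989", "1996")),
  start = c(2010, 2002, 2012)
)

# Class of each column, returned as a named character vector
col_classes <- sapply(toy_df, class)
col_classes

# Compact overview: one line per column with its type and first values
str(toy_df)
```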

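I’m going to use the mutate() function to make these two variables numeric with as.numeric(). The columns are converted with as.character() first because as.numeric() applied directly to a factor returns its internal level codes rather than the values you see printed. These are both useful functions, but they’re off the table for testing for now.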

edm_df %>% mutate(day=as.numeric(as.character(day)),
                  year=as.numeric(as.character(year)))
##    day     month year           artist                    legal
## 1    7      July 1991           Alesso      Alessandro Lindblad
## 2    2 September 1989             Zedd          Anton Zaslavski
## 3   14       May 1996    Martin Garrix Martijn Gerard Garritsen
## 4   26  December 1990         Illenium       Nicholas D. Miller
## 5    8 September 1989           Avicii             Tim Bergling
## 6   31     March 1987      Seven Lions            Jeff Montalvo
## 7   17   January 1969           Tiesto     Tijs Michiel Verwest
## 8   15   January 1988         Skrillex         Sonny John Moore
## 9    7  November 1967     David Guetta      Pierre David Guetta
## 10  18  December 1977           Axwell  Axel Christofer Hedfors
## 11  17   January 1984    Calvin Harris       Adam Richard Wiles
## 12   7   January 1988         Hardwell    Robbert van de Corput
## 13   9 September 1987         Afrojack         Nick van de Wall
## 14  19       May 1992       Marshmello     Christopher Comstock
## 15   1       May 1997          Slushii           Julian Scanlan
## 16  25  December 1976 Armin Van Buuren         Armin Van Buuren
## 17   6   October 1988            KSHMR     Niles Hollowell-Dhar
##         ethnicity
## 1         Swedish
## 2  Russian-German
## 3           Dutch
## 4        American
## 5         Swedish
## 6        American
## 7           Dutch
## 8        American
## 9          French
## 10        Swedish
## 11       Scottish
## 12          Dutch
## 13          Dutch
## 14       American
## 15       American
## 16          Dutch
## 17       American
##                                                                                        instruments
## 1                                                            Progressive house, electro house, pop
## 2                 EDM,  house,  electro house,  dubstep,  complextro,  progressive house,  brostep
## 3                         Progressive house, big room house, Dutch houseelectro house, future bass
## 4                                                                       Future Bass, Trap, Dubstep
## 5                                                        Guitar,  piano,  keyboards,  synthesizers
## 6                                   Drums,  D.A.W. (FL Studio),  turntables,  synthesizer,  Guitar
## 7                                                                                               NA
## 8  Vocals,  guitar,  bass,  CDJs,  drum machine,  synthesizer,  sampler,  sequencer,  Ableton live
## 9                                                                 Piano, digital audio workstation
## 10                                                     Music sequencer, synthesizers, drum machine
## 11                             Vocals, piano, keyboards, synthesiser, guitar, bass guitar, sampler
## 12                                              Digital audio workstation, Keyboards, mixer, piano
## 13                                                                                              NA
## 14                                                  Digital audio workstation, synthesizer, guitar
## 15                                         Digital audio workstation, guitar, piano, drums, vocals
## 16                                                                       Synthesizer, drum machine
## 17                                                          Synthesizer, digital audio workstation
##                                                                                                     genres
## 1                                                                   piano, keyboards, mixset, synthesizers
## 2               Keyboards,  synthesizers,  piano,  guitar,  drums,  percussion,  digital audio workstation
## 3                                                                        Digital audio workstation, guitar
## 4                                                          Digital audio workstation, guitar, Ableton Live
## 5                                                                  EDM,  progressive house,  electro house
## 6                                             Melodic dubstep,  electro house,  progressive house,  trance
## 7                       Progressive house, future house, electro house, big room house, trance, deep house
## 8                                        EDM,  dubstep,  electro house,  trap,  moombahton,  post-hardcore
## 9                                                  EDM, house, progressive house, electro house, dance-pop
## 10                                                           Progressive house, electro house, funky house
## 11                                          EDM, electro house, electropop, Eurodance, dance-pop, nu-disco
## 12 Progressive house, big room house, electro house, Dutch house, hardstyle, future bass, trap, tech house
## 13                           Dutch house,  Minimal house,  Electro house,  trap,  Future bass,  Moombahton
## 14                                                        Future bass, electronic, progressive house, trap
## 15                                              Dubstep, future bass, electro house, progressive housetrap
## 16                Uplifting trance, progressive trance, house, progressive house, electro house, psytrance
## 17                                                                Electro house, big room house, psytrance
##    net_worth start active
## 1      30.00  2010   TRUE
## 2      35.00  2002   TRUE
## 3      22.00  2012   TRUE
## 4       0.45  2014   TRUE
## 5      85.00  2006  FALSE
## 6       5.00  2010   TRUE
## 7     150.00  1994   TRUE
## 8      45.00  2004   TRUE
## 9      75.00  1984   TRUE
## 10     30.00  1995   TRUE
## 11    220.00  2002   TRUE
## 12     23.00  2005   TRUE
## 13     60.00  2003   TRUE
## 14     21.00  2013   TRUE
## 15      0.50  2016   TRUE
## 16     50.00  1996   TRUE
## 17      2.00  2006   TRUE
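
A small sketch with a made-up factor shows what the as.character() step in the call above does. The vector year_factor is invented for illustration and is not taken from edm_df.

```r
# Made-up example: years stored as a factor
year_factor <- factor(c("1991", "1989", "1996"))

# Direct conversion returns the level codes (positions in the sorted levels)
codes <- as.numeric(year_factor)
codes   # 2 1 3

# Going through character first recovers the actual years
years <- as.numeric(as.character(year_factor))
years   # 1991 1989 1996
```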

Now, we are going to overwrite our existing dataframe edm_df with the one we just produced above. We do this because we want to be able to access the changes we made (changing factors into numbers) to the columns.

edm_df <- edm_df %>%
  mutate(day = as.numeric(as.character(day)),
         year = as.numeric(as.character(year)))

Note: You can name your dataframes and variables however you like so long as you stay consistent. If you are going to create a new dataframe, have a reason why. For example, I may want to create a new dataframe because I don’t want to lose my copy of the original for future purposes. However, if I am confident in my work and not going to use the original version of the dataframe, then I can overwrite it just as I did in the cell block above.
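
Here is a small sketch of the two choices, using a made-up data frame toy_df in place of edm_df. Assigning the result to a new name keeps the original untouched; assigning back to the same name replaces it.

```r
library(dplyr)

# Made-up stand-in for edm_df
toy_df <- data.frame(artist = c("Alesso", "Zedd"),
                     year   = factor(c("1991", "1989")))

# Option 1: new name, so toy_df itself is unchanged
toy_clean <- toy_df %>% mutate(year = as.numeric(as.character(year)))
class(toy_df$year)     # still "factor"
class(toy_clean$year)  # "numeric"

# Option 2: overwrite, so the factor version is gone from now on
toy_df <- toy_df %>% mutate(year = as.numeric(as.character(year)))
class(toy_df$year)     # "numeric"
```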

Logical Operators

Now that you’re a capable coder, you’re going to hear often about “binary” or “logical” or “boolean” data. What this means is that your data is just a whole bunch of TRUEs and FALSEs!

An Example from Our Dataframe

We have an example of this in our dataframe. Take a look at the column called active.

# * ACTIVE COLUMN
edm_df %>% pull(active)
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

In the R language, and also in mathematics, we can read these TRUE/FALSE values as a bunch of 0’s for FALSE and 1’s for TRUE.

# * CONVERTING ACTIVE COLUMN TO NUMERIC FOR UNDERSTANDING
as.numeric(edm_df %>% pull(active))
##  [1] 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1

We can use the mean() function to calculate the proportion of DJ’s within our dataset that are still actively working as DJ’s. Remember that the formula for the mean is \(\frac{\sum_{i=1}^{n} x_i}{n}\). Since each TRUE counts as 1 and each FALSE as 0, the numerator is the count of how many DJ’s are active and the denominator is the total number of DJ’s in our dataset. We can use a dplyr function to calculate this for us.

# * PROPORTION OF ACTIVE DJ's
edm_df %>% summarize(mean=mean(active))
##        mean
## 1 0.9411765

94.12% of the DJ’s in our dataset are still active. The one DJ in our dataset who isn’t active anymore is Avicii, who sadly passed away in 2018.
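
To see why mean() gives a proportion here, the sketch below rebuilds the active column by hand (16 TRUE values and one FALSE, as printed above) and compares the formula with the built-in function.

```r
# The active column as printed above: only the 5th DJ is inactive
active <- c(rep(TRUE, 4), FALSE, rep(TRUE, 12))

n_active <- sum(active)     # TRUE counts as 1, FALSE as 0
n_total  <- length(active)  # number of DJs

n_active / n_total          # numerator over denominator
mean(active)                # same value

round(100 * mean(active), 2)  # as a percentage: 94.12
```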

Making a Boolean/Logical/True-False Vector

Making a logical vector or column in our dataframe goes like this: we look at a column and set a condition that each value in it either satisfies or doesn’t. We will look at the instruments column.

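We are using the lengths() function to calculate how many instruments a given DJ plays. lengths() returns the length of each element of a list, so this works when each entry in instruments holds a collection of instrument names. Note that a missing entry (NA) still counts as one element, which is why Tiesto and Afrojack show 1 in the output.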

edm_df <- edm_df %>%
  mutate(num_instruments=lengths(instruments))
edm_df
##    day     month year           artist                    legal
## 1    7      July 1991           Alesso      Alessandro Lindblad
## 2    2 September 1989             Zedd          Anton Zaslavski
## 3   14       May 1996    Martin Garrix Martijn Gerard Garritsen
## 4   26  December 1990         Illenium       Nicholas D. Miller
## 5    8 September 1989           Avicii             Tim Bergling
## 6   31     March 1987      Seven Lions            Jeff Montalvo
## 7   17   January 1969           Tiesto     Tijs Michiel Verwest
## 8   15   January 1988         Skrillex         Sonny John Moore
## 9    7  November 1967     David Guetta      Pierre David Guetta
## 10  18  December 1977           Axwell  Axel Christofer Hedfors
## 11  17   January 1984    Calvin Harris       Adam Richard Wiles
## 12   7   January 1988         Hardwell    Robbert van de Corput
## 13   9 September 1987         Afrojack         Nick van de Wall
## 14  19       May 1992       Marshmello     Christopher Comstock
## 15   1       May 1997          Slushii           Julian Scanlan
## 16  25  December 1976 Armin Van Buuren         Armin Van Buuren
## 17   6   October 1988            KSHMR     Niles Hollowell-Dhar
##         ethnicity
## 1         Swedish
## 2  Russian-German
## 3           Dutch
## 4        American
## 5         Swedish
## 6        American
## 7           Dutch
## 8        American
## 9          French
## 10        Swedish
## 11       Scottish
## 12          Dutch
## 13          Dutch
## 14       American
## 15       American
## 16          Dutch
## 17       American
##                                                                                        instruments
## 1                                                            Progressive house, electro house, pop
## 2                 EDM,  house,  electro house,  dubstep,  complextro,  progressive house,  brostep
## 3                         Progressive house, big room house, Dutch houseelectro house, future bass
## 4                                                                       Future Bass, Trap, Dubstep
## 5                                                        Guitar,  piano,  keyboards,  synthesizers
## 6                                   Drums,  D.A.W. (FL Studio),  turntables,  synthesizer,  Guitar
## 7                                                                                               NA
## 8  Vocals,  guitar,  bass,  CDJs,  drum machine,  synthesizer,  sampler,  sequencer,  Ableton live
## 9                                                                 Piano, digital audio workstation
## 10                                                     Music sequencer, synthesizers, drum machine
## 11                             Vocals, piano, keyboards, synthesiser, guitar, bass guitar, sampler
## 12                                              Digital audio workstation, Keyboards, mixer, piano
## 13                                                                                              NA
## 14                                                  Digital audio workstation, synthesizer, guitar
## 15                                         Digital audio workstation, guitar, piano, drums, vocals
## 16                                                                       Synthesizer, drum machine
## 17                                                          Synthesizer, digital audio workstation
##                                                                                                     genres
## 1                                                                   piano, keyboards, mixset, synthesizers
## 2               Keyboards,  synthesizers,  piano,  guitar,  drums,  percussion,  digital audio workstation
## 3                                                                        Digital audio workstation, guitar
## 4                                                          Digital audio workstation, guitar, Ableton Live
## 5                                                                  EDM,  progressive house,  electro house
## 6                                             Melodic dubstep,  electro house,  progressive house,  trance
## 7                       Progressive house, future house, electro house, big room house, trance, deep house
## 8                                        EDM,  dubstep,  electro house,  trap,  moombahton,  post-hardcore
## 9                                                  EDM, house, progressive house, electro house, dance-pop
## 10                                                           Progressive house, electro house, funky house
## 11                                          EDM, electro house, electropop, Eurodance, dance-pop, nu-disco
## 12 Progressive house, big room house, electro house, Dutch house, hardstyle, future bass, trap, tech house
## 13                           Dutch house,  Minimal house,  Electro house,  trap,  Future bass,  Moombahton
## 14                                                        Future bass, electronic, progressive house, trap
## 15                                              Dubstep, future bass, electro house, progressive housetrap
## 16                Uplifting trance, progressive trance, house, progressive house, electro house, psytrance
## 17                                                                Electro house, big room house, psytrance
##    net_worth start active num_instruments
## 1      30.00  2010   TRUE               3
## 2      35.00  2002   TRUE               7
## 3      22.00  2012   TRUE               4
## 4       0.45  2014   TRUE               3
## 5      85.00  2006  FALSE               4
## 6       5.00  2010   TRUE               5
## 7     150.00  1994   TRUE               1
## 8      45.00  2004   TRUE               9
## 9      75.00  1984   TRUE               2
## 10     30.00  1995   TRUE               3
## 11    220.00  2002   TRUE               7
## 12     23.00  2005   TRUE               4
## 13     60.00  2003   TRUE               1
## 14     21.00  2013   TRUE               3
## 15      0.50  2016   TRUE               5
## 16     50.00  1996   TRUE               2
## 17      2.00  2006   TRUE               2
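
A minimal sketch of how lengths() counts items, assuming each entry starts out as a single comma-separated string (how instruments is actually stored in edm_df is not shown here). The three strings are copied from the output above; strsplit() turns each one into a vector of names.

```r
# Three entries copied from the instruments column, including a missing one
instr <- c("Guitar,  piano,  keyboards,  synthesizers",
           NA,
           "Piano, digital audio workstation")

# Split on commas (plus any spaces after them) to get a list of vectors
instr_list <- strsplit(instr, ",\\s*")

# lengths() gives the number of items in each list element
counts <- lengths(instr_list)
counts   # 4 1 2 (the NA entry still counts as one element)

# If a missing entry should count as zero instead, handle it explicitly
counts_na0 <- ifelse(is.na(instr), 0L, counts)
counts_na0   # 4 0 2
```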

To make a Boolean column based on whether or not an EDM DJ plays more than 5 instruments, we’ll consider the condition num_instruments>5 in a mutate() pipe. Let’s just say you have the opinion that if the DJ plays more than 5 instruments, then they’re insanely talented.

edm_df <- edm_df %>% mutate(insanely_talented=num_instruments>5)
edm_df %>% select(artist, num_instruments, insanely_talented)
##              artist num_instruments insanely_talented
## 1            Alesso               3             FALSE
## 2              Zedd               7              TRUE
## 3     Martin Garrix               4             FALSE
## 4          Illenium               3             FALSE
## 5            Avicii               4             FALSE
## 6       Seven Lions               5             FALSE
## 7            Tiesto               1             FALSE
## 8          Skrillex               9              TRUE
## 9      David Guetta               2             FALSE
## 10           Axwell               3             FALSE
## 11    Calvin Harris               7              TRUE
## 12         Hardwell               4             FALSE
## 13         Afrojack               1             FALSE
## 14       Marshmello               3             FALSE
## 15          Slushii               5             FALSE
## 16 Armin Van Buuren               2             FALSE
## 17            KSHMR               2             FALSE
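
Because the new column is logical, the same counting tricks apply to it. The sketch below retypes the num_instruments values from the table above and shows that > and >= give different answers for the two DJ’s who play exactly 5 instruments.

```r
# num_instruments as printed above, in row order
num_instruments <- c(3, 7, 4, 3, 4, 5, 1, 9, 2, 3, 7, 4, 1, 3, 5, 2, 2)

# Strictly more than 5: a value of exactly 5 is FALSE
insanely_talented <- num_instruments > 5
sum(insanely_talented)    # 3 DJs
mean(insanely_talented)   # proportion of the 17 DJs

# "5 or more" uses >= and also picks up the two DJs with exactly 5
sum(num_instruments >= 5) # 5 DJs
```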

Is this a good sample?

Now, no matter how interesting this dataset can be to explore, this is probably not a good sample, nor has the data been collected in the best way. We introduced a lot of bias by choosing DJ’s we saw straight off of Google and Spotify. We didn’t check Billboard or some other sort of music charts authority. The head() output above also hints at entry errors: for the first four DJ’s, the instruments and genres values appear to be swapped.

If we want to get a good idea of EDM DJ’s in general, we would need way more data. The more data we have, the better our estimates can become for whatever we wish to classify or predict.

There’s so much more we can do!

Think about it! We could have done these.

From this class:

  - Plotting
      - Scatterplots
      - Histograms
      - Boxplots
      - Faceted scatterplots
      - Barplots of genres or instruments played
  - Regression
      - On age and net worth
      - On number of instruments and net worth
  - Two-way tables
      - To see how many DJ’s share different categorical properties (e.g. month and active)

And beyond:

  - Machine learning
  - Maps (Yes, like GIS with coordinates!)
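
As one example of the two-way tables mentioned above, here is a rough sketch of a table of month against active. The two vectors are retyped from the output earlier in this note rather than pulled from edm_df, so treat it as an illustration.

```r
# month and active, retyped in row order from the printed data
month <- c("July", "September", "May", "December", "September", "March",
           "January", "January", "November", "December", "January",
           "January", "September", "May", "May", "December", "October")
active <- c(rep(TRUE, 4), FALSE, rep(TRUE, 12))

# Count how many DJs fall into each month/active combination
month_active <- table(month, active)
month_active

# Row totals: how many DJs are listed under each month
rowSums(month_active)
```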