Opening Remarks

When you’re learning how to code, don’t take notes. I know some people who handwrite their code, but that’s because they already know how to code. Try your best to listen instead. To learn how to code, you need to watch coding being done and practice coding yourself.

Elements of an RMarkdown File

This is regular text.

Here’s a code block.

# CODE BLOCK
"This is a string, but do not try and write your code in quotes like this!"
## [1] "This is a string, but do not try and write your code in quotes like this!"

Assignment Operators

hi = 4    # WORKS
hi <- 4   # MORE "R-LIKE"
hi<-4     # ALSO WORKS, BUT SPACES MAKE IT EASIER TO READ
hi=4      # SAME HERE
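A minimal sketch of where the two operators differ. At the top level both store a value the same way; inside a function call, `=` names an argument, while `<-` creates (or overwrites) a variable in your environment. The names a_equals, a_arrow and x are just for illustration.

```r
# AT THE TOP LEVEL, BOTH OPERATORS STORE THE SAME VALUE
a_equals = 4
a_arrow <- 4
identical(a_equals, a_arrow)    # TRUE

# INSIDE A FUNCTION CALL, = NAMES THE ARGUMENT x OF median()...
median(x = c(1, 5, 9))          # 5
exists("x", inherits = FALSE)   # FALSE IN A FRESH SESSION: NO VARIABLE x WAS CREATED

# ...WHILE <- ASSIGNS x IN YOUR ENVIRONMENT, THEN PASSES IT TO median()
median(x <- c(1, 5, 9))         # 5
exists("x", inherits = FALSE)   # TRUE
```

This is one reason many R users reserve `=` for function arguments and use `<-` for assignment.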

Logical Values

a <- 1
b <- 2

# TRUE OR FALSE?
a < b
## [1] TRUE
a > b
## [1] FALSE
a <= b
## [1] TRUE
a >= b
## [1] FALSE
a == b
## [1] FALSE
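Comparisons also work element by element on a whole vector, and the results can be combined with `&` (and) and `|` (or). A short sketch; the vector name scores is hypothetical:

```r
scores <- c(3, 8, 5, 10)

scores > 4                  # FALSE  TRUE  TRUE  TRUE
scores != 5                 # TRUE  TRUE FALSE  TRUE  (!= MEANS "NOT EQUAL")
scores > 4 & scores < 10    # FALSE  TRUE  TRUE FALSE (BOTH MUST BE TRUE)
scores < 4 | scores == 10   # TRUE FALSE FALSE  TRUE  (EITHER CAN BE TRUE)

# TRUE COUNTS AS 1 AND FALSE AS 0, SO sum() COUNTS HOW MANY ARE TRUE
sum(scores > 4)             # 3
```

The same element-by-element behavior is what lets a single comparison run over an entire column of a dataframe.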

Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
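The startup message means that, once dplyr is attached, calling filter() (or any other name in that list) runs dplyr’s version instead of the stats or base version. A small sketch using R’s built-in mtcars data rather than the course data; four_cyl and same_rows are illustrative names:

```r
library(dplyr)

# AFTER library(dplyr), filter() MEANS dplyr::filter(): KEEP ROWS THAT MATCH A CONDITION
four_cyl <- filter(mtcars, cyl == 4)
nrow(four_cyl)                   # 11

# THE package::function FORM PICKS A VERSION EXPLICITLY, WHATEVER IS MASKED
same_rows <- dplyr::filter(mtcars, cyl == 4)
identical(four_cyl, same_rows)   # TRUE

# stats::filter() IS A DIFFERENT FUNCTION: IT APPLIES A LINEAR FILTER
# (HERE, A 3-POINT MOVING AVERAGE) TO A SERIES OF NUMBERS
stats::filter(1:5, rep(1/3, 3))
```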

How to Run Your Chunks

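You can run code with the green arrow at the top right of a chunk, with the Run dropdown above the editor, or with Ctrl+Enter (Cmd+Enter on a Mac), which runs the current line or selection.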

CS_data <- read.csv("data/Cesarean.csv")  # THE PATH IS RELATIVE TO YOUR WORKING DIRECTORY

Your Environment

You need to check whether an object is in your environment before you call on it. It should become instinctual to load your data (i.e., tell your computer what you want to act on) before you start telling R to use it.
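A quick way to do that check in code, sketched with hypothetical object names (my_data, not_loaded_yet). ls() lists the names in your current environment, which at the console is the Global Environment that RStudio’s Environment pane shows by default.

```r
# A HYPOTHETICAL OBJECT, JUST SO THERE IS SOMETHING IN THE ENVIRONMENT
my_data <- data.frame(a = 1:3)

# ls() LISTS THE NAMES IN THE CURRENT ENVIRONMENT
"my_data" %in% ls()        # TRUE

# exists() CHECKS ONE NAME
exists("not_loaded_yet")   # FALSE

# USING A NAME BEFORE CREATING IT STOPS WITH AN ERROR; HERE WE CATCH ITS MESSAGE
msg <- tryCatch(not_loaded_yet, error = function(e) conditionMessage(e))
msg                        # "object 'not_loaded_yet' not found"
```

If you ever see "object ... not found", the usual fix is to run the chunk that creates or loads that object first.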

Dataframes

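A dataframe in R is what you would expect a “table” to be: rows of observations and columns of variables. (Be careful, don’t call them tables! In R, a “table” is a different kind of object, the counts made by table().)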

class(CS_data)
## [1] "data.frame"
#View(CS_data)  # UNCOMMENT TO BROWSE CS_data IN RSTUDIO'S DATA VIEWER
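A sketch of a few basic ways to inspect a dataframe, on a tiny hand-built example so it runs without the course file. The name toy is hypothetical; its values are copied from the head() preview of CS_data later in this section.

```r
# A SMALL DATAFRAME BUILT BY HAND (VALUES FROM THE head() PREVIEW OF CS_data)
toy <- data.frame(
  Country_Name = c("Albania", "Andorra", "Armenia"),
  CS_rate      = c(0.256, 0.237, 0.141)
)
class(toy)     # "data.frame"
dim(toy)       # 3 2  (ROWS, COLUMNS)
names(toy)     # "Country_Name" "CS_rate"
toy$CS_rate    # ONE COLUMN, PULLED OUT AS A VECTOR

# A "table" IN R IS SOMETHING ELSE: COUNTS MADE BY table()
counts <- table(c("low", "high", "low"))
class(counts)  # "table"
```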

To easily manage dataframes in R, we can use a library called dplyr. A library in R is like an app on your phone. Your phone is capable of being extremely useful in many cases, and an app unlocks a specific ability, like managing your weight loss or tracking your menstrual period, making your life easier. Likewise, a library in R is written by a developer for the community to use, and it makes computing in R easier.
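The app analogy carries over to the two steps of using a library: you install it once per computer (like downloading the app) and load it in every new R session (like opening the app). A minimal sketch:

```r
# INSTALL ONCE PER COMPUTER, LIKE DOWNLOADING AN APP (SKIPPED IF ALREADY INSTALLED)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# LOAD IN EVERY NEW R SESSION, LIKE OPENING THE APP
library(dplyr)
```

Strictly speaking, what you install is a package; library() loads it from the library folder on your computer where installed packages live.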

# EXAMPLE FROM LAB: PREVIEW THE NEW COLUMN (NOTHING IS SAVED YET)
head(CS_data %>% mutate(CS_rate_100 = CS_rate * 100))
##           Country_Name CountryCode Births_Per_1000         Income_Group
## 1              Albania         ALB              46  Upper middle income
## 2              Andorra         AND               1 High income: nonOECD
## 3 United Arab Emirates         ARE              63 High income: nonOECD
## 4            Argentina         ARG             689 High income: nonOECD
## 5              Armenia         ARM              47  Lower middle income
## 6            Australia         AUS             267    High income: OECD
##                       Region  GDP_2006 CS_rate CS_rate_100
## 1      Europe & Central Asia  3051.768   0.256        25.6
## 2      Europe & Central Asia 42417.229   0.237        23.7
## 3 Middle East & North Africa 42950.101   0.100        10.0
## 4  Latin America & Caribbean  6649.414   0.352        35.2
## 5      Europe & Central Asia  2126.619   0.141        14.1
## 6        East Asia & Pacific 36100.559   0.303        30.3
# SAVE THE NEW COLUMN BY OVERWRITING CS_data
CS_data <- CS_data %>% mutate(CS_rate_100 = CS_rate * 100)
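Two things are happening in the lab example: the pipe %>% passes the dataframe on its left in as the first argument of the function on its right, and mutate() returns a new dataframe rather than changing the original. A small sketch on hand-built data (toy, piped and nested are illustrative names; the values come from the preview above):

```r
library(dplyr)

# VALUES COPIED FROM THE head() PREVIEW ABOVE
toy <- data.frame(CS_rate = c(0.256, 0.237, 0.100))

# x %>% f(y) IS THE SAME CALL AS f(x, y)
piped  <- toy %>% mutate(CS_rate_100 = CS_rate * 100)
nested <- mutate(toy, CS_rate_100 = CS_rate * 100)
identical(piped, nested)        # TRUE

# WITHOUT <-, THE ORIGINAL IS UNCHANGED
"CS_rate_100" %in% names(toy)   # FALSE
```

That is why the lab code first previews with head() and then saves the result with CS_data <- ...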

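# ANOTHER HELPFUL(?) EXAMPLE: IS EACH COUNTRY'S CS RATE BELOW THE MEDIAN? (TRUE/FALSE)
# OPTION 1: SAVE THE RESULT AS A NEW OBJECT AND LEAVE CS_data UNCHANGED
CS_data_new <- CS_data %>% mutate(CS_logical_check = CS_rate < median(CS_rate))
# OPTION 2: OVERWRITE CS_data (THE VERSION USED BELOW)
CS_data <- CS_data %>% mutate(CS_logical_check = CS_rate < median(CS_rate))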

# ANOTHER ONE: RENAME A COLUMN (new_name = old_name)
CS_data <- CS_data %>% rename(CS_rate_below_median = CS_logical_check)
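To see what these two steps produce, here is a sketch on just the six CS_rate values from the head() preview above (the name toy is hypothetical). With an even number of values, median() averages the two middle ones, and rename() takes new_name = old_name.

```r
library(dplyr)

# THE SIX CS_rate VALUES FROM THE head() PREVIEW ABOVE
toy <- data.frame(CS_rate = c(0.256, 0.237, 0.100, 0.352, 0.141, 0.303))
median(toy$CS_rate)   # 0.2465: THE AVERAGE OF THE TWO MIDDLE VALUES, 0.237 AND 0.256

toy <- toy %>%
  mutate(CS_logical_check = CS_rate < median(CS_rate)) %>%
  rename(CS_rate_below_median = CS_logical_check)   # new_name = old_name

names(toy)                      # "CS_rate" "CS_rate_below_median"
toy$CS_rate_below_median        # FALSE  TRUE  TRUE FALSE  TRUE FALSE
sum(toy$CS_rate_below_median)   # 3 ROWS FALL BELOW THE MEDIAN
```

If a real column contains missing values, median() returns NA unless you pass na.rm = TRUE.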

Data Visualization

We can totally plot stuff in base R. Check this out.

# BASE R PLOT
plot(CS_data$Births_Per_1000, CS_data$CS_rate)

# TRYING TO MAKE IT PRETTY: AXIS LABELS, A TITLE, SOLID POINTS (pch = 19), AND A COLOR
plot(CS_data$Births_Per_1000, CS_data$CS_rate,
     xlab = "Births Per 1000", ylab = "Cesarean Section Rate",
     main = "Cesarean Section Rate vs. Births Per 1000",
     pch = 19, col = "blue")

You can certainly visualize data like this; it’s up to you. We’re opting for ggplot2 in this class, though. We’re going to start with …

Help

?ggplot
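A few other ways to reach the same documentation, sketched below. ?ggplot is shorthand for help(ggplot); the names page and page2 are only for illustration.

```r
library(ggplot2)

# AT THE RSTUDIO CONSOLE, ?ggplot OR help(ggplot) OPENS THE HELP PANE
# ASSIGNING THE RESULT STORES THE LOOKUP WITHOUT DISPLAYING THE PAGE
page <- help(ggplot)

# NAMING THE PACKAGE FINDS A PAGE EVEN BEFORE library() IS CALLED
page2 <- help("geom_point", package = "ggplot2")

# ??scatterplot (OR help.search("scatterplot")) SEARCHES ALL INSTALLED HELP PAGES
```

Scroll to the Examples section at the bottom of a help page; it usually has code you can copy and run.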

Remaking the Scatterplot

We’re going to remake the scatterplot from above. Recall that you’re plotting two numeric variables against each other.

# ggplot() SETS UP THE DATA AND AXES, BUT DRAWS NO POINTS YET
this_plot <- ggplot(data = CS_data, aes(x = Births_Per_1000, y = CS_rate))
this_plot

All you need …

this_plot + geom_point()

But let’s add some fun…

this_plot <- ggplot(CS_data, aes(x = Births_Per_1000, y = CS_rate)) +
  geom_point(col = alpha("pink2", 0.6)) +   # PINK POINTS, 60% OPAQUE
  ggtitle("Cesarean Section Rate vs. Births Per 1000") +
  xlab("Births Per 1000") +
  ylab("Cesarean Section Rate")
this_plot
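If you prefer, a single labs() layer can set the title and both axis labels at once instead of ggtitle() + xlab() + ylab(). A sketch on six rows copied from the head() preview of CS_data (toy and p are hypothetical names):

```r
library(ggplot2)

# SIX ROWS COPIED FROM THE head() PREVIEW OF CS_data
toy <- data.frame(
  Births_Per_1000 = c(46, 1, 63, 689, 47, 267),
  CS_rate         = c(0.256, 0.237, 0.100, 0.352, 0.141, 0.303)
)

# EACH + ADDS A LAYER; ONE labs() LAYER SETS ALL THREE LABELS
p <- ggplot(toy, aes(x = Births_Per_1000, y = CS_rate)) +
  geom_point(col = alpha("pink2", 0.6)) +
  labs(
    title = "Cesarean Section Rate vs. Births Per 1000",
    x     = "Births Per 1000",
    y     = "Cesarean Section Rate"
  )
p
```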

How About A Barplot?

Let’s start basic.

ggplot(CS_data, aes(x=Income_Group)) + geom_bar()
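Note that geom_bar() only needs an x variable: it counts the rows in each category for you. A sketch checking those counts against dplyr’s count(), using the six Income_Group values from the head() preview of CS_data (toy, p and counts are hypothetical names):

```r
library(dplyr)
library(ggplot2)

# SIX Income_Group VALUES COPIED FROM THE head() PREVIEW OF CS_data
toy <- data.frame(Income_Group = c(
  "Upper middle income", "High income: nonOECD", "High income: nonOECD",
  "High income: nonOECD", "Lower middle income", "High income: OECD"
))

# geom_bar() COUNTS THE ROWS IN EACH CATEGORY...
p <- ggplot(toy, aes(x = Income_Group)) + geom_bar()

# ...WHICH ARE THE SAME NUMBERS dplyr's count() RETURNS (3 FOR "High income: nonOECD")
counts <- toy %>% count(Income_Group)
counts
```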

But why do you think we visualize data?

ggplot(CS_data, aes(x = Income_Group)) +
  geom_bar(aes(fill = Region)) +
  xlab("Income Group") +
  ggtitle("Number of Countries per Income Group") +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))   # TILT LONG LABELS SO THEY DON'T OVERLAP

ggplot(CS_data, aes(x = Region)) +
  geom_bar(aes(fill = Income_Group)) +
  xlab("Region") +
  ggtitle("Number of Countries per Region") +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

You don’t want to confuse your audience. You want to educate. You want to tell the best story you can. Use color. Label your axes. Think wisely about how to organize your data. It all makes a difference.

Recap

You are learning dplyr and ggplot2! dplyr is for manipulating data; ggplot2 is for data visualization.