22. San Francisco, CA.
I've only been programming for a year, but it already feels like a lifetime. Palautatan means "statistics" in Tagalog.

View Home Page | View My GitHub Profile
San Francisco Data (SQL and Mapping)

San Francisco Data

Sources

I referred to the following websites.

  1. http://basemaptutorial.readthedocs.io/en/latest/shapefile.html
  2. http://matplotlib.org/users/colormaps.html
  3. http://brandonrose.org/pythonmap
  4. https://geohackweek.github.io/vector/04-geopandas-intro/

Libraries

I used the following libraries.

In [2]:
# SQL
import sqlite3 as sql

# YOUR FAVORITE DATA SCIENCE LIBRARIES
import pandas as pd
import numpy as np
In [3]:
# DISPLAY PLOTS INLINE
%matplotlib inline

# MAKE 'EM PLOTS BIG
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (12, 12)

# MAPS
from mpl_toolkits.basemap import Basemap
import geopandas as gpd
import shapely.geometry as geom

Exercise 1.1

In the following exercise, I will be extracting information about the food trucks from the SQL database.

In [4]:
# LOAD IN DATA BASE
db = sql.connect("sf_data.sqlite")
In [5]:
# QUERY TOP 10
top10 = pd.read_sql("SELECT Applicant, count(Applicant) AS num_permits FROM mobile_food_permits AS mfp "
                    "INNER JOIN (SELECT DISTINCT locationid, permit FROM mobile_food_schedule) "
                    "AS mfs ON mfp.permit=mfs.permit WHERE mfp.Status=='APPROVED' GROUP BY mfp.Applicant "
                    "ORDER By num_permits DESC LIMIT 10", db)
top10
Out[5]:
Applicant num_permits
0 Park's Catering 23
1 D & T Catering 17
2 F & C Catering 13
3 Munch A Bunch 10
4 Eva's Catering 8
5 BH & MT LLC 7
6 Linda's Catering 7
7 Golden Catering 5
8 San Francisco Carts & Concessions, Inc. DBA St... 5
9 Bay Area Mobile Catering, Inc. dba. Taqueria A... 4

Park's Catering sells in the most areas in the City, holding 23 permits for 23 locations. D&T Catering is second with 17 locations, and F&C Catering is third with 13 locations. In the printed output above, we see the rest of the top 10.

In [58]:
# INITIALIZE GRAPH OBJECTS
objects = list(top10.Applicant.values)
objects_abb = [name[:10] for name in objects]
y_pos = np.arange(len(objects))

# PLOTTING DATA
plt.bar(y_pos, top10["num_permits"][:15], color="orange")

# #Set up the graphs
plt.xticks(y_pos, objects_abb, rotation = "horizontal")
plt.ylabel("Number of Unique Locations")
plt.title("Number of Locations of Food Trucks in the City")
plt.show()

Exercise 1.2

As I progress through Exercise 1.2, I will be analyzing some topics that I find most relevant, having been born and raised in the City.

(1) What are the most and least expensive parts of the City?
(2) What are the most common types of crime? Where do crimes happen most?
(3) Where do homicides happen in the City?
(4) What is the relationship between elementary and high schools in the City?
(5) What is the distribution of high schools in the City?

(1) What are the most and least expensive parts of the City?

I will first select the median prices per square foot of homes and the zip codes of the corresponding region.

Question 1 Contents
(a) Most Expensive Zipcode Regions of the City
(b) Least Expensive Zipcode Regions of the City
(c) Map Visualization of Median Home Price per Zipcode Region

In [7]:
prices_per_zip = pd.read_sql("SELECT RegionName, MedianSoldPricePerSqft_AllHomes FROM zillow WHERE MedianSoldPricePerSqft_AllHomes>0 GROUP BY RegionName", db)
prices_per_zip = prices_per_zip.sort_values(by="MedianSoldPricePerSqft_AllHomes", ascending=False)
top3_bottom3 = prices_per_zip.iloc[[0, 1, 2, len(prices_per_zip)-3, len(prices_per_zip)-2, len(prices_per_zip)-1]]
top3_bottom3
Out[7]:
RegionName MedianSoldPricePerSqft_AllHomes
3 94104 1336.128581
9 94111 901.804100
23 94133 850.666667
24 94134 351.758200
19 94124 312.412600
25 94158 273.033708

(a) Most Expensive Zipcode Regions of the City


94104: Downtown, the Financial District
94111: Embaracdero
94133: North Beach

As a San Francisco native, I am no alien to the bustle and lure of the downtown area. The Financial District is the center of business in downtown, decorated with the tallest skyscrapers in the city. As expected, the type of housing in this area is loft and apartment style in tall, glamorous buildings suitable for the monied professionals that work in the area.

The Embaracdero, one of the first neighborhoods in sight as you enter the city through the Bay Bridge, is one of the city's most popular tourist areas, boasting a beautiful view of the newly built Bay Bridge, a concert grounds, and museums along the walk of the coast. The apartments in that area also high rise with a timeless feel which are unsurprisingly expensive.

North Beach, not far off from the two neighborhoods previously mentioned packs in the same San Franciscan integrity, but moreso in large concrete homes than in multistory apartment buildings. Easy access to downtown facilities and the homes being large would definitely make the area more expensive.

(b) Least Expensive Zipcode Regions of the City


94134: Visitacion Valley
94124: Hunter's Point
94158: Mission Bay

Visitacion Valley "the Viz" is the neighborhood in which I was born and raised. The area is known for violence and unkept streets. Growing up, my mom frequently had to clean grafitti off our walls (I do think that San Franciscan graffiti is a charm, it was not pretty on our house), and we were not unaccustomed to car break ins and crime in the area. My father was the manager in the Walgreens in the Viz, and he was riddled with catching shoplifters on a daily basis. Despite the homey look and feel of the Viz, the amount of crime makes it unappealing to the bougie newcomers.

Hunter's Point and Mission Bay are both known for their low-income housing. However, due to the gentrification (or redevelopment, if you're unfortunately on the wrong political side of this issue), prices on homes even in these areas are pushing more and more San Francisco natives out of the city.

(c) Map Visualization of Median Home Price Per Zipcode Regions

In [8]:
# LOAD IN ZIP CODES
zips = gpd.read_file("zip/cb_2015_us_zcta510_500k.shp")
zips.head()

# NARROW IT DOWN SOME
zips = zips[zips.ZCTA5CE10.str.startswith("9")]

# WE ONLY WANT SF ZIPCODES
sf_zips = [str(x) for x in prices_per_zip["RegionName"].values]

# GET ONLY THE ZIPCODES WE WANT
zips = zips.loc[zips["ZCTA5CE10"].isin(set(sf_zips))]

# ADD ZIP CODES TO OTHER DATAFRAME TO ALIGN THEM
zips["RegionName"] = zips["ZCTA5CE10"]
zips = zips.reset_index().sort_values(by="RegionName")

# ONLY THE COLUMNS WE WANT
zips = zips[["RegionName", "geometry"]].reset_index(drop=True)
In [9]:
# SORT IN ORDER TO MERGE DATA FRAMES INTO ONE
prices_per_zip = prices_per_zip.reset_index(drop=True).sort_values(by="RegionName")
In [10]:
# CREATE DATA FRAME WITH GEOMETRY, REGION, AND PRICE
df_col_merged = pd.concat([zips, prices_per_zip["MedianSoldPricePerSqft_AllHomes"]], axis=1)

#REMOVE SOUTH CITY (SOUTH SAN FRANCISCO IS NOT IN SAN FRANCISCO!)
df_col_merged = df_col_merged[df_col_merged["RegionName"] != "94080"].reset_index(drop=True)

df_col_merged.head()
Out[10]:
RegionName geometry MedianSoldPricePerSqft_AllHomes
0 94102 POLYGON ((-122.429929 37.777909, -122.426829 3... 901.804100
1 94103 POLYGON ((-122.426418 37.765784, -122.426402 3... 850.666667
2 94104 POLYGON ((-122.404613 37.793565, -122.401315 3... 836.363636
3 94105 POLYGON ((-122.403456 37.787776, -122.401939 3... 777.939300
4 94107 POLYGON ((-122.40509 37.76462799999999, -122.4... 769.973545
In [11]:
df_col_merged.plot(column='MedianSoldPricePerSqft_AllHomes', cmap="RdPu")
plt.title("Median Home Price Per Zip Code")
Out[11]:
<matplotlib.text.Text at 0x1236290d0>

The most expensive zip code areas above are the darkest shade of purple. The less expensive areas are cream and pale pink.

The median value of median home price per zipcode region in San Francisco is 601.74k in according to this dataset. Below is a histogram of the median home prices per region. The horizontal line indicates the median home price of 601.74k. We see that there are still many zip codes below the median price, however there are many regions with drastically higher median home prices. Note that this is obviously a relatively high median price to other areas in California, such as Sacramento.

In [12]:
np.median(prices_per_zip["MedianSoldPricePerSqft_AllHomes"])
Out[12]:
601.74609337592449
In [55]:
median_prices = sns.barplot(x="RegionName", y="MedianSoldPricePerSqft_AllHomes", data=prices_per_zip,
            label="Total", color="lightblue")

for item in median_prices.get_xticklabels():
    item.set_rotation(45)

plt.axhline(y=np.median(prices_per_zip["MedianSoldPricePerSqft_AllHomes"]), linewidth=5, color="black", alpha=0.3)
    
plt.title("Median Price for Homes")
plt.show()

(2) Where do crimes happen most? What are the most common types of crime?

Question 2 Contents
(a) In which Police Districts do crimes happen most?
(b) What crimes are happening in this district?
(c) What are the most San Franciscan types of crime, i.e. overall most common crimes?

(a) In which Police Districts do crimes happen most?

In [14]:
# INITIALIZE DATA FRAME
crime = pd.read_sql("SELECT Descript, PdDistrict FROM crime", db)
crime.head()
Out[14]:
Descript PdDistrict
0 LOST PROPERTY MISSION
1 ROBBERY, BODILY FORCE TENDERLOIN
2 AGGRAVATED ASSAULT WITH BODILY FORCE TENDERLOIN
3 DOMESTIC VIOLENCE TENDERLOIN
4 MALICIOUS MISCHIEF, VANDALISM OF VEHICLES NORTHERN
In [15]:
# DATAFRAME OF DISTRICTS ONLY
districts = pd.DataFrame(crime["PdDistrict"].value_counts())
In [57]:
# TOTAL CRIMES PER DISTRICT
sns.barplot(x=districts.index.values, y=districts["PdDistrict"], data=districts,
            label="Total", color="orange")
plt.title("Total Crimes Per Police District")
plt.show()

<>


The Southern Police District has the most crimes.

As we can see from the map above, the Southern Police District refers to South of Market as well as Treasure Island. While SoMa is home to a lot of company offices, it is also home to some tougher communities. Once driving through SoMa, I saw a dead man in the middle of the street, all his neighbors out upon the sidewalk, having witnessed the scene. Unfortunately, I drove right to the right of this dead man because there were no Crime Scene tapes up yet. This happened to me as I was dropping off my sister to her office at Square, a tech company. I think this may give you a good idea of the SoMa climate.

Treasure Island is a less inhabited part of San Francisco where crime is probably easier to commit.

(b) What crimes are happening in the Southern Police district?

We'll take a look by day of the week for the first 20 crimes in the dataframe for Southern Police District, to keep our analyses less computationally expensive.

In [154]:
southern_crime = pd.read_sql("SELECT Descript, DayOfWeek FROM crime WHERE PdDistrict='SOUTHERN'", db)
southern_crime.head()
Out[154]:
Descript DayOfWeek
0 PETTY THEFT FROM LOCKED AUTO Sunday
1 WARRANT ARREST Wednesday
2 PETTY THEFT FROM LOCKED AUTO Sunday
3 AIDED CASE, DOG BITE Sunday
4 INVESTIGATIVE DETENTION Sunday
In [155]:
# GRAB FIRST 20 CRIMES
first20crimes = southern_crime["Descript"].values[0:20]
dow_counts = [southern_crime["DayOfWeek"][southern_crime["Descript"]==x].value_counts() for x in first20crimes]

# GET A DATAFRAME SUMMARY
southern_dow_summary = pd.DataFrame(dow_counts, top20socrimes).drop_duplicates()
southern_dow_summary
Out[155]:
Friday Monday Saturday Sunday Thursday Tuesday Wednesday
PETTY THEFT FROM LOCKED AUTO 694 501 808 636 534 457 602
WARRANT ARREST 834 814 696 684 914 911 995
AIDED CASE, DOG BITE 40 21 32 27 35 42 32
INVESTIGATIVE DETENTION 457 422 444 485 490 419 470
BATTERY 851 764 939 867 784 781 782
ATTEMPTED ROBBERY ON THE STREET WITH BODILY FORCE 38 34 31 30 21 34 29
POSSESSION OF MARIJUANA 165 149 135 132 179 144 170
AIDED CASE, MENTAL DISTURBED 1083 1011 1014 937 1024 1007 962
DRIVERS LICENSE, SUSPENDED OR REVOKED 825 893 808 628 754 875 910
GRAND THEFT FROM PERSON 443 348 411 420 438 380 399
LOST PROPERTY 1547 1096 1539 1274 1177 1050 1191
AGGRAVATED ASSAULT WITH A KNIFE 77 65 71 93 49 56 73
ROBBERY, ARMED WITH A KNIFE 19 18 24 15 12 16 16
MISSING ADULT 317 225 257 218 272 254 243
PETTY THEFT FROM A BUILDING 431 321 333 319 402 352 355
MALICIOUS MISCHIEF, BREAKING WINDOWS 160 134 140 171 129 121 116
In [156]:
# CALCULATE MOST / LEAST
most_dow = southern_dow_summary.idxmax(axis=1)
least_dow = southern_dow_summary.idxmin(axis=1)
In [240]:
# MOST, LEAST, AND ON AVERAGE DATAFRAME
southern_dow_summary_2 = pd.concat([most_dow.to_frame(), least_dow.to_frame(), southern_dow_summary.mean(axis=1).to_frame()], axis=1)
southern_dow_summary_2.columns = ["Committed Most On", "Committed Least On", "Average Committed Per Day"]
southern_dow_summary_2.sort_values(by="Average Committed Per Day", ascending=False)
Out[240]:
Committed Most On Committed Least On Average Committed Per Day
LOST PROPERTY Friday Tuesday 1267.714286
AIDED CASE, MENTAL DISTURBED Friday Sunday 1005.428571
WARRANT ARREST Wednesday Sunday 835.428571
BATTERY Saturday Monday 824.000000
DRIVERS LICENSE, SUSPENDED OR REVOKED Wednesday Sunday 813.285714
PETTY THEFT FROM LOCKED AUTO Saturday Tuesday 604.571429
INVESTIGATIVE DETENTION Thursday Tuesday 455.285714
GRAND THEFT FROM PERSON Friday Monday 405.571429
PETTY THEFT FROM A BUILDING Friday Sunday 359.000000
MISSING ADULT Friday Sunday 255.142857
POSSESSION OF MARIJUANA Thursday Sunday 153.428571
MALICIOUS MISCHIEF, BREAKING WINDOWS Sunday Wednesday 138.714286
AGGRAVATED ASSAULT WITH A KNIFE Sunday Thursday 69.142857
AIDED CASE, DOG BITE Tuesday Monday 32.714286
ATTEMPTED ROBBERY ON THE STREET WITH BODILY FORCE Friday Thursday 31.000000
ROBBERY, ARMED WITH A KNIFE Saturday Thursday 17.142857

The above dataframe summarizes 20 crimes in the Southern Police District. In this district, we see that the first 20 crimes of the dataset happen more on weekends: Fridays, Saturdays, and Sundays. It is interesting to see that posession of marijuana, mentally disturbed cases, and driving mishaps happen less on Sundays, the day that many people worship God in the city.

(c) What are the most San Franciscan types of crime, i.e. overall most common crimes?

In [158]:
# DATA FRAME OF CRIME TYPES
crime_types = pd.DataFrame(crime["Descript"].value_counts())
In [159]:
crime_types = sns.barplot(x=crime_types.index.values[0:10], y=crime_types["Descript"][0:10][0:10], data=crime_types,
            label="Total", color="lightblue")

for item in crime_types.get_xticklabels():
    item.set_rotation(90)

plt.title("Top 10 Most Frequently Committed Crimes")
plt.show()

The most common type of crime is Grand Theft from a Locked Automobile. Other common crimes have to do with lost property, bad driving, and being mentally ill.

(3) Where do homicides happen in the City?

One crime that is particularly of interest in any city is murder. Is it more likely that you may get killed walking down the street in any certain area in San Francisco rather than others? The answer is yes.

In [172]:
# READ IN HOMICIDE DATA
homicides = pd.read_sql("SELECT Descript, PdDistrict, Lon, Lat FROM crime WHERE Descript LIKE '%HOMICIDE%'", db)
homicides.head()
Out[172]:
Descript PdDistrict Lon Lat
0 ATTEMPTED HOMICIDE WITH A GUN MISSION -122.410963 37.752685
1 ATTEMPTED HOMICIDE WITH A KNIFE MISSION -122.406338 37.753004
2 ATTEMPTED HOMICIDE WITH BODILY FORCE CENTRAL -122.402862 37.804531
3 ATTEMPTED HOMICIDE WITH BODILY FORCE TENDERLOIN -122.414056 37.782793
4 ATTEMPTED HOMICIDE WITH A KNIFE BAYVIEW -122.387235 37.719401

The total number of homicides observed in this dataset is 677.

In [218]:
print "Homicides in this dataset: ", homicides.shape[0]
Homicides in this dataset:  677
In [34]:
# INITIALIZE BASEMAP
homicide_map = Basemap(projection="merc", llcrnrlon=-122.6, llcrnrlat=37.65, urcrnrlon=-122.3, urcrnrlat=37.85, resolution="h", area_thresh = 0.1)

# DRAW IN LINES AND SHAPEFILE
homicide_map.drawcoastlines()
homicide_map.drawmapboundary()
homicide_map.drawcounties()
homicide_map.readshapefile("SF Find Neighborhoods/geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53", "geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53")

# RECOLOR
homicide_map.fillcontinents(color="darkgrey")

# ADD SCATTER
for lon, lat, desc in zip(homicides.Lon, homicides.Lat, homicides.Descript):
    x, y = homicide_map(lon, lat)
    
    # DANGEROUS WEAPON
    color = "c"
    
    # GUN
    if desc == "ATTEMPTED HOMICIDE WITH A GUN":
        color = "r"
    
    # KNIFE
    if desc == "ATTEMPTED HOMICIDE WITH A KNIFE":
        color = "y"
    
    # BODILY FORCE
    if desc == "ATTEMPTED HOMICIDE WITH BODILY FORCE":
        color = "b"
        
    homicide_map.plot(x, y, color + "o", markersize = 5, alpha = 0.7)
    
plt.title("San Francisco Homicides")
plt.show()

Red - Homicide by Gun | Cyan - Homicide by Dangerous Weapon | Yellow - Homicide by Knife | Blue - Homicide by Bodily Force

Homicides are more frequent on the Eastern sides of San Francisco. According to this map, homicides happen more frequently in the Eastern Areas of San Francisco. This includes downtown areas, Visitacion Valley, and Mission Bay. The areas with less homicides include the Richmond and Sunset (places that are closer to the beach and have plenty of restaurants and homes) and the Lake Merced area that San Francisco State University is at.

Recall that in Question 2, we calculated that the Southern Police District had more crimes than all other districts. In this particular district alone, 93 homicides took place. That is, 13.737% of homicides happened in the Southern Police District, and also notable: none of the homicides happened on Treasure Island.

In [183]:
southern_crime_homicides = pd.read_sql("SELECT Descript FROM crime WHERE PdDistrict='SOUTHERN' and Descript LIKE '%HOMICIDE%'", db)
print "Homicides in Southern PD: ", southern_crime_homicides.shape[0]

print 93*(677**(-1))*100, "% of homicides from this dataset were in the Southern PD"
Homicides in Southern PD:  93
13.7370753323 % of homicides from this dataset were in the Southern PD

(4) What is the relationship between elementary and high schools in the City?

Let's explore! I will be defining elementary school to be schools for which take students that are in the 8th grade or lower. High schools are schools that have an upper grade of higher than 8th grade. Elementary schools will be colored in cyan blue and the high schools will be in dark blue.

In [196]:
# INITIALIZE DATA FRAME FROM SQL DATABASE
schools = pd.read_sql("SELECT * FROM schools WHERE UpperGrade < 13", db)
In [198]:
# INITIALIZE BASEMAP
school_map = Basemap(projection="merc", llcrnrlon=-122.6, llcrnrlat=37.65, urcrnrlon=-122.3, urcrnrlat=37.85, resolution="h", area_thresh = 0.1)

# DRAW IN LINES AND SHAPEFILE
school_map.drawcoastlines()
school_map.drawmapboundary()
school_map.drawcounties()
school_map.readshapefile("SF Find Neighborhoods/geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53", "geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53")

# RECOLOR
school_map.fillcontinents(color="darkgrey")

# ADD SCATTER
for lon, lat, upper in zip(schools.Lon, schools.Lat, schools.UpperGrade):
    x, y = school_map(lon, lat)
    
    # HIGH SCHOOLS
    color = "b"
    
    # ELEMENTARY SCHOOLS
    if upper <= 8:
        color = "c"

        
    school_map.plot(x, y, color + "o", markersize = 5, alpha = 0.8)

plt.title("Elementary and High Schools")
plt.show()

Red - High Schools | Cyan - Elementary Schools

There are obviously many more elementary schools than high schools. But what is less obvious is that there are 6 elementary schools for every 1 high school. This would suggest that enrollment in high schools is far greater than in elementary schools.

Elementary schools are more or less evenly distributed around the city, but high schools are clustered together in certain areas. That is, not all students have an equal lengthed commute after they finish elementary school. Increased classroom sizes in high schools could be an educational disadvantage to students in San Francisco, which is awfully cruel considering the home prices in the area.

In [228]:
# NUMBER OF ELEMENTARY SCHOOLS DIVIDED BY NUMBER OF HIGH SCHOOLS
schools[schools["UpperGrade"]<=8].shape[0] * ((schools[schools["UpperGrade"]>8].shape[0])**(-1))
Out[228]:
6.016129032258064

(5) What is the distribution of high schools in the City?

Below, I will plot high schools based on category of high school. High schools are defined to have a "lower grade" of 9.

In [204]:
highschools = pd.read_sql("SELECT * FROM schools WHERE LowerGrade=9 ORDER BY Category", db)
In [205]:
# INITIALIZE BASEMAP
hs_map = Basemap(projection="merc", llcrnrlon=-122.6, llcrnrlat=37.65, urcrnrlon=-122.3, urcrnrlat=37.85, resolution="h", area_thresh = 0.1)

# DRAW IN LINES AND SHAPEFILE
hs_map.drawcoastlines()
hs_map.drawmapboundary()
hs_map.drawcounties()
hs_map.readshapefile("SF Find Neighborhoods/geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53", "geo_export_dd4c4138-2501-498e-86f4-06d7827b2d53")

# RECOLOR
hs_map.fillcontinents(color="darkgrey")

# ADD SCATTER
for lon, lat, category in zip(highschools.Lon, highschools.Lat, highschools.Category):
    x, y = hs_map(lon, lat)
    
    color = "k"
    
    # YELLOW PUBLIC SCHOOLS
    if category == "USD Grades 9-12":
        color = "y"
    
    # CHARTER SCHOOLS
    if category == "USD Charter School":
        color = "c"
    
    # PRIVATE SCHOOLS
    if category == "Independent / Private":
        color = "m"
        
    hs_map.plot(x, y, color + "o", markersize = 7, alpha = 0.7)

plt.title("High Schools in San Francisco by Type")
plt.show()

Magenta - Private | Cyan - USD Charter School | Yellow - USD Grades 9-12

Are high schools offered only in specific areas?

The following plot shows high schools plottted upon the map of zip codes, shaded by price of housing. What is apparent in this map is that wealthier areas have more high schools available to the area. In general, wealthier areas do not necessarily have higher high schooler populations than do areas of less wealth in San Francisco.

In [223]:
highschools_per_zip = df_col_merged.plot(column='MedianSoldPricePerSqft_AllHomes', cmap="Greens")

for lon, lat, category in zip(highschools.Lon, highschools.Lat, highschools.Category):
    x, y = (lon, lat)
    color = "k"
    
    # PRIVATE SCHOOLS
    if category == "Independent / Private":
        color = "c"
    
    highschools_per_zip.plot(x, y, color + "o", markersize = 10, alpha=0.7)

plt.title("High Schools Per Zip Code")

bounds = df_col_merged.geometry.bounds

plt.xlim([bounds.minx.min()-0.01, bounds.maxx.max()+0.005])
plt.ylim([bounds.miny.min()-0.01, bounds.maxy.max()+0.01])
Out[223]:
(37.697639885837305, 37.821440999999993)

Black - Public | Cyan - Private

There are less high schools in less wealthy areas of San Francisco.
This could be because there are a quite uniform number of private and public schools throughout the City, and that private schools only cater to areas that have higher income families.

Some areas completely lack private schools.
We can easily see that private schools are clustered near darker green map regions, meaning that higher income families have access to these schools.

There are 20 public high schools and 16 private high schools in San Francisco. There are only 1.25 public schools for every private school.
I find this statistics unimpressive. Public high schools in San Francisco are overcrowded and underfunded in general. That is, many schools lack basic programs that neighboring high school districts have such as: band programs, sex ed programs, and AP classes. The fact that private schools are established much further away from students who live in less wealthy areas in San Francisco limits their mobility in many different ways, including socioeconomic. However, it may be the case that private schools have been established in historically wealthier areas in order to flourish by donation.

In [234]:
# NUMBER OF HIGH SCHOOLS
total = highschools.shape[0]
public = highschools[highschools["Entity"]=="SFUSD"].shape[0]
print "Total number of high schools: ", total
print "Total number of public schools: ", public
print "Total number of private schools: ", total-public
Total number of high schools:  36
Total number of public schools:  20
Total number of private schools:  16
In [235]:
# PUBLIC DIVIDED BY PRIVATE
public * ((total-public)**(-1))
Out[235]:
1.25
In [236]:
# CLOSE THE DATABASE
db.close()
View Home Page | View My GitHub Profile