This project, although hosted on my site, was a collaboration between me and two outstanding teammates.
See Patrick Vacek's profile and Graham Smith's profile for data science brilliance.

View Project Page | Previous Step: Applying Algorithm on Political Tweets | Next Step: Conclusions
Vocabulary and Model

Incorporating the liberal/conservative scores into our inference

The final part of our project was to combine the two models and see whether the probability of a petition being liberal or conservative would add any predictive power as a variable in the logistic regression model for petition success.

In [91]:
non_neutral=pd.read_csv("https://raw.githubusercontent.com/palautatan/project141b/master/data/no_neutral_predictions.csv",index_col="Unnamed: 0")
In [130]:
petitions["probability"]=petitions_prob.groupby("id").last().prob_mov_avg
petitions["title"]=petitions.title.astype(str).str.lower()
In [131]:
non_neutral_probs=non_neutral.merge(petitions,on="title")
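Because the merge joins on titles, any title that differs by case or stray whitespace between the two tables is silently dropped from the result. Here is a minimal sketch of how the match rate could be checked. The two small frames and their values are made-up stand-ins for `non_neutral` and `petitions`, not project data, and the extra `str.strip()` is an assumption about what kind of mismatch might occur.

```python
import pandas as pd

# Toy stand-ins for the two tables (hypothetical values, not project data).
scores = pd.DataFrame({
    "title": ["ban plastic bags", "lower taxes now", "fund public schools"],
    "liberal": [0.91, 0.12, 0.78],
})
petitions_demo = pd.DataFrame({
    "title": ["Ban Plastic Bags", "Lower taxes now ", "Protect net neutrality"],
    "probability": [0.40, 0.05, 0.63],
})

# Normalize titles the same way on both sides before joining.
for frame in (scores, petitions_demo):
    frame["title"] = frame["title"].astype(str).str.lower().str.strip()

# An outer merge with indicator=True labels each row by where it was found:
# "both", "left_only" (score but no petition), or "right_only".
checked = scores.merge(petitions_demo, on="title", how="outer", indicator=True)
match_counts = checked["_merge"].value_counts()
print(match_counts)
```

If a large share of rows lands in `left_only` or `right_only`, the scatter plot is drawn from only a fraction of the petitions, which is worth knowing before reading anything into it.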
In [140]:
non_neutral_probs.plot(kind="scatter",x="liberal",y="probability",s=30)
plt.title("Relationship between liberal score and latest probability of passing")
plt.show()

Unfortunately, this graph doesn't call for much analysis. There appears to be virtually no relationship between political persuasion and petition success (although the x-axis is labeled "liberal", it covers conservative petitions as well). Data science is a science, and as in any science, failure is a necessary part of the process. While we believe there is still some value in this line of investigation, it is clear that our combined model, as currently designed, is insufficient for any practical predictive or interpretive purposes.
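The visual impression of "virtually no relationship" can also be expressed as a single number. The sketch below computes a Pearson correlation between a score column and a probability column. The helper `score_probability_correlation` is a hypothetical name introduced here, and the demo frame is synthetic random data, not our results. In the notebook the function would be called on `non_neutral_probs` instead.

```python
import numpy as np
import pandas as pd

def score_probability_correlation(df, x="liberal", y="probability"):
    """Return the Pearson correlation between two columns, skipping rows with missing values."""
    pair = df[[x, y]].dropna()
    return pair[x].corr(pair[y])

# Synthetic, independently drawn columns (illustration only, not project data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "liberal": rng.uniform(0, 1, size=500),
    "probability": rng.uniform(0, 1, size=500),
})

r = score_probability_correlation(demo)
print(f"Pearson r = {r:.3f}")
```

A value near zero would be consistent with the scatter plot, though Pearson's coefficient only captures linear association, so it complements the plot rather than replacing it.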


