The final part of our project combined the two models. We wanted to see whether the probability that a petition is liberal or conservative adds predictive power as a variable in the logistic regression model of petition success.
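One way to check whether the political-leaning probability adds predictive power is to fit the logistic regression with and without it, then compare the two nested models with a likelihood-ratio test. The sketch below is a minimal illustration, not our project's model. It assumes a NumPy-only Newton-Raphson fit and uses synthetic data: `signatures` and `liberal` are hypothetical stand-ins rather than columns from our dataset, and the outcome is generated so that `liberal` has no effect.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return coefficients and log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])          # Fisher information (negative Hessian)
        beta = beta + np.linalg.solve(info, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


def likelihood_ratio_test(X_base, extra, y):
    """Test whether adding one extra predictor improves a logistic regression fit."""
    _, ll_base = fit_logistic(X_base, y)
    _, ll_full = fit_logistic(np.column_stack([X_base, extra]), y)
    lr = 2.0 * (ll_full - ll_base)
    # Chi-square with 1 degree of freedom: P(X > lr) = erfc(sqrt(lr / 2))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


# Hypothetical synthetic data: success depends on signatures only
rng = np.random.default_rng(0)
n = 1000
signatures = rng.normal(size=n)        # hypothetical baseline predictor
liberal = rng.uniform(0, 1, size=n)    # hypothetical political-leaning score
logit = -0.5 + 1.2 * signatures
passed = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

lr, p = likelihood_ratio_test(signatures[:, None], liberal, passed)
print(f"LR statistic = {lr:.3f}, p-value = {p:.3f}")
```

A large p-value means the extra variable does not improve the fit beyond the baseline predictor. With real data, comparing held-out predictive accuracy of the two models would be a useful complement to this in-sample test.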
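import pandas as pd
import matplotlib.pyplot as plt
# Petitions scored by the political-leaning model (neutral petitions excluded)
non_neutral=pd.read_csv("https://raw.githubusercontent.com/palautatan/project141b/master/data/no_neutral_predictions.csv",index_col="Unnamed: 0")
# Latest moving-average probability of passing per id; assignment aligns on the index
petitions["probability"]=petitions_prob.groupby("id").last().prob_mov_avg
# Lowercase titles so they match the (assumed lowercase) titles in non_neutral
petitions["title"]=petitions.title.astype(str).str.lower()
# Join the political-leaning scores to the petitions on their titles
non_neutral_probs=non_neutral.merge(petitions,on="title")
# Plot the liberal score against the latest probability of passing
non_neutral_probs.plot(kind="scatter",x="liberal",y="probability",s=30)
plt.title("Relationship between liberal score and latest probability of passing")
plt.show()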
Unfortunately, this graph does not need much analysis. There appears to be virtually no relationship between political leaning and petition success (although the axis is labeled "liberal", the scores include conservative petitions as well). Data science is a science, and as in any science, failure is a necessary part of the process. While we believe there is still some value in this line of investigation, our combined model as currently designed is clearly insufficient for any practical predictive or interpretive purpose.
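The claim of "virtually no relationship" can also be checked numerically rather than by eye. The following is a minimal sketch, assuming a merged frame with the `liberal` and `probability` columns used in the plot. It computes a Spearman rank correlation (as the Pearson correlation of average ranks, which avoids a SciPy dependency) together with a two-sided permutation p-value. The function name `rank_correlation_test` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_correlation_test(df, x="liberal", y="probability", n_perm=2000, seed=0):
    """Spearman correlation between two columns with a two-sided permutation p-value."""
    data = df[[x, y]].dropna()               # drop rows missing a score or probability
    xr = data[x].rank().to_numpy()           # average ranks handle ties
    yr = data[y].rank().to_numpy()
    observed = np.corrcoef(xr, yr)[0, 1]     # Pearson on ranks = Spearman's rho
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling one column breaks any association, sampling the null distribution
        r = np.corrcoef(rng.permutation(xr), yr)[0, 1]
        if abs(r) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)   # add-one correction keeps p above zero
    return observed, p_value


# Hypothetical stand-in for non_neutral_probs with two independent columns
rng = np.random.default_rng(1)
demo = pd.DataFrame({"liberal": rng.uniform(0, 1, 200),
                     "probability": rng.uniform(0, 1, 200)})
rho, p = rank_correlation_test(demo)
print(f"Spearman rho = {rho:.3f}, permutation p-value = {p:.3f}")
```

On the real merged frame the call would be `rank_correlation_test(non_neutral_probs)`. A rank correlation is used here because any relationship between leaning and success need not be linear.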