Introduction
This section is expected to cover the following:
Background information about the project: objective and motivation.
An introduction to the working data set: sample size, information
about data collection, variable descriptions (names, definitions,
types), etc.
Clear statements of the questions to be addressed (make sure that both
linear and logistic regression modeling are used to address them).
Methodology
Since this is a comprehensive analysis report, you are expected to
use a standalone section to outline the methods and models that will be
used to address the questions. Note that you are expected to write
several narrative paragraphs to describe each of the potential models
and their assumptions.
You can create subsections to describe individual
models/algorithms.
EDA and Feature Engineering
Perform feature engineering based on the EDA. Other algorithm-based
engineering methods are optional. If you have prior experience with any
model-based algorithms, please feel free to use them to create more
informative variables. Since this project focuses on statistical
analysis, you need to consider the interpretation of the regression
models (i.e., their coefficients). When using transformations, think
about how they change the interpretation of the model.
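As one illustration of how a transformation changes the reading of a coefficient, consider a log-transformed response. This is only a worked example under assumed notation (y, x, and the betas are not taken from the guidelines), not a required model:

```latex
\[
\log(y_i) = \beta_0 + \beta_1 x_i + \varepsilon_i
\quad\Longrightarrow\quad
E[\log y \mid x + 1] - E[\log y \mid x] = \beta_1 .
\]
% A one-unit increase in x multiplies the geometric mean of y by e^{\beta_1}.
% Example: \beta_1 = 0.05 gives e^{0.05} \approx 1.051, i.e. about a 5.1% increase.
```

The coefficient is therefore read on a multiplicative (percentage) scale rather than in the original units of the response.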
Handling Missing Values
If your data set has missing values, you need to use appropriate
imputation methods to handle missing values.
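A minimal sketch of one simple imputation method, median imputation for a numeric variable, is given here. The function name `impute_median` and the `income` column are hypothetical, and whether the median is appropriate depends on why the values are missing:

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric column with two missing entries.
income = [42.0, None, 55.5, 61.0, None, 48.0]
print(impute_median(income))
```

Single-value imputation like this keeps the sample size but understates variability, so it is worth stating that limitation in the report if you use it.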
Single Variable Distribution
Based on observed patterns from the visualizations, you may take
appropriate actions such as regrouping, binning, discretization, etc. to
make more powerful variables for subsequent models and algorithms.
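Binning a numeric variable into labeled groups could look like the following sketch. The cut points, labels, and the `ages` data are assumptions chosen for illustration; in practice they should come from the patterns seen in your own plots:

```python
from bisect import bisect_right


def bin_values(values, cut_points, labels):
    """Assign each value the label of the interval it falls in.

    cut_points must be sorted in increasing order. Intervals are closed
    on the left, so a value equal to a cut point goes to the higher bin.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, v)] for v in values]


# Hypothetical ages grouped into three categories.
ages = [19, 34, 35, 52, 67, 80]
print(bin_values(ages, [35, 65], ["young", "middle", "senior"]))
```

Keep in mind that a binned variable enters a regression as a categorical predictor, so its coefficients compare each group with a baseline group.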
Assessing Pairwise Relationships
Examine the pairwise correlations among the variables to decide whether
to drop highly correlated variable(s).
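One way to screen for highly correlated numeric variables is sketched here using the Pearson correlation. The 0.9 threshold, the helper names, and the data are assumptions for illustration only:

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # fails if either variable is constant


def highly_correlated_pairs(columns, threshold=0.9):
    """List variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged


# Hypothetical data: height recorded in two units, plus weight.
height_cm = [150, 160, 170, 180, 190]
data = {
    "height_cm": height_cm,
    "height_in": [h / 2.54 for h in height_cm],
    "weight_kg": [60, 52, 80, 70, 95],
}
print(highly_correlated_pairs(data))
```

A flagged pair is a candidate for dropping one of the two variables; which one to keep is a judgment call based on interpretability.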
Linear Regression Modeling
You are expected to follow the general model-building process to
search for the final model to address related questions. Please include
any visual representations whenever possible (see how visualizations
were included in lecture notes).
Create Candidate Models
See the relevant section of the project guidelines.
Use Cross-validation for Model Selection
Use cross-validation with the mean squared error (MSE) as the
predictive performance measure to select the best model.
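The selection step can be sketched as k-fold cross-validation that averages the held-out MSE of each candidate. The two candidates here (an intercept-only model and a one-predictor line), the function names, and the simulated data are all assumptions for illustration:

```python
import random


def fit_mean(xs, ys):
    """Candidate 1: intercept-only model (predicts the training mean)."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds (requires len(xs) >= k)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


# Simulated data: a linear trend plus noise.
rng = random.Random(1)
xs = list(range(20))
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

for name, fit in [("intercept-only", fit_mean), ("line", fit_line)]:
    print(name, round(cv_mse(fit, xs, ys), 3))
```

The candidate with the smallest cross-validated MSE is the one to carry forward; using the same seed for every candidate keeps the folds identical across models.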
Results and Conclusions
Report the results based on the final model and draw conclusions that
answer the stated questions.
Logistic Regression
Use the same model-building process as in the linear regression
section to search for the best logistic regression model for
prediction.
Create Candidate Models
See the relevant section of the project guidelines.
Model Selection
Use ROC curves and the AUC to compare candidate models. Both the ROC
curves and the AUC values need to be included in the report.
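To make the comparison concrete, this sketch computes the ROC points and the trapezoidal AUC from predicted probabilities. The function names, labels, and the two sets of scores are hypothetical; in your report the scores would come from your fitted candidate models:

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points as the cutoff sweeps from high to low.

    labels are 0/1 and must contain at least one of each class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Observations tied on the same score are added together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical outcomes and predicted probabilities from two candidates.
labels = [1, 1, 0, 1, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
scores_b = [0.4, 0.6, 0.5, 0.3, 0.7, 0.2]

for name, scores in [("model A", scores_a), ("model B", scores_b)]:
    print(name, round(auc(roc_points(labels, scores)), 3))
```

The points returned by `roc_points` are what you would plot for each candidate, with the AUC reported alongside the curve.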
Cut-off Probability Search
Identify the optimal cut-off probability that yields the best
prediction accuracy.
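A simple grid search over candidate cut-offs might look like the following sketch. The grid of 0.01 to 0.99, the function names, and the data are assumptions; ideally the accuracy is evaluated on held-out data rather than on the data used to fit the model:

```python
def accuracy_at(cutoff, labels, probs):
    """Share of correct predictions when predicting 1 for prob >= cutoff."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_cutoff(labels, probs, grid=None):
    """Return the grid cut-off with the highest accuracy.

    Ties are resolved in favor of the smallest cut-off in the grid.
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda c: accuracy_at(c, labels, probs))


# Hypothetical outcomes and predicted probabilities.
labels = [0, 0, 1, 0, 1, 1, 1]
probs = [0.10, 0.25, 0.35, 0.40, 0.55, 0.70, 0.90]

cutoff = best_cutoff(labels, probs)
print(cutoff, round(accuracy_at(cutoff, labels, probs), 3))
```

Plotting accuracy against the cut-off is a useful companion to the search, since several cut-offs often perform equally well.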
Results and Conclusion
Report the results and conclusion of the logistic regression.
Summary and Discussion
Summarize the project and discuss the strengths and weaknesses of the
methods, along with potential improvements.
References and Appendix
List all references cited in the project. Add any appendices you may
have to support any arguments in the report.