1 Classical Regression Modeling

To address the practical questions formulated last week, we employ both linear regression and logistic regression models, utilizing the analytical dataset created previously. This assignment builds upon last week’s exploratory data analysis (EDA) and feature engineering work and will later be combined with next week’s task on predictive modeling and cross-validation to form a comprehensive project report.

This assignment focuses on classical regression analysis.

2 Linear Regression Models

Choose a continuous variable as the response and perform a linear regression analysis. Organize your analysis into subsections that cover the following components.

  • Statement of the question(s) and the purpose of the analysis: is it an association analysis or a predictive analysis?

  • Justification of whether the dataset has sufficient information to address the question(s)

  • Model building process: initial model, diagnostics, further transformations (in addition to those applied in the EDA), key performance metrics for model assessment, and final model selection (based on appropriate performance metrics). You are expected to

    • create a model that contains a few practically important variables
    • create a model that includes additional variables that potentially influence the response
    • use a variable selection method to identify the optimal model (i.e., the final model)

  • Interpretation of regression coefficients. If you transformed the response variable, work through the algebra to express the fitted model on the original scale of the response before interpreting the regression coefficients.

  • Summary/discussion/recommendation

You may create a subsection for each bullet point.
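As a rough illustration of the back-transformation step mentioned in the coefficient-interpretation bullet, the sketch below assumes the response was natural-log transformed, so the fitted model has the form log(y) = b0 + b1*x + ... Exponentiating both sides gives y = exp(b0) * exp(b1*x) * ..., so a one-unit increase in x multiplies the typical (geometric-mean) response by exp(b1). The function name and the coefficient value are hypothetical and chosen only for illustration; other transformations (square root, Box-Cox, and so on) need their own algebra.

```python
import math


def percent_change_log_response(beta: float, delta: float = 1.0) -> float:
    """Percent change in the original-scale response for a `delta`-unit
    increase in a predictor, assuming the model was fit to log(y)."""
    # log(y) = b0 + beta * x  =>  y = exp(b0) * exp(beta * x),
    # so increasing x by delta multiplies y by exp(beta * delta).
    multiplier = math.exp(beta * delta)
    return (multiplier - 1.0) * 100.0


# Hypothetical slope from a model fit to log(y); not taken from any real data.
beta_hat = 0.05
change = percent_change_log_response(beta_hat)
print(f"A one-unit increase is associated with a {change:.2f}% change in y")
```

For small coefficients the percent change is close to 100 times the coefficient, but the exact exponentiated form is safer to report.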

3 Logistic Regression Analysis

Choose a binary variable as the response and perform a logistic regression analysis. If your dataset has no binary variable suitable as the response, you can dichotomize a continuous response in a meaningful way and then build the logistic regression model on the dichotomized variable.
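A minimal sketch of dichotomizing a continuous response is shown here. The data values, the cut-off, and the function name are all hypothetical; in a real analysis the threshold should come from the subject matter (for example, a regulatory or clinical limit) rather than from convenience.

```python
def dichotomize(values, threshold):
    """Return 1 for values at or above `threshold`, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical continuous response and cut-off, for illustration only.
response = [12.5, 30.1, 18.4, 45.0, 27.3, 9.8]
cutoff = 25.0  # ideally a substantively meaningful value, not an arbitrary one

binary_response = dichotomize(response, cutoff)
event_rate = sum(binary_response) / len(binary_response)
print(binary_response, f"event rate = {event_rate:.2f}")
```

Checking the resulting event rate is worthwhile: a very unbalanced split leaves few observations in one class, which limits what a logistic regression model can estimate.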

Organize your analysis into subsections that cover the following components.

  • Statement of the question(s) and the purpose of the analysis: is it an association analysis or a predictive analysis?

  • Justification of whether the dataset has sufficient information to address the question(s)

  • Model building process: initial model, diagnostics, transformation and scaling (in addition to those applied in the EDA), key performance metrics for model assessment, and final model selection (based on appropriate performance metrics). For practice, you are expected to

    • create a model that contains a few practically important variables
    • create a model that includes additional variables that potentially influence the response
    • use a variable selection method to identify the optimal model (i.e., the final model)

  • Interpretation of the final model: interpret the regression coefficients and discuss applications of the model.

  • Summary/discussion/recommendation
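For the interpretation bullet above, logistic regression coefficients are on the log-odds scale, so they are usually exponentiated and reported as odds ratios. The sketch below shows that conversion together with an approximate 95% Wald interval. The coefficient, the standard error, and the function name are hypothetical placeholders, not output from any fitted model.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio and approximate 95% Wald interval for a logit coefficient."""
    # The model is log(odds) = b0 + beta * x, so exp(beta) is the
    # multiplicative change in the odds per one-unit increase in x.
    estimate = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return estimate, lower, upper


# Hypothetical coefficient and standard error; not from any real model.
or_hat, or_low, or_high = odds_ratio_with_ci(beta=0.40, se=0.15)
print(f"OR = {or_hat:.2f}, 95% CI ({or_low:.2f}, {or_high:.2f})")
```

An odds ratio above 1 means the odds of the event increase as the predictor increases, holding the other predictors fixed. An interval that excludes 1 is consistent with a nonzero association at roughly the 5% level.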
