{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 11\n", "\n", "***Regression Analysis***\n", "\n", "Suppose you have a dataset of cars (CO2_passenger_cars2018b.csv). You want to know which features contribute to an increase in CO2 emissions.\n", "\n", "1 Import the needed libraries\n", "\n", "2 Read the data from the file\n", "\n", "3 List the first 5 rows of the dataset and view the data types\n", "\n", "4 Check the possible values of the categorical variable Ft\n", "\n", "5 Convert Petrol to PETROL and Diesel to DIESEL\n", "\n", "6 Check the values of the categorical variable Ft again\n", "\n", "7 Convert the variables 'm (kg)', 'ec (cm3)', 'ep (KW)' and 'Enedc (g/km)' to numeric. The `errors` argument of `pandas.to_numeric` controls how invalid values are handled:\n", "\n", "    If ‘raise’, then invalid parsing will raise an exception\n", "    If ‘coerce’, then invalid parsing will be set as NaN\n", "    If ‘ignore’, then invalid parsing will return the input\n", "\n", "8 Remove all rows containing NaN from the df dataset and assign the result to XY\n", "\n", "9 Create a Y vector and an X matrix\n", "\n", "10 Create a regression model\n", "\n", "11 Analyse the correlation between the variables using seaborn, for example:\n", "\n",
"    import seaborn as sns\n", "    import matplotlib.pyplot as plt\n", "    fig = plt.figure(figsize=[12, 12])\n", "    corr_mtx = XY.corr()\n", "    sns.heatmap(corr_mtx, xticklabels=corr_mtx.columns, yticklabels=corr_mtx.columns, annot=True, cmap='Blues')\n", "    plt.title('Correlation analysis')\n", "    plt.show()\n", "\n", "12 Convert Ft into dummy variables\n", "\n", "13 Add the dummy variables to a new dataset XY2\n", "\n", "14 Create a Y vector and an X matrix from XY2\n", "\n", "15 Create a regression model\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }