{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Regression using statsmodels\n", "\n", "Author: Carlos J. Costa, ISEG\n", "\n", "Purpose: Identify the weight of several features on home prices in Boston.\n", "\n", "**1** Import the libraries needed: pandas, statsmodels, and sklearn\n", "\n", "**2** Use the Boston dataset from https://scikit-learn.org/stable/datasets/index.html and convert it into two DataFrames: X for the features and Y for the target.\n", "\n", "**3** Inspect the first 5 rows of the feature variables\n", "\n", "**4** Verify the data types\n", "\n", "**5** Create a new feature DataFrame with only 2 variables\n", "\n", "**6** Create and fit the model: `model = sm.OLS(target, sm.add_constant(feature)).fit()`\n", "\n", "**7** Obtain the summary\n", "\n", "**8** What is the difference between `model.summary2().tables[2]` and `model.summary().tables[1]`?\n", "\n", "Comment the following code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "\n", "#\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#\n", "# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an earlier version.\n", "from sklearn.datasets import load_boston\n", "boston = load_boston()\n", "X = pd.DataFrame(boston.data, columns=boston.feature_names)\n", "Y = pd.DataFrame(boston.target)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th><th>CRIM</th><th>ZN</th><th>INDUS</th><th>CHAS</th><th>NOX</th><th>RM</th><th>AGE</th><th>DIS</th><th>RAD</th><th>TAX</th><th>PTRATIO</th><th>B</th><th>LSTAT</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.00632</td><td>18.0</td><td>2.31</td><td>0.0</td><td>0.538</td><td>6.575</td><td>65.2</td><td>4.0900</td><td>1.0</td><td>296.0</td><td>15.3</td><td>396.90</td><td>4.98</td></tr>\n",
"    <tr><th>1</th><td>0.02731</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>6.421</td><td>78.9</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>396.90</td><td>9.14</td></tr>\n",
"    <tr><th>2</th><td>0.02729</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>7.185</td><td>61.1</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>392.83</td><td>4.03</td></tr>\n",
"    <tr><th>3</th><td>0.03237</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>6.998</td><td>45.8</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>394.63</td><td>2.94</td></tr>\n",
"    <tr><th>4</th><td>0.06905</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>7.147</td><td>54.2</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>396.90</td><td>5.33</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n", "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n", "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n", "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n", "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n", "\n", " PTRATIO B LSTAT \n", "0 15.3 396.90 4.98 \n", "1 17.8 396.90 9.14 \n", "2 17.8 392.83 4.03 \n", "3 18.7 394.63 2.94 \n", "4 18.7 396.90 5.33 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "X.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CRIM float64\n", "ZN float64\n", "INDUS float64\n", "CHAS float64\n", "NOX float64\n", "RM float64\n", "AGE float64\n", "DIS float64\n", "RAD float64\n", "TAX float64\n", "PTRATIO float64\n", "B float64\n", "LSTAT float64\n", "dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "X.dtypes" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#\n", "X1 = X.iloc[:,0:2]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#\n", "X1=sm.add_constant(X1)\n", "model = sm.OLS(Y,X1).fit()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
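<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr><th>Dep. Variable:</th><td>0</td><th>R-squared:</th><td>0.234</td></tr>\n",
"<tr><th>Model:</th><td>OLS</td><th>Adj. R-squared:</th><td>0.231</td></tr>\n",
"<tr><th>Method:</th><td>Least Squares</td><th>F-statistic:</th><td>76.82</td></tr>\n",
"<tr><th>Date:</th><td>Mon, 25 Oct 2021</td><th>Prob (F-statistic):</th><td>7.68e-30</td></tr>\n",
"<tr><th>Time:</th><td>18:54:38</td><th>Log-Likelihood:</th><td>-1772.8</td></tr>\n",
"<tr><th>No. Observations:</th><td>506</td><th>AIC:</th><td>3552.</td></tr>\n",
"<tr><th>Df Residuals:</th><td>503</td><th>BIC:</th><td>3564.</td></tr>\n",
"<tr><th>Df Model:</th><td>2</td><th></th><td></td></tr>\n",
"<tr><th>Covariance Type:</th><td>nonrobust</td><th></th><td></td></tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr><td></td><th>coef</th><th>std err</th><th>t</th><th>P&gt;|t|</th><th>[0.025</th><th>0.975]</th></tr>\n",
"<tr><th>const</th><td>22.4856</td><td>0.442</td><td>50.904</td><td>0.000</td><td>21.618</td><td>23.353</td></tr>\n",
"<tr><th>CRIM</th><td>-0.3521</td><td>0.043</td><td>-8.267</td><td>0.000</td><td>-0.436</td><td>-0.268</td></tr>\n",
"<tr><th>ZN</th><td>0.1161</td><td>0.016</td><td>7.392</td><td>0.000</td><td>0.085</td><td>0.147</td></tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr><th>Omnibus:</th><td>164.581</td><th>Durbin-Watson:</th><td>0.757</td></tr>\n",
"<tr><th>Prob(Omnibus):</th><td>0.000</td><th>Jarque-Bera (JB):</th><td>432.206</td></tr>\n",
"<tr><th>Skew:</th><td>1.625</td><th>Prob(JB):</th><td>1.40e-94</td></tr>\n",
"<tr><th>Kurtosis:</th><td>6.152</td><th>Cond. No.</th><td>32.0</td></tr>\n",
"</table><br/><br/>Notes:<br/>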
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: 0 R-squared: 0.234\n", "Model: OLS Adj. R-squared: 0.231\n", "Method: Least Squares F-statistic: 76.82\n", "Date: Mon, 25 Oct 2021 Prob (F-statistic): 7.68e-30\n", "Time: 18:54:38 Log-Likelihood: -1772.8\n", "No. Observations: 506 AIC: 3552.\n", "Df Residuals: 503 BIC: 3564.\n", "Df Model: 2 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 22.4856 0.442 50.904 0.000 21.618 23.353\n", "CRIM -0.3521 0.043 -8.267 0.000 -0.436 -0.268\n", "ZN 0.1161 0.016 7.392 0.000 0.085 0.147\n", "==============================================================================\n", "Omnibus: 164.581 Durbin-Watson: 0.757\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 432.206\n", "Skew: 1.625 Prob(JB): 1.40e-94\n", "Kurtosis: 6.152 Cond. No. 
32.0\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }