{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab03 ##\n", "\n", "How do intelligence and education affect the level of income?\n", "\n", "To estimate a regression, import the statsmodels module (https://www.statsmodels.org):\n", "\n", "    import statsmodels.api as sm\n", "    model = sm.OLS(y, X).fit()\n", "    predictions = model.predict(X)\n", "    model.summary()\n", "\n", "* y may be a series holding the target (the dependent variable)\n", "* X may be a dataframe holding the features (the independent variables)\n", "* `sm.OLS` does not add an intercept automatically; `sm.add_constant(X)` adds one if the model should include it\n", "\n", "Note: The IQ data are not validated; they were obtained from the Internet. In addition, IQ is culturally biased, and the values are averages.\n", "\n", "dataFile='https://github.com/masterfloss/data/blob/main/exerciseInt.xlsx?raw=true'\n", "\n", "1. Read the data, create a dataframe and analyse the variables (e.g. basic descriptive statistics).\n", "\n", "2. Create a regression where y is Income and all the other variables are the features of the model.\n", "\n", "3. Analyse the output.\n", "\n", "4. Create another regression where y is Income, and IQ and 'Education expenditures per capita' are the features of the model.\n", "\n", "5. Analyse the relationship between Income and each of the features using a scatter plot.\n", "\n", "6. Fit a polynomial to the following regression (use exercise 3):\n", "\n", " y=f(x)\n", " where:\n", " y= 'quality of Life'\n", " x= 'Education expenditures per capita'\n", "\n", "7. As an alternative, you may use the sklearn library.\n", "\n", " from sklearn.preprocessing import PolynomialFeatures\n", "\n", " polynomial_features= PolynomialFeatures(degree=3)\n", " x_poly = polynomial_features.fit_transform(x)\n", " \n", " x_poly will be the new array with the polynomial features.\n", "\n", "\n", "8. 
Show the comparison between the variables using a scatter chart.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "file='https://github.com/masterfloss/data/blob/main/exerciseInt.xlsx?raw=true'\n", "df=pd.read_excel(file)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "country object\n", "IQ int64\n", "Education expenditures per capita float64\n", "quality of Life float64\n", "BMI female float64\n", "BMI male float64\n", "Income float64\n", "dtype: object" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }