{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## LabDS01 ##\n", "\n", "\n", "The following dataset includes information about several countries. This information was collected by the World Bank (https://data.worldbank.org/). The data correspond to 2016.\n", "\n", "```python\n", "import pandas as pd\n", "# Load the 2016 World Bank dataset into a DataFrame\n", "dataFile=\"https://raw.githubusercontent.com/masterfloss/data/main/worlddata2016.xlsx\"\n", "df=pd.read_excel(dataFile)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Business Understanding\n", "\n", "Purpose: Analyse data and extract knowledge\n", "\n", "Observation: Country\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Data Understanding\n", "\n", "* Discuss the data and purposes\n", "* What are the main limitations?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Data Preparation\n", "\n", "* Converting data to numeric\n", "* Missing Values Imputation\n", "* Normalization\n", "* Variable Dummification\n", "* Data balancing\n", "* Pivot the data to convert the values of \"Indicator Name\" into columns\n", "* Remove all columns that have no values, `df1[columns].sum()==0`\n", "* Remove all columns that have fewer than 210 observations, `df1[columns].count()<210`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }