{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Clusters (example 01) ##\n", "\n", "\n", "Purpose: Identify clusters in a randomly generated sample of blobs\n", "\n", "**1** Import the libraries needed: numpy, sklearn, matplotlib and pandas\n", "\n", "**2** Generate a sample of blobs and convert it into a DataFrame called df1\n", "\n", "**3** Verify the data types\n", "\n", "**4** Plot the blobs\n", "\n", "**5** Calculate WCSS (within-cluster sum of squares)\n", "\n", "**6** Plot the new chart with centroids\n", "\n", "**7** Identify which group each item belongs to\n", "\n", "**8** Add a new column\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "from sklearn.cluster import KMeans" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# make_blobs lives in sklearn.datasets; the samples_generator module was removed from scikit-learn\n", "from sklearn.datasets import make_blobs\n", "# Generate 400 two-dimensional points scattered around 5 centers\n", "XY, y = make_blobs(n_samples=400, centers=5, cluster_std=0.60, random_state=0)\n", "df1 = pd.DataFrame(XY)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 float64\n", "1 float64\n", "dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.dtypes" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(400, 2)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.shape" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
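<table>\n", "<thead>\n", "<tr><th></th><th>0</th><th>1</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>-1.379980</td><td>7.185038</td></tr>\n",
"<tr><th>1</th><td>-1.764041</td><td>2.222230</td></tr>\n",
"<tr><th>2</th><td>1.975539</td><td>0.718989</td></tr>\n",
"<tr><th>3</th><td>-1.554326</td><td>3.050187</td></tr>\n",
"<tr><th>4</th><td>1.988943</td><td>1.509767</td></tr>\n",
"</tbody>\n", "</table>\n", "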