{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**LabML01(KNN)**\n", "\n", "The k-nearest neighbors (KNN) algorithm is an easy-to-implement supervised machine learning algorithm that can be used to solve classification problems. It uses 'feature similarity' to predict the value of a new data point: the new point is assigned a value based on how closely it resembles the points in the training set.\n", "The k in KNN is the number of nearest neighbors that vote on the class of the new data point.\n", "\n", "KNN is a simple algorithm whose training phase consists of storing the entire training dataset. Whenever a prediction is required for an unseen data instance, it searches the training dataset for the k most similar instances and returns the most common class among them as the prediction.\n", "\n", "The similarity between the new point and the points in the training set is based on a distance measure, such as the Euclidean distance.\n", "\n", "Add comments to the following code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "url='https://raw.githubusercontent.com/masterfloss/data/main/jogadores.csv'\n", "df=pd.read_csv(url,sep=\";\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X1=df.loc[:,['Idade','Altura','Minutos','Valor de Mercado']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.preprocessing import StandardScaler\n", "standardizer=StandardScaler()\n", "#X1Stand=standardizer.fit_transform(X1)\n", "scaler=standardizer.fit(X1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X1Stand=scaler.transform(X1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X1.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "posi=pd.get_dummies(df['Posicao'], prefix='pos')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "XArray=zip(posi['pos_ATA'],posi['pos_DEF'],posi['pos_GK'],posi['pos_MED'],X1Stand[:,0],X1Stand[:,1],X1Stand[:,2],X1Stand[:,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "features=list(XArray)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.neighbors import KNeighborsClassifier\n", "#\n", "model = KNeighborsClassifier(n_neighbors=3)\n", "#\n", "model.fit(features,df['Ser Transferido'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy 
as np\n", "# Goalkeeper, 30 years old, 1.90 m, 5100 minutes, market value = 16?\n", "newdata=np.array([[30,1.90,5100,16]])\n", "newvalues=scaler.transform(newdata)\n", "newvalues" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "list(newvalues)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "teste=np.concatenate((np.array([[0,0,1,0]]),newvalues), axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "teste" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "predicted = model.predict(teste) \n", "print(predicted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's split the sample into training and test sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(features, df['Ser Transferido'], test_size=0.3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.neighbors import KNeighborsClassifier\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn import metrics\n", "print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }