{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**LabML01(KNN)**\n", "\n", "The k-nearest neighbors (KNN) algorithm is an easy-to-implement supervised machine learning algorithm that can be used to solve classification problems. It uses 'feature similarity' to predict the value of a new data point: the new point is assigned a value based on how closely it resembles the points in the training set. The k in KNN is the number of nearest neighbors that vote on the class of the new data point.\n", "KNN is a simple algorithm that stores the entire training dataset. Whenever a prediction is required for an unseen data instance, it searches the whole training dataset for the k most similar instances and returns the most common class among them as the prediction.\n", "\n", "The similarity between the new point and the points in the training set is based on a distance measure.\n", "\n", "1.\tAdd comments to the following code.\n", "\n", "2.\tWrite the code using the SVM method for classification.\n", "\n", "3.\tWrite the code using Naïve Bayes for classification.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "import pandas as pd\n", "url='https://raw.githubusercontent.com/masterfloss/data/main/jogadores.csv'\n", "df=pd.read_csv(url,sep=\";\")\n", "X1=df.loc[:,['Idade','Altura','Minutos','Valor de Mercado']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.preprocessing import StandardScaler\n", "standardizer=StandardScaler()\n", "scaler=standardizer.fit(X1)\n", "X1Stand=scaler.transform(X1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "posi=pd.get_dummies(df['Posicao'], prefix='pos')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": 
[], "source": [ "#\n", "XArray=zip(posi['pos_ATA'],posi['pos_DEF'],posi['pos_GK'],posi['pos_MED'],X1Stand[:,0],X1Stand[:,1],X1Stand[:,2],X1Stand[:,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "features=list(XArray)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.neighbors import KNeighborsClassifier\n", "#\n", "model = KNeighborsClassifier(n_neighbors=3)\n", "#\n", "model.fit(features,df['Ser Transferido'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "import numpy as np\n", "# Goalkeeper, 30 years old, 1.90 m, 5100 minutes, market value = 16: will the player be transferred?\n", "newdata=np.array([[30,1.90,5100,16]])\n", "newvalues=scaler.transform(newdata)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "teste=np.concatenate((np.array([[0,0,1,0]]),newvalues), axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "predicted = model.predict(teste)\n", "print(predicted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's split the sample into training and test sets..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "#\n", "#\n", "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "from sklearn.model_selection import train_test_split\n", "from sklearn import metrics\n", "from sklearn.neighbors import KNeighborsClassifier\n", "#\n", "X_train, X_test, y_train, y_test = train_test_split(features, df['Ser Transferido'], test_size=0.3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", "print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "#\n", "#\n", "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn import preprocessing\n", "from sklearn import metrics\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.neighbors import KNeighborsClassifier\n", "#\n", "le = preprocessing.LabelEncoder()\n", "# \n", "pos_encoded=le.fit_transform(df['Posicao'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['Pos Encod'] = pos_encoded\n", "features=df.drop(['Posicao','Ser Transferido'],axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(features, df['Ser Transferido'], test_size=0.3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", "print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }