{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**LabML01 (KNN)**\n", "\n", "The k-nearest neighbors (KNN) algorithm is an easy-to-implement supervised machine learning algorithm that can be used to solve classification problems. KNN uses feature similarity to predict the value of new data points: a new point is assigned a value (here, a class) based on how closely it resembles the points in the training set.\n", "The k in KNN is the number of nearest neighbors that vote on the class of a new test point.\n", "\n", "KNN is a simple algorithm that keeps the entire training dataset instead of learning explicit model parameters. Whenever a prediction is required for an unseen instance, it searches the training dataset for the k most similar instances and returns the majority class among them as the prediction.\n", "\n", "The similarity between the new point and the points in the training set is measured with a distance function. scikit-learn's `KNeighborsClassifier` uses the Euclidean distance by default (Minkowski distance with `p=2`), so features with large numeric ranges dominate the distance unless the data are standardized.\n", "\n", "1. Add comments to the following code.\n", "2. Write code that uses the SVM method for classification.\n", "3. Write code that uses Naïve Bayes for classification.\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the players dataset (semicolon-separated CSV) from GitHub\n", "import pandas as pd\n", "url='https://raw.githubusercontent.com/masterfloss/data/main/jogadores.csv'\n", "df=pd.read_csv(url,sep=\";\")\n", "# Keep the numeric features: age (Idade), height (Altura), minutes played (Minutos) and market value (Valor de Mercado)\n", "X1=df.loc[:,['Idade','Altura','Minutos','Valor de Mercado']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Standardize the numeric features (mean 0, standard deviation 1) so that no single feature dominates the distance\n", "from sklearn.preprocessing import StandardScaler\n", "standardizer=StandardScaler()\n", "# fit() learns each column's mean and standard deviation and returns the fitted scaler\n", "scaler=standardizer.fit(X1)\n", "X1Stand=scaler.transform(X1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# One-hot encode the categorical position column (Posicao) into pos_ATA, pos_DEF, pos_GK and pos_MED\n", "posi=pd.get_dummies(df['Posicao'], prefix='pos')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Combine the four position dummies with the four standardized numeric columns, row by row\n", "XArray=zip(posi['pos_ATA'],posi['pos_DEF'],posi['pos_GK'],posi['pos_MED'],X1Stand[:,0],X1Stand[:,1],X1Stand[:,2],X1Stand[:,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# zip() returns a one-shot iterator, so turn it into a list of 8-value tuples (one per player)\n", "features=list(XArray)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the KNN classifier\n", "from sklearn.neighbors import KNeighborsClassifier\n", "# Create a model in which the 3 nearest neighbors vote on the class\n", "model = KNeighborsClassifier(n_neighbors=3)\n", "# Train on all players; the target is whether the player is transferred (Ser Transferido)\n", "model.fit(features,df['Ser Transferido'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Describe a new player and standardize the numeric values with the scaler fitted above\n", "import numpy as np\n", "# Goalkeeper, 30 years old, 1.90 m, 5100 minutes, market value = 16. Will this player be transferred?\n", "# A DataFrame with the same column names as X1 avoids scikit-learn's feature-name warning\n", "newdata=pd.DataFrame([[30,1.90,5100,16]], columns=X1.columns)\n", "newvalues=scaler.transform(newdata)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepend the one-hot position (order ATA, DEF, GK, MED, so [0,0,1,0] = goalkeeper) to match the training feature order\n", "teste=np.concatenate((np.array([[0,0,1,0]]),newvalues), axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Predict whether this new player will be transferred\n", "predicted = model.predict(teste)\n", 
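"print(predicted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's split the sample into training and test sets, so that the model is evaluated on players it did not see during training." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Measuring accuracy on the same players used for training would be overly optimistic,\n", "# so a share of the players is held out as a test set." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Tools for splitting the data, computing metrics and building the KNN classifier\n", "from sklearn.model_selection import train_test_split\n", "from sklearn import metrics\n", "from sklearn.neighbors import KNeighborsClassifier\n", "# Hold out 30% of the players for testing; no random_state is set, so the split (and the accuracy) changes on every run\n", "X_train, X_test, y_train, y_test = train_test_split(features, df['Ser Transferido'], test_size=0.3)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train a KNN with 5 neighbors on the training set, predict the test set and report the accuracy\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", "print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Alternative approach: encode the position as a single integer column with LabelEncoder\n", "# instead of one-hot dummies, and use the other columns of df as features." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn import preprocessing\n", "from sklearn import metrics\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.neighbors import KNeighborsClassifier\n", "# LabelEncoder assigns each position an integer, in alphabetical order of the categories\n", "le = preprocessing.LabelEncoder()\n", "# Learn the categories and encode every player's position\n", "pos_encoded=le.fit_transform(df['Posicao'])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add the encoded position as a column and use every column except the original position and the target as features\n", "# Note: these features are not standardized, so large-valued columns such as Minutos dominate the distance\n", "df['Pos Encod'] = pos_encoded\n", "features=df.drop(['Posicao','Ser Transferido'],axis=1)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# New 70/30 train/test split on the label-encoded features\n", "X_train, X_test, y_train, y_test = train_test_split(features, df['Ser Transferido'], test_size=0.3)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train and evaluate a 5-neighbor KNN on the label-encoded features\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", 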
"print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }