{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 5\n", "\n", "The purpose of this lab is to analyse Steve Jobs's speech \"Stay Hungry. Stay Foolish.\"\n", "Among other questions, we want to know how many times each word is repeated in the text." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1.** Read the file. The file may be stored in a local folder or on the internet." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "# Option A: download the speech from the internet\n", "url = 'https://raw.githubusercontent.com/masterfloss/data/main/jobs.txt'\n", "response = requests.get(url)\n", "response.raise_for_status()  # stop here if the download failed\n", "speech = response.text\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Option B: read the speech from a local copy of the file\n", "with open(\"jobs.txt\", \"r\") as f:\n", "    speech = f.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2.** What is the type of the data stored in the variable `speech`?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3.** Remove symbols that are not useful. Suggestion: use the `replace` method." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4.** Convert the string into a list." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5.** Use a dictionary to get the list of words and their frequencies." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6.** Sort the dictionary alphabetically." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**7.** How many words does the text have?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**8.** What is the most frequent word?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**9.** Sort the words from the most frequent to the least frequent." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**10.** How many words have appeared more than 50 times in the text?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }