{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 3\n", "\n", "This lab's purpose is to analyze Steve Jobs's speech \"Stay Hungry. Stay Foolish.\"\n", "\n", " 1. Read the file. The file may be stored in a local folder or on the internet.\n", " 2. What is the type of data stored in the variable speech? \n", " 3. Remove symbols that are not useful. Suggestion: use the replace method. \n", " 4. Convert the string into a list.\n", " 5. Use a dictionary to get the list of words and their frequencies.\n", " 6. Sort the dictionary alphabetically.\n", " 7. How many words does the text have?\n", " 8. What is the most frequent word?\n", " 9. Sort the words from the most frequent to the least frequent.\n", " 10. How many words have appeared more than 50 times in the text?\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **1.** Read the file. The file may be stored in a local folder or on the internet." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "#url='https://raw.githubusercontent.com/masterfloss/data/main/jobs.txt'\n", "url='https://raw.githubusercontent.com/masterfloss/text/main/jobs.txt'\n", "response = requests.get(url)\n", "speech = response.text\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# an alternative if your file is stored in the current folder\n", "#f = open(\"jobs.txt\", \"r\")\n", "#speech=f.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **2.** What is the type of data stored in the variable speech?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **3.** Remove symbols that are not useful. Suggestion: use the replace method. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4.** Convert the string into a list." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5.** Use a dictionary to get the list of words and their frequencies." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6.** Sort the dictionary alphabetically." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**7.** How many words does the text have?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**8.** What is the most frequent word?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**9.** Sort the words from the most frequent to the least frequent." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**10.** How many words have appeared more than 50 times in the text?" 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "#\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }