Standard track bootcamp

We will start by quickly working through most of the

Then we will revisit the

First we will examine the download script:

$ wget http://jarrodmillman.com/capstone/code/fetch_senator_tweets.py

or

$ curl -OL http://jarrodmillman.com/capstone/code/fetch_senator_tweets.py

Then we will review exercise solutions:

$ wget http://jarrodmillman.com/capstone/code/senators.py

or

$ curl -OL http://jarrodmillman.com/capstone/code/senators.py

We will conclude by tokenizing the tweets.

  1. Make a list of lists, called tweets_list, where the outer list represents senators and each inner list contains one senator’s tweets.

  2. Write a function that takes a tweet and returns a cleaned-up version of it. Here is some example code to get you started:

    >>> def clean(tweet):
    ...     cleaned_words = [word.lower() for word in tweet.split() if
    ...                      'http' not in word and
    ...                      word.isalpha() and
    ...                      word != 'RT']
    ...     return ' '.join(cleaned_words)
    ...
    
  3. Write a function, called all_punct, which takes a word and returns a bool indicating whether all the characters are punctuation marks.

  4. Write a function, called remove_punct, which takes a word and returns the word with all punctuation characters removed.

  5. Create a list, called stopwords, which contains common English words. You may want to use this list:

  6. Write a function, called tokenize, which takes a tweet, cleans it, and removes all punctuation and stopwords.

  7. Create a list of lists, tweets_list, using your tokenize function.

  8. Create a list, tokens_list, where each senator’s tweets are made into a single string.

  9. Create a list of words with duplicates.

  10. Create a sorted list of vocabulary words (no duplicates).

  11. Create a list with the most frequently used words.
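As a starting point for steps 3 through 6, here is one possible sketch, not a reference solution. It makes several assumptions: "punctuation" means the ASCII characters in string.punctuation, the short stopwords list is only a stand-in for the list the exercise points to, and tokenize returns a space-joined string like clean does. It does not call clean directly, because the word.isalpha() test there would discard words such as 'passed!' before their punctuation could be stripped.

```python
import string

# Assumption: a tiny stand-in for the real stopword list used in the exercise.
stopwords = ['a', 'an', 'and', 'the', 'to', 'of', 'in', 'is', 'for', 'on']


def all_punct(word):
    """Return True if every character in word is an ASCII punctuation mark."""
    # An empty string has no characters, so treat it as not all-punctuation.
    return len(word) > 0 and all(ch in string.punctuation for ch in word)


def remove_punct(word):
    """Return word with all ASCII punctuation characters removed."""
    return ''.join(ch for ch in word if ch not in string.punctuation)


def tokenize(tweet):
    """Return tweet lowercased, without links, 'RT', punctuation, or stopwords."""
    tokens = []
    for word in tweet.split():
        # Skip links, retweet markers, and tokens made only of punctuation.
        if 'http' in word or word == 'RT' or all_punct(word):
            continue
        word = remove_punct(word).lower()
        if word and word not in stopwords:
            tokens.append(word)
    # Assumption: return a single string, mirroring clean().
    return ' '.join(tokens)


# The sample tweet below is made up for illustration.
print(tokenize('RT @SenSmith: The bill passed! http://t.co/abc'))
```

Note that remove_punct also strips '@' and '#', so mentions and hashtags are kept as plain words. Whether that is desirable depends on what you want to count later.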

Perhaps you tried something like:

$ wget http://jarrodmillman.com/capstone/code/senators2.py

or

$ curl -OL http://jarrodmillman.com/capstone/code/senators2.py
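If you want to check your approach to the counting steps (9 through 11) independently of the downloaded file, the following is a minimal sketch and not necessarily what senators2.py does. The two strings in tokens_list are made-up sample data standing in for real tokenized tweets, and the names words, vocab, counts, and top_words are chosen here for illustration.

```python
from collections import Counter

# Assumption: made-up sample data, one already-tokenized string per senator.
tokens_list = ['jobs bill passed senate jobs', 'senate votes budget bill']

# Step 9: every word from every senator, duplicates included.
words = [word for tokens in tokens_list for word in tokens.split()]

# Step 10: sorted vocabulary, each word appearing once.
vocab = sorted(set(words))

# Step 11: the three most frequent words, as (word, count) pairs.
counts = Counter(words)
top_words = counts.most_common(3)

print(len(words), len(vocab))
print(top_words)
```

Counter.most_common returns (word, count) pairs sorted by count, so a plain list of words can be recovered with [word for word, count in top_words].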