Stopwords | sarahtiggy2

Having successfully divided my data set up into separate years yesterday, I thought I’d go back to basics and have a look at stopwords.

In language processing, it’s apparent that there are quite a few words that add absolutely no value to a text.  These are words like ‘a’, ‘all’, ‘with’ etc.  NLTK (Natural Language Tool Kit, a module that can be used to process text in various ways; you can have a play with it here) has a list of 127 words that could be considered the most basic ones.  Scikit-learn (which I’m using for some of the more complicated text processing algorithms) uses a list of 318…
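
To make the idea concrete, here is a minimal sketch of stopword filtering in plain Python. The small `STOPWORDS` set and the `remove_stopwords` helper are made up for illustration; they are not the NLTK or scikit-learn lists, which are much longer and would normally be loaded from those libraries instead.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch; the real
# NLTK and scikit-learn lists are much longer).
STOPWORDS = {"a", "all", "with", "the", "of", "and", "to", "in", "is"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stopwords]


sample = "All of the data is split up with a script"
filtered = remove_stopwords(sample)
print(filtered)  # ['data', 'split', 'up', 'script']
```

Swapping in a longer list only means passing a different set as the second argument, so the same helper could be used to compare how the shorter and longer lists treat the same text.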

Continue reading at:

http://ift.tt/2rMH1zE
