Having had the chance to think about and articulate some ideas on how to deal with my data set, I started dividing it up into blog posts by year. I like using Pandas for Python, although it can be difficult to find help with it that is pitched at the right level. Anyway, I separated out all the years from 2004 to 2017 and saved each one in its own .csv file.
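For anyone wanting to try something similar, here is a minimal sketch of the per-year split in Pandas. It assumes all the posts sit in one DataFrame with a date column called `date`. That column name, the `posts_` filename prefix and the `split_posts_by_year` helper are all hypothetical, not taken from my actual files.

```python
import pandas as pd

def split_posts_by_year(posts: pd.DataFrame, date_col: str = "date", prefix: str = "posts_") -> dict:
    """Hypothetical helper: write one CSV per year and return {year: number_of_posts}."""
    # Parse the (assumed) date column so .dt.year works even if it was read in as text
    dates = pd.to_datetime(posts[date_col])
    counts = {}
    # Group rows by calendar year and save each group to its own file, e.g. posts_2017.csv
    for year, group in posts.groupby(dates.dt.year):
        group.to_csv(f"{prefix}{year}.csv", index=False)
        counts[int(year)] = len(group)
    return counts
```

Used as `split_posts_by_year(pd.read_csv("all_posts.csv"))`, where `all_posts.csv` stands in for whatever the full export is called, this would leave one file per year alongside a quick count of posts in each.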

Then I had a go at clustering the posts from 2017. With ‘only’ 230 blog posts, this was relatively easy to process on my laptop. I stuck with 10 clusters as I’d used this arbitrary number when I…
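The post doesn't say which clustering method was used, so the sketch below swaps in one common choice: TF-IDF features from scikit-learn's `TfidfVectorizer` fed into `KMeans` with 10 clusters. Treat it as an illustration under that assumption rather than a record of what I actually ran. The `cluster_posts` helper name is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_posts(texts, n_clusters=10, seed=0):
    """Hypothetical helper: assign each post a cluster label using TF-IDF + k-means."""
    # Turn each post into a sparse TF-IDF vector, dropping common English stop words
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(texts)
    # Set n_init explicitly so results don't depend on scikit-learn's version-specific default;
    # random_state makes the run repeatable
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)
```

For the 2017 file this might look like `cluster_posts(pd.read_csv("posts_2017.csv")["content"])`, assuming the post text lives in a column called `content`. At a few hundred posts the TF-IDF matrix is small, which fits with this being easy to run on a laptop.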

