Are Super PACs for Hillary Clinton Astroturfing on Reddit?

A while back I did a post about tracking users from political subreddits. I wrote a quick app to pull users from a subreddit and check which other subreddits they were commenting in. Among other things, what stood out was that Hillary Clinton’s supporters were camping out at /r/politicaldiscussion, but that the commentary at /r/politics was fairly evenly distributed between the then candidates, Bernie Sanders, Hillary Clinton, and Donald Trump.

Interestingly, there has been lots of talk on Reddit this week about a Super PAC by the name of Correct The Record that has been paying people to write pro-Hillary Clinton comments on Reddit. While this has long been known, what was interesting was that people were complaining specifically that /r/politics was the destination for many of these comments. My previous research had concluded that while it was obvious that Clinton supporters were on Reddit and doing nothing but pumping her up, /r/politics was actually pretty even-keeled. Add to that the fact that it’s Trump supporters reporting this (they can get easily excited over nothing), and I wanted to take another look at things.

Do /r/politics users have a tendency to be in Hillary Clinton’s camp?

politicsinrpolitics

Yes. The graph above takes a random subset of users from each political subreddit and checks whether they also comment in /r/politics. You can see that 61% of users on /r/HillaryClinton also comment in /r/politics, while only 39% of users from /r/The_Donald do. This is a strong change from my previous post, where it was all rather even. Does this prove that it’s astroturfing? Well, maybe. But you also have to remember that in my last post we were including Bernie Sanders. I would say a large majority of those who supported him have now moved to supporting Clinton.

As a side note, I also included /r/enoughtrumpspam in this chart. I thought it best to include it as it’s basically a pro-Hillary Clinton subreddit. Last time we also saw Hillary Clinton supporters creating /r/enoughsandersspam, so it’s not some bi-partisan call for people to stop spamming Trump memes, it’s just another pro-Clinton subreddit.

Is /r/politicaldiscussion still over-the-top pro-Hillary?

politicsinpoliticaldiscussion

Yes. Although it is more likely that commentators in /r/politicaldiscussion are people not associated with any candidate’s subreddit, the people who are affiliated are overwhelmingly from Hillary Clinton’s camp.

So back to the original question. Are Super PACs for Hillary Clinton Astroturfing on Reddit? Well, the Correct The Record Super PAC has come out and said they are spending millions of dollars for people to astroturf social media anyway. So it’s not a question of IF they are doing it, but rather has it had an effect. The data shows that overwhelmingly the “neutral” political subreddits are pro Clinton which is a marked shift from my previous post where it was about even. The demographics of Reddit have always leaned to the left anyway, but it is interesting to see regardless.

Again, for more info on how politics on Reddit plays out, read a previous post of mine here: http://mindingdata.com/2016/04/02/tracking-reddit-users-political-subreddits-failing/

Analyzing NZ Herald’s Sources (Part 2)

Last week I published a post talking about how I had done a quick investigation into what percentage of stories on NZHerald came from syndicated sources. I threw the post up on Twitter and Reddit and had some interesting feedback. While I had only intended to take a quick look at the homepage and then move on, there were some great points made on social media about ways to improve the data. It really boiled down to two main points.

Had I used enough data? While my original intention was to just do a quick scrape, people felt that investigating 100-odd stories wasn’t enough.

And the second question, was syndication OK in some sections, but not others? People felt that syndication in some sense was OK if reporting on world news. It isn’t cheap to fly someone half way around the world for a single story. So syndication in these cases might be ok? (I end that with a question mark because it really is one of those things I’m not sure about)

Both these issues were simple enough to fix. So I made a couple of changes to my app and away we went.

First, I found that NZHerald has “archive” pages that contain about 6 days’ worth of articles for each category. Some days it had less, but it at least had a few hundred posts per category on there. So instead of going to the homepage I could just grab these pages and follow all the links to the stories. From there it was easy to record which category each story was in so we could try and do some grouping with the results.

As a small change to the first test, I split out AP Wire posts from regular AP attributions. I’m still not 100% sure what a regular attribution counts for (Was it a verbatim post? Were large sections from the original? Was it just a source?), but we can be sure that AP Wire posts are automatically posted from the Associated Press feed without any editing. We know this because it says so right there on the page….

apwire

One note. For the majority of these stats when we talk about “syndicated” posts, we are talking about overseas syndication sources. Local NZME sources (Local newspapers and the like) don’t count. Why? Because I felt it wasn’t in the “spirit” of the discussion. This whole thing started because people felt like there was an influx of “dailymail” type stories, I don’t think locally produced stories should really count against the Herald.

So first let’s just group by their syndication “types” and see what we get.

Article Count    Syndication Type
1104             Syndication Overseas
712              Herald
156              Syndication Local
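For anyone who wants to try this kind of grouping on the CSV, here is a minimal sketch in Python. It is not the app used for the numbers above: the two source sets are illustrative assumptions (including the choice to treat a blank source tag as Herald content), and the real data contains many more sources.

```python
from collections import Counter

# Assumed, illustrative source lists; the real data has more entries.
# A blank tag is treated as Herald content here, which is also an assumption.
HERALD_SOURCES = {"NZ Herald", "Herald On Sunday", "Canvas", ""}
LOCAL_SOURCES = {"Bay Of Plenty Times", "Northern Advocate", "Hawkes Bay Today"}


def syndication_type(source):
    """Map an article's source tag to one of three syndication types."""
    source = source.strip()
    if source in HERALD_SOURCES:
        return "Herald"
    if source in LOCAL_SOURCES:
        return "Syndication Local"
    # Anything not recognised as Herald or local counts as overseas.
    return "Syndication Overseas"


def count_by_type(sources):
    """Count articles per syndication type."""
    return Counter(syndication_type(s) for s in sources)
```

Anything that is not in either set falls through to "Syndication Overseas", so a missing local paper would be miscounted; the sets would need to be checked against the full list of sources first.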

OK, so right away it tells us that there is quite a bit of overseas syndicated content on the Herald. But let’s go further, let’s try and break it down by category so we can compare.

nzheraldcontentbycategory

So this starts making sense now. We see that for National issues, almost all content is from the Herald itself or from local sources (Other NZME sites). And when we get to World news, it’s mostly overseas content.

Let’s take a closer look at World to begin with. As mentioned earlier, there are really two types of syndication at play here. The AP wire which is an unedited article straight from Associated Press, and regular syndication. Let’s break it down.

heraldcontentinworldcategory

What does this tell us? Well, it tells us that there really isn’t any reason to be reading the Herald for world news over another overseas paper, because none of its content is its own. In fact most of it is posted directly from the AP wire.

But let’s dive into what there seems to be universal hatred for. Daily Mail syndication. Let’s check by percentage, which category is worst hit with it.

percentageofdailymail

Take note of the Y-axis. It’s only going up to 16 – that being 16%. So even in the worst hit categories, we are “only” seeing 16% of its content come direct from Daily Mail. I put only in quotes because that’s still 1 in 6 articles being out and out clickbait.
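The per-category percentage is simple arithmetic: Daily Mail articles in a category divided by all articles in that category. A rough Python sketch, assuming each row is a (category, source) pair and that the source tag reads exactly "Daily Mail":

```python
from collections import defaultdict


def daily_mail_share(articles):
    """Return {category: percentage of its articles sourced from Daily Mail}.

    articles is an iterable of (category, source) pairs.
    """
    totals = defaultdict(int)
    daily_mail = defaultdict(int)
    for category, source in articles:
        totals[category] += 1
        if source == "Daily Mail":
            daily_mail[category] += 1
    return {c: 100.0 * daily_mail[c] / totals[c] for c in totals}
```

One Daily Mail article out of every six in a category comes out at roughly 16.7%, which is where the "1 in 6" reading of the chart comes from.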

How might I take this further? I think it goes back to my original post. Looking deep into a category is great, but it really depends on how the Herald is pushing it. If there are Daily Mail articles, but they are deep within the site and never surface to the homepage (Not the case here, but just saying), then who really cares? But what I tended to see throughout collecting this data is that it was the homepage real estate that seemed to be clogged with clickbait titles (After all, that’s what clickbait is for). So even if there is only one title of “and you won’t believe what happened next” per day, if that article is a “lead” story and gets prime position on the homepage, that’s what I think is most irritating. So maybe that takes us to a part 3 in the future.

The data I used is available in CSV format here. I personally threw the data into a SQL database because my Excel skills are extremely poor. I’m no data scientist so, as always, other insights into the data in the comments are more than welcome!

Analyzing NZ Herald’s Sources

For those outside of NZ, this post is about NZ’s largest national newspaper, “The New Zealand Herald”. If you don’t live in NZ you might not find it that interesting, but it’s still a good look into how journalism within NZ is slowly being shut down and replaced with clickbait type stories and syndicated content.

Over the past month there have been a couple of articles floating around the web, most notably this one by Russell Brown, and another piece by David Farrar. They talk about how the NZ Herald’s online edition seems to be filled up more and more by “Daily Mail” type news: usually those stories that have a headline with “…. And what happened next will amaze you!” or “See what made *B Grade Celebrity here* cry”. On top of that, I’ve begun to notice that many stories listed online at the Herald are simply scraped/copy-pasted articles from Associated Press or another online newspaper, essentially making our national newspaper syndicated garbage.

At the bottom of every article you can usually find the “source” of the article. It looks a bit like this :

heraldsyndicationexample

It got me thinking. Because every article has a tag on where it came from, it should be easy to do a quick scrape of the website and tell us just how much of the Herald’s content is actually theirs, and how much is syndicated. I did think about doing a massive crawl all over the website, but it seemed easier just to pick all the front page stories and check them. So I quickly whipped up a tool to do just that. And the results will shock you! (har har)

After running my app I ended up with a set of totals that looked like this. Note that (blank) means there was no syndication marker. I believe these are from the Herald (Possibly online only stories).

Source Count
NZ Herald 62
Associated Press 36
Daily Mail 8
news.com.au 6
Bay Of Plenty Times 4
Northern Advocate 4
(blank) 4
Canvas 2
Daily Telegraph UK 2
Hawkes Bay Today 2
Washington Post 2
Herald On Sunday 1
Christchurch Star 1
The Country 1
Wanganui Chronicle 1

When we actually group “types” of sources together. We end up with something like this.

Source Count
Herald Sources 69
Local Sources 10
Other Sources 54

So we can see that the Herald itself only makes up half of its own content. The rest comes from either local sources or from “Other sources” such as the Daily Mail, the AP feed, or other overseas partners.

What is clear is that the Herald loves using Associated Press. I could be wrong, but the entire latest news section on the Herald is a straight feed from AP with no editorial done on it whatsoever. So all of these stories are unedited, straight syndication.

allassociatedpress

It’s kinda interesting to me because a while back, “autoblogs” used to be this big thing. Where you set up a blog and simply have it publishing 10 different feeds, not even editing the articles along the way. But Google got tired of the same content being in multiple places so started to detect the “original” if you will, and only rank that one. So I’m interested in how Google feels about the fact that all these places are posting the exact same article for clicks/views/whatever.

I took an article and searched for the exact title in Google to see how many places it showed up. As of this post, there are 3500+ exact copies of this article floating around, probably all posted verbatim from the AP feed.

heraldapsyndication

As I guessed, the “original” article on AP is the top result in Google. Because of this, it makes me question what exactly is the point in re-posting the feed as is on the Herald. Although I will never know, I wonder how many people are actually reading these stories there, or whether they are just “bloat” designed to make it look like the Herald is always up to date with the latest news, even if it isn’t theirs.

If you are interested in the actual data, I’ve uploaded the Excel spreadsheet I used to pull the data here. As always, I love graphs/data comparisons in the comments!

UPDATE : Check out part 2 here!

Equity Crowdfunding in New Zealand – How’s it going?

In September 2013, the Financial Markets Conduct Act 2013 was passed, allowing companies to seek crowdfunding to raise capital. That is, use the “kickstarter” method, but give away equity instead of gifts/prizes/rewards etc. Since then, a number of companies have sprung up offering a platform for equity crowdfunding. At first it was a bit of a rush through the door to find which types of companies suit crowdfunding. Understandably given this is New Zealand, beer crowdfunding seemed to be a massive hit, but since then the market has slowed considerably and there have been many “failures” by companies to raise capital. I thought a better way to look at how it’s gone over the past couple of years was to use my extremely poor visual statistics skills and draw some pretty graphs.

The platforms I used to gather these stats were….

Snowball Effect
Pledgeme
Equitise
Crowdcube
Liftoff
My Angel Investment

Ready. Set. Here we go!

Failed/Success

The first thing to work out is given all the crowdfunding floating around, how much of it actually met the target. And furthermore, how much met the “cap”. So even though you may be looking to raise $100k, you may allow more investors to jump on up to $200k until you say OK no more. The actual breakdown ends up something like this :

Failed : 15
Hit Target : 20
Hit Cap : 9

Or in visual terms.

FundingOutcome

That’s surprisingly good. If you think that only 34% of companies looking for crowdfunding fail, that’s a remarkably high hit rate when compared to other funding avenues.
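The 34% figure is just the failed count over the total of all three outcomes. As a quick check in Python, using the three counts listed above (the function name is only for illustration):

```python
outcomes = {"Failed": 15, "Hit Target": 20, "Hit Cap": 9}


def outcome_percentages(counts):
    """Convert raw outcome counts into percentages of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = outcome_percentages(outcomes)
print(round(percentages["Failed"]))  # 34
```

With 44 offers in total, the other two slices work out to roughly 45% hitting their target and 20% hitting their cap.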

Platforms

The next thing to look at is how each platform performs, and which ones seem to be doing the most funding. It’s a hard one to gauge because, just from personal experience, sites like Snowball Effect may raise plenty of capital, but they tend to host more “high end” companies that really could get funding from regular avenues and have chosen crowdfunding instead. Pledgeme tends to have more early stage companies, with a few not-so-great ones thrown in. We can see this when we look at the offer caps of successful companies only (We do this because if someone goes and puts a ridiculous offer cap on a company then it can’t be helped by the platform).

AverageCapValue

In terms of total opportunities available, Pledgeme leads the way with 20 with Snowball Effect close behind.

TotalOpportunities

Or if pie charts are more your thing

TotalOpportunitiesMS

However when we look at the success rate of companies then PledgeMe falls a small way behind.

SuccessPercentage

Crowdcube looks to be number 1, but it’s only had 2 opportunities in its lifetime so far (Both successful). Equitise has had a few more at 6, with 4 of them successful. With those sorts of numbers it’s quite hard to gauge where you should go if you are looking to raise capital because there is so little data out there. But it’s all we have for now.

Cap vs Success

One final thing I wanted to take a look at is what sort of success different size companies are seeing with Crowdfunding. That is, when put into buckets based on their funding cap they are looking to raise, how many are successful and how many fall flat. Is there anything we can take away from companies that are maybe asking for too much?

CapVsSuccess

Chart is a little hard to read, but essentially it lumps companies together based on their cap they put up for funding. 0 – 500k, 500k – 1million etc. And realistically it doesn’t seem to make that much of a difference. We definitely get a good hit rate around that 500k to 1 million mark, but other than that we don’t see large scale fails quite yet.
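The bucketing itself could be sketched like this in Python. The 500k bucket width follows the description above; the (cap, succeeded) input shape and the decision to put a cap of exactly 500k into the 500k to 1 million bucket are assumptions made for the example.

```python
from collections import defaultdict

BUCKET_SIZE = 500_000  # bucket width in dollars


def bucket_label(cap):
    """Label the 500k-wide bucket a funding cap falls into."""
    # Integer division means a cap of exactly 500,000 lands in the upper bucket.
    low = (cap // BUCKET_SIZE) * BUCKET_SIZE
    return f"{low:,} - {low + BUCKET_SIZE:,}"


def success_by_bucket(offers):
    """offers: iterable of (cap, succeeded) pairs -> {bucket: (successes, total)}."""
    tally = defaultdict(lambda: [0, 0])
    for cap, succeeded in offers:
        entry = tally[bucket_label(cap)]
        entry[1] += 1
        if succeeded:
            entry[0] += 1
    return {bucket: tuple(counts) for bucket, counts in tally.items()}
```

Keeping both the success count and the total per bucket matters here, because with this little data a bucket with one offer can look like a 100% or 0% hit rate.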

Bulk Delete Local Git Branches

I usually write about dumb statistics that I’ve found with free data on the web, but today I got so frustrated with a particular feature of Git that I thought I would change it up a bit. If you aren’t a developer, you can probably remove this post from your reading list, otherwise read on!

One of the most popular Git “branching strategies” right now in the programming world is GitFlow: basically a series of very small branches for features, each branch lasting a day or so (Sometimes less). The usual process is that you create a branch to do your work, push it to remote, create a pull request, and then create a new branch off development to start the next feature while you wait for a code review. Once the code review is completed you can merge the remote branch into development on Github all nice and easy. But what happens to your local branch? Most of the time it just sits there until you end up with this :

branches

Essentially hundreds of branches left stranded on your local machine with no way to get rid of them in one nice operation.

I searched around for a tool that would allow me to bulk select branches and then delete them all in one go rather than having to delete each branch one by one. I found command line scripts that would go through each branch one by one, and allow me to type “Y” to delete that branch. But still that seemed cumbersome. With a bit of spare time I created an extremely simple tool with a tree view that allows you to delete a whole handful of branches all in one go. It’s a bit rough around the edges since I got it to do what I wanted to do then stopped, but it works!
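For readers who would rather script it than use a GUI, here is a rough Python sketch of a different approach from the tree-view tool: it asks Git which local branches are already merged into a base branch and deletes those. The base branch name "development" and the protected branch names are assumptions, and it defaults to a dry run that only prints what it would delete.

```python
import subprocess

# Assumed names of branches that should never be deleted.
PROTECTED = {"master", "main", "development"}


def parse_merged_branches(git_output, protected=PROTECTED):
    """Parse `git branch --merged` output into a list of deletable branch names."""
    branches = []
    for line in git_output.splitlines():
        name = line.strip()
        # Skip blanks, the checked-out branch (*) and branches
        # checked out in another worktree (+).
        if not name or name.startswith(("*", "+")):
            continue
        if name not in protected:
            branches.append(name)
    return branches


def delete_merged_branches(base="development", dry_run=True):
    """Delete local branches merged into `base`; only list them when dry_run is True."""
    output = subprocess.run(
        ["git", "branch", "--merged", base],
        capture_output=True, text=True, check=True,
    ).stdout
    branches = parse_merged_branches(output)
    for name in branches:
        if dry_run:
            print(f"would delete {name}")
        else:
            # -d refuses to delete branches with unmerged work.
            subprocess.run(["git", "branch", "-d", name], check=True)
    return branches
```

Calling `delete_merged_branches()` inside a repository lists the candidates, and `delete_merged_branches(dry_run=False)` actually removes them. Note that branches which were squash-merged on Github will not show up as merged, so this will not catch everything.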

GitBranchDeleter

I’ve uploaded the full source to Github here : https://github.com/mindingdata/GitBranchDeleter

As always, Pull requests welcome.

Tracking Reddit Users From Political Subreddits (And Sort Of Failing)

It all started with a Facebook message from a friend. And it all ended with me going “f- this” and calling it a day on the project.

The idea was simple. I go to three political subreddits on Reddit. SandersForPresident, HillaryClinton, and The_Donald, and I pull a mixture of random users from these subs. I then go and look at their comment history to see where else they are commenting. Hopefully, we could see some nice data on what types of people congregate in each.

What actually happened was that the Reddit API was intolerably slow, and had limits of 1 API call every 2 seconds. That severely limited my ability to pull in the data. To grab meaningful data would mean running it for a period of possibly days. This seemed OK at first, but (maybe wrongly of me) I just assumed that at some point during that time Reddit would go down and I would have to start from scratch.
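Staying under a limit like one call every 2 seconds usually comes down to a small throttle around each request. A minimal Python sketch of the idea follows; the class is hypothetical, it is not tied to any particular Reddit client library, and the clock and sleep functions are injectable only so it can be tested without waiting.

```python
import time


class Throttle:
    """Allow at most one call per `interval` seconds."""

    def __init__(self, interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The usage would be to call `throttle.wait()` immediately before every API request. Writing each user's results to disk as they arrive would also help with the fear of having to start from scratch after an outage.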

Instead I ran the app across just 100 randomly picked users from each subreddit. Because of the small range of data I did manage to grab, it’s hard to draw any huge conclusions. So, instead I drew some pretty graphs and called it a day (Note, they aren’t that pretty, it’s all I could do with Excel, but you can download the data at the end of this post if you want to have a go yourself!)

Users Involved In Other Political Subs
So the first set of graphs is showing that given a random commentator in say “SandersForPresident” what are the chances that user also comments in say “The_Donald”. Either because they are just very politically minded, or because they want to cause a bit of mischief, let’s take a look.

UsersInOtherSubreddits

Hopefully the legend is self-explanatory. The first letter is where we found the original user (for example, S = SandersForPresident); the next letter is the other subreddit we checked for comments from that user. What we can see is that both Donald Trump and Hillary commentators also comment heavily in the Bernie Sanders subreddit. Draw from that what you will.

Hillary Supporters Camp Out In /r/PoliticalDiscussion
While much of the data is spread out across hundreds of subreddits rather evenly, one thing that sticks out like a sore thumb is the fact that a large percentage of HillaryClinton commentators also comment in /r/PoliticalDiscussion. See the graph below.

UsersInPoliticalDiscussion

This is even more pronounced because if we look at something like /r/politics it’s much more evenly distributed.

UsersInPolitics

Hillary Supporters Also Have An Anti Sanders Subreddit
Unsurprisingly, the subreddit /r/enoughsandersspam is inhabited exclusively by HillaryClinton supporters (There were zero instances of either Sanders or Trump supporters commenting there). No graph for this one since there isn’t much to show. But the numbers are that out of 100 randomly picked HillaryClinton commentators, 20 had commented on /r/enoughsandersspam.

Really, I could pull data out of this for days around what I think the data tells me. In all honesty I walked into this with a pretty empty mind, I didn’t have any agenda whatsoever. But the more I stare at the data the more I see that HillaryClinton commentators have really weird patterns around what they are commenting on. I think the easiest way to describe this is to leave you with a graph of how each of the political subreddit commentators comment on some fairly innocuous subreddits. These are subreddits that are popular in their own right and are not political in nature (Usually).

DefaultSubredditComparison

Hopefully the graph is big enough to see what I’m talking about. HillaryClinton commentators have MUCH less overall engagement with the rest of reddit. What does that mean? I’m really not too sure. Hypotheses in the comments are more than welcome 🙂

Small note about how I obtained the data for anyone that cares 🙂 I went to each political subreddit, and took the top 25 posts from the past month. Inside these posts, I went in and took 100 commentators, ordered by newest, but they had to have a score more than 1. I should have ended up with a little less than 2500 users (Give or take since we remove duplicates). I then shuffled all of these users and grabbed a random 100. From there, I went and grabbed their comment history ordered by newest. From their comments I grabbed all the subreddits they are commenting on and uniqued them all (So if they commented twice in /r/politics, that was still only one “point” for /r/politics). I then wrote out the resulting data to a CSV file which you can get below.
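The sampling and counting steps described above can be sketched in a few lines of Python. This leaves out the Reddit API calls entirely and assumes the commenter names and each user's comment subreddits have already been fetched; the function names are made up for the example.

```python
import random


def sample_users(commenters, k=100, seed=None):
    """Deduplicate commenter names, shuffle them, and keep k of them."""
    unique = sorted(set(commenters))  # sorted so a fixed seed gives a repeatable shuffle
    rng = random.Random(seed)
    rng.shuffle(unique)
    return unique[:k]


def subreddit_points(comment_subreddits):
    """Each subreddit counts once per user, however many comments they left there."""
    return set(comment_subreddits)


def overlap_percentage(users_to_subreddits, target):
    """Percentage of sampled users with at least one comment in `target`."""
    if not users_to_subreddits:
        return 0.0
    hits = sum(
        1 for subs in users_to_subreddits.values()
        if target in subreddit_points(subs)
    )
    return 100.0 * hits / len(users_to_subreddits)
```

With 100 sampled users per subreddit, `overlap_percentage` conveniently equals the raw count, which is why 20 users out of 100 reads directly as 20%.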

You can download the complete CSV data here! Link Back/Comment below if you use it so I can see what you made!

UFC Performance Of The Night Losers

Ultimate Fighting Championship (UFC) is an MMA organization that has been running since late 1993. Since that time, it’s gone through many changes in rulesets, fighters, and payouts. In the early days, you had better believe that fighters did it for the love of it, rather than to earn some massive pay day. Times have changed and fighters are now getting paid for putting their bodies on the line (Some are at least….). One of the changes over the years is that the UFC introduced “Knockout of the night” and “Submission of the night” bonuses. These were paid to the fighter on the night who had the most impressive KO or submission. Recently it’s been changed to “Performance of the night” to reward a fighter for a stand out performance. Usually these go to fighters who would have won the old “KO of the night” award, but not always, e.g. if there is no KO win on the card at all, or if someone really fought out of their skin and put on a show.

The list of fighters who have won these awards is easy to find on the net. In fact, here is a handy Wikipedia page that lists them all. But it got me thinking, while this shows the fighters who have won the awards, what about those that have been on the receiving end of an absolute shellacking? I set out to find out.

My method was simple. I used the Wikipedia API to pull the UFC Bonus list. From there, I went to each event page and checked who their opponent was, and saved them all into one big file (You can find this file at the end of the post if you wish to do your own stats!). It’s not really perfect for one big reason. Anyone who is on the receiving end of a complete ass kicking several times in a row is likely to be cut from the UFC and go fight elsewhere, but I thought the results would be interesting none the less.
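Once the opponents are saved to a file, the tally itself is a one-liner. A small Python sketch, assuming each row is an (event, bonus winner, opponent) triple; the three sample rows are fights named in this post, not the full data set.

```python
from collections import Counter


def count_bonus_losses(bonus_fights):
    """bonus_fights: iterable of (event, winner, opponent) rows -> Counter of opponents."""
    return Counter(opponent for _event, _winner, opponent in bonus_fights)


fights = [
    ("UFC 161", "Shawn Jordan", "Pat Barry"),
    ("UFC on Fox 3", "Lavar Johnson", "Pat Barry"),
    ("UFC 85", "Thiago Alves", "Matt Hughes"),
]
print(count_bonus_losses(fights).most_common(1))  # [('Pat Barry', 2)]
```

`most_common()` with no argument returns every fighter sorted by count, which makes ties at the top easy to spot.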

Here are our winners (Or losers depending on which way you look at it), and it’s actually a 4 way tie!

Pat Barry – 5
UFC 161 against Shawn Jordan
UFC on Fox 3 against Lavar Johnson
UFC on Versus 6 against Stefan Struve
UFC on Versus 4 against Cheick Kongo
UFC 115 against Mirko Cro Cop

It’s probably not that surprising that Pat Barry tops our list if you are an MMA fan. Barry never played it safe and went in there to finish fights (at times at an extreme size difference…). He’s also on the receiving end of one of the most ridiculous comebacks in the history of MMA as seen below.

Matt Hughes – 5
UFC 65 against Georges St Pierre
UFC 79 against Georges St Pierre
UFC 85 against Thiago Alves
UFC 123 against BJ Penn
UFC 135 against Josh Koscheck

I feel a little bad for Matt Hughes being in this list. The two losses against GSP are likely there because they were huge moments for Georges in his career (Winning the belt), rather than devastating wins. The losses against BJ Penn and Josh Koscheck were when Matt was really over the hill too.

Melvin Guillard – 5
UFC Fight Night 9 against Joe Stevenson
UFC Fight Night 19 against Nate Diaz
UFC on FX 1 against Jim Miller
UFC 136 against Joe Lauzon
UFC 150 against Donald Cerrone

Poor Melvin. Guillard has a bad habit of letting himself get choked out (4 of the 5 are submission losses). Guillard has also been on the winning side of Performance of the Night bonuses 3 times, so it’s not all bad.

Sam Stout – 5
TUF 3 Finale against Kenny Florian
TUF Nations Finale against KJ Noons
UFC 161 against James Krause
UFC 185 against Ross Pearson
UFC Fight Night 74 against Frankie Perez

Sam Stout unfortunately makes the list right at the tail end of his career. His last 3 fights were all brutal losses (And if you extend it, 4 out of his last 5 fights are the ones above).

So that’s it! If you want to have a play around with the list yourself, I’ve uploaded the CSV file here and you can pull your own stats.

If you’re interested in a bit of the technical details, I’ve uploaded a Github C# Gist with the code as I wrote it. I had one eye on my other screen trying to finish off Narcos (I can’t believe it’s taken me that long to watch this show…), so it’s not that clean. You will need to nuget the packages “Linq2Wiki” and “HtmlAgilityPack” to really get it working, but it’s more just there if you want to be a bit nosey, not if you want to run it. It got a little messy towards the end as the HTML on Wikipedia sometimes gets a bit hectic, and rather than going for an elegant solution this time around, I just wanted to finish the damn thing 🙂 I should also note that the program doesn’t manage to pull 100% of the data; I had to clean it up manually at the end, mostly because names don’t always get spelled the same on Wikipedia (e.g. Georges St-Pierre or Georges St.Pierre). But it will get you 99% of the way there!
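Some of that manual cleanup could probably be automated by normalizing names before counting. The following Python sketch is not part of the Gist; it is one possible approach, and it only handles punctuation, accents, spacing and case.

```python
import re
import unicodedata


def normalize_name(name):
    """Collapse spelling variants such as 'Georges St-Pierre' and 'Georges St.Pierre'."""
    # Strip accents so accented and unaccented spellings compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Treat hyphens and full stops as spaces, then squeeze repeated whitespace.
    cleaned = re.sub(r"[-.]", " ", ascii_name)
    return re.sub(r"\s+", " ", cleaned).strip().lower()
```

It will not fix genuine misspellings, and initials are still a problem ("B.J. Penn" and "BJ Penn" normalize differently), so a final manual pass would still be needed.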

Visualizing Auckland Public Transport

For some time now, I’ve been looking to do something with the transport data from Auckland Transport. They provide all bus routes and times via a set of CSV’s that are available for download on their website. It’s just been about finding time to really sit down and make something meaningful.

smallsimulation

I decided to go with a visualization of Auckland traffic. That is, creating live maps that show in “realtime” how buses are moving across Auckland. Above is a sample of what I created using a live map in my browser, and moving pins around. I decided to try and be accurate with the speed and timing of the buses. I didn’t get it 100% correct, but I got pretty close to it. Because it runs in the browser, it’s really hard to see all of Auckland at once. Too much animation essentially kills the browser in its tracks, but I’ve got a few working pretty swish!

Below is a set of live visualizations of how AT transport moves. If you’re just interested in the cool images, check them out! Further down I’ve written a bit more about how I built them. Caution, most of these are LARGE. I do not recommend opening them on mobile, especially not on mobile data. I should also note that on some browsers/computers, it does start lagging when there are a lot of buses on the move.

Dominion Road/Mt Eden Road/Sandringham Road

Waiheke Island

St Heliers

Now onto the more geeky stuff.

After downloading the data from Maxx, I had to join up all the files. The general gist was you have Routes, that do many trips (So bus number 335 may do 5 trips a day). On those trips, they will stop at a set amount of bus stops at the same time each day.

I decided early on I wanted to do something with a live map. The easiest way I found was using Leaflet with Animated Pins. This got me started, but there were a few things I had to do a bit of fiddling to get right.

I had to output a JSON object that could be read into JavaScript. Not too hard, from C# I could use JSON.net to serialize out a list of trips and their stops etc. But doing every bus trip in a day produced a gigantic file that no browser would be able to load. In the end I had to specify latitude and longitude boundaries for the stops I wanted to include. Because of this, the visualizations above are centered around a particular area. It would theoretically be possible to output all of Auckland, but your browser wouldn’t be able to handle it.
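The bounding-box filter is easy to picture in code. Here is a minimal Python sketch (the repository itself is C#); the dictionary shape of trips and stops, and the choice to trim each trip down to the stops inside the box rather than keep whole trips, are assumptions made for the example.

```python
import json


def in_bounds(stop, south, west, north, east):
    """True if a stop's latitude/longitude falls inside the bounding box."""
    return south <= stop["lat"] <= north and west <= stop["lon"] <= east


def trips_in_area(trips, south, west, north, east):
    """Keep only trips with at least one stop in the box, trimmed to those stops."""
    kept = []
    for trip in trips:
        stops = [s for s in trip["stops"] if in_bounds(s, south, west, north, east)]
        if stops:
            kept.append({**trip, "stops": stops})
    return kept


def export_area(trips, south, west, north, east):
    """Serialize the filtered trips to a JSON string for the browser to load."""
    return json.dumps(trips_in_area(trips, south, west, north, east))
```

The tighter the box, the smaller the JSON file, which is the whole point: the browser only ever has to animate the trips that pass through the area on screen.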

The speed of the animated pins was a big problem. I had to take each stop along the way and judge the distance between it and the stop before it using their latitude/longitude values. From there I could get the total distance traveled. I could then take the first and last stop, and work out how long the trip took overall. This gave me an average speed to use. Ideally, I could have worked out the distance and time between each bus stop, but the animated pins were a real pain to get working when you are trying to give individual speeds for each point on the journey.
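The post does not say which distance formula was used, so the Python sketch below assumes the haversine great-circle formula, a common choice for latitude/longitude pairs. It sums the stop-to-stop distances and divides by the time between the first and last stop, which gives one average speed per trip.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_mps(stops):
    """stops: list of (lat, lon, seconds) tuples in travel order -> metres per second."""
    distance = sum(
        haversine_m(a[0], a[1], b[0], b[1]) for a, b in zip(stops, stops[1:])
    )
    elapsed = stops[-1][2] - stops[0][2]
    return distance / elapsed if elapsed > 0 else 0.0
```

Straight-line distances between stops underestimate the real road distance, so the average speed comes out a little low, which may be part of why the animation is close but not 100% correct.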

Another issue was start times, although it ended up being relatively easy to fix. With each trip, I output the start time as a total number of seconds corresponding to that time of day. I then have a timeout that fires once a second on the webpage. Each time that timeout fires, I check whether any buses that haven’t started yet should be starting (i.e. their start time is less than the current time in the simulation). If so, I place the pin and start moving it.
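The real page does this in JavaScript with a timer, but the logic is small enough to show as a Python sketch. It assumes start times arrive as "HH:MM:SS" strings and are converted to seconds since midnight, and that each trip is a dictionary with a "start" field; both are assumptions for illustration.

```python
def to_seconds(hhmmss):
    """Convert an 'HH:MM:SS' time of day into seconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def start_due_trips(pending, now_seconds):
    """Split pending trips into those due to start now and those still waiting."""
    due = [t for t in pending if t["start"] <= now_seconds]
    waiting = [t for t in pending if t["start"] > now_seconds]
    return due, waiting
```

On every tick of the simulation clock, the trips in `due` get a pin placed on the map and the `waiting` list becomes the new pending list, so each trip is only ever started once.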

Because I was hurriedly coding everything, it is still a bit of a mess and still kind of tailored to do what I wanted to do, but I’ve put the code up on Github for anyone that wants a play around. Repository is here : https://github.com/mindingdata/ATTransport. Again, it’s basically just a proof of concept so it’s not amazingly well coded, but feel free to take a look and build your own visualization.

Let me know what you think in the comments below!

Analyzing New Zealand Politician’s Tweets

My last post took a look at USA presidential nominees’ tweets, and threw them into a word cloud to see if they were staying on message. More so, the whole post started with the assumption that American presidential candidates can spin any sort of question/answer into something about their own policy. The results were mixed. Democratic nominees tended to keep on message, while their Republican counterparts would rather slag off the Democrats (e.g. the number one thing on every Republican’s mind seemed to be Hillary Clinton).

It got me thinking. How do New Zealand politicians fare? I think here in Godzone we tend to think that politics is a lot cleaner (although… dirty politics anyone?), and so it’s doubtful that any minister would be sitting on Twitter constantly sending out attacks against the opposition. It’s not really election time (I think I’ll redo this in the run-up to the election), so I don’t expect people to be heavy on policy, but let’s see if that holds true.

Same as last time. I took the last 200 tweets of the leaders of the various parties in New Zealand (Not including retweets or replies), and then removed common words (Such as “The” or “A”), and put them into a word cloud to give you a visual representation of their tweets. Here’s what we got.
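
As a rough idea of the filtering step, here is a minimal Python sketch. It isn't the app I actually ran, and it doesn't call any real Twitter API: the `is_retweet` and `reply_to` fields are made-up names for illustration, and I'm assuming the tweets arrive newest first and that the 200 is counted after retweets and replies are thrown away.

```python
def original_tweets(tweets, limit=200):
    """Return the newest `limit` tweets that are neither retweets nor replies.

    `tweets` is a hypothetical list of dicts, newest first, each with a
    "text" key plus optional "is_retweet" and "reply_to" keys.
    """
    kept = [
        tweet
        for tweet in tweets
        if not tweet.get("is_retweet") and tweet.get("reply_to") is None
    ]
    return kept[:limit]


# Made-up sample data, purely for illustration.
sample = [
    {"text": "Great day in Christchurch"},
    {"text": "RT someone else's tweet", "is_retweet": True},
    {"text": "@someone thanks!", "reply_to": "someone"},
    {"text": "Talking trade today"},
]
texts = [tweet["text"] for tweet in original_tweets(sample)]
```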

John Key (National Party Leader/Prime Minister of New Zealand)
https://twitter.com/johnkeypm

johnkeytwittercloud

Key has some key policy areas within his tweets. TPPA/trade is talked about a lot, as is Christchurch. Tourism (a portfolio he currently holds) also gets quite a few mentions. But the thing that interested me the most was the fact that he spells Vietnam as two words, “Viet Nam”. Not sure if that’s correct or not.

Andrew Little (Labour Party Leader)
https://twitter.com/AndrewLittleMP

andrewlittletwittercloud

There isn’t really any policy in here. Lots of talk about rugby however. Typical of where Labour is at now, there is lots of talk about the “future” and “vision”. To me, the big question is where is the talk about TPP? It’s what’s on everyone’s mind right now and Little is avoiding it like the plague on his Twitter.

James Shaw (Green Party Co-Leader)
https://twitter.com/jamespeshaw

jamesshawtwittercloud

If you didn’t know what James Shaw stood for before, you do now. It’s the most common word on his Twitter, “Climate”. The massive “Paris” word there may not make sense at first, but it’s in reference to the Paris Agreement (a UN agreement on climate change). Shaw is definitely on message on his Twitter.

Metiria Turei (Green Party Co-Leader)
https://twitter.com/metiria

tureitwittercloud

Again, similar to James Shaw, lots of talk about Climate Change which is what the Greens are all about. Poverty and families make plenty of appearances which is something that Turei has really been campaigning on for some time. Overall, pretty good at staying on message.

Winston Peters (NZ First Party Leader)
https://twitter.com/winstonpeters

winstonpeterstwittercloud

Winston takes his representation of Northland seriously. It’s almost all he talks about. He’s also tweeting about the flag debate, and the TPPA. Plenty of talk/attacks/tweets against National, which is what we probably have come to expect from Peters.

David Seymour (Act Party Leader)
https://twitter.com/dbseymour

davidseymourtwittercloud

David is the MP for Epsom, so it’s good he is talking about it a lot. Other than that, we have talk about tax, dying (assisted dying) and the TPP. I like the fact that “choice” is also very prominent on David’s Twitter. Even though I may not agree with many of Act’s policies (read: any of them), they always campaign on libertarian values of “choice”.

So let’s wrap up.

It’s interesting because I’m not sure what to make of these results so far. Between them, Turei and Peters talk about National an awful lot. Reading their Twitter streams, it’s definitely not as vicious as American politics, but all the same, they are still spending their time tweeting out something against National, rather than something of their own. But, then again, that is the role of the opposition: to hold the government of the time to account.

I think I was most surprised by Andrew Little’s Twitter. Very little policy going up on there. He could be going for that “everyday bloke” type vibe where he isn’t pushing policy, he’s pushing himself as your mate to have a beer with. I can’t blame him. Labour have arguably had more “policy” or “promises” (right or wrong) than National in the previous elections, but have still lost.

I have a feeling it’s not a great comparison to the American presidential candidates’ tweets, because over there, it’s the run-up to the election. Here, it’s all opening schools and photo ops for a while. In the run-up to the next general election, I’ll redo this post and see how things change.


Tweets of Presidential Nominees

Sitting afar in New Zealand, I only get news of the USA presidential elections in dribs and drabs. I see highlights of the various debates, and I get news articles (mostly pro-Bernie Sanders) blasted in my face on Reddit. The thing that most intrigued me watching clips of the debates was how well every candidate can spin essentially any question into a WWE style promo for their policies. Sanders can spin almost any question into a rant about the working class, Trump can take any problem in the world and blame it on an ethnicity other than White American, and Clinton can take a question on her email scandal and somehow spin it to pander to women voters.

It got me thinking about the candidates’ Twitter accounts. Do they stay just as well on message? Is Sanders all about health care? Is Hillary Clinton all about trying to get the women’s vote? Is Trump tweeting non stop about immigration? I set out to find out.

I wrote a quick app to download each candidate’s last 200 tweets, remove stop words (words like “and” or “the”) and create a word cloud for each candidate. Here are the results…
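
The counting behind a word cloud is simple enough to sketch in a few lines of Python. This is an illustration under my own assumptions rather than the app itself: the stop word list here is a tiny made-up sample, links are stripped before counting, and the function name is hypothetical. A word cloud then just draws each word at a size proportional to its count.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}


def word_counts(tweet_texts, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across a list of tweet texts."""
    counts = Counter()
    for text in tweet_texts:
        # Lower-case and drop links so "Health" and "health" count together.
        cleaned = re.sub(r"https?://\S+", " ", text.lower())
        # Runs of letters only, which also keeps accented Spanish words intact.
        words = re.findall(r"[^\W\d_]+", cleaned)
        counts.update(w for w in words if len(w) > 1 and w not in stop_words)
    return counts


counts = word_counts([
    "The health care plan",
    "Health care for the people and a plan http://t.co/x",
])
```

Calling `counts.most_common(20)` would give the twenty biggest words for the cloud.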

Bernie Sanders (D) :

berniesanderstwittercloud

Yep. You bet your bananas that health care is talked about often. But what also gets me about Bernie’s Twitter (as you will see below) is that he is very straightforward in his offerings compared to other candidates. He talks Jobs, Wages, Social, People, Climate, Social Security, Minimum Wage. It’s all here. You’ll notice that there are Spanish words in Bernie’s cloud, and that’s because he tweets in Spanish quite often.

Hillary Clinton (D) :

hillaryclintontwittercloud

Hillary is also all about that healthcare. She also spends a lot of time talking about Republicans, Trump and Obama. Again, she also sends out tweets in Spanish.

Martin O’Malley (D) :

martinomalleytwittercloud

O’Malley talks a lot about guns, refugees, energy, and apparently leadership. He is a sure-fire chance to drop out of the race after a few primary votes, so I didn’t expect to see too much policy.

Donald Trump (R) :

donaldtrumptwittercloud

Whatever the Don is tweeting about, it ain’t policy, that’s for sure. He spends a lot of time talking about Ted Cruz though. You can also see he talks about Jeb Bush, Rubio and Clinton an awful lot. I thought for sure the number one term would be something to do with immigration, but it seems Trump saves that for the debates.

Ted Cruz (R) :

tedcruztwittercloud

Cruz is all over the place. Mostly he uses Twitter to thank people. In terms of policy, he does talk tax, ISIS and women. But that’s about it.

Marco Rubio (R) :

marcorubiotwittercloud

Rubio won’t stop tweeting about Hillary. In terms of policy, he talks about ISIS and Iran a lot, and football too. Can’t forget about football!

So let’s wrap up. Overall it was an interesting exercise. It seems that the Democrats do talk a lot more policy on Twitter than Republicans. But I would point out that’s not a bad thing necessarily. If the Twitter account is just pumping out policy slogans all day long, it’s not good either.

In the future I might take a look at other things to do with presidential Twitter accounts. For example, what was Clinton tweeting in the 2008 race against Obama? What did Trump tweet out before he entered the race?
