Analyzing NZ Herald’s Sources

For those outside of NZ: this post is about New Zealand's largest national newspaper, The New Zealand Herald. If you don't live here you might not find it that interesting, but it's still a good look at how journalism within NZ is slowly being shut down and replaced with clickbait-style stories and syndicated content.

Over the past month there have been a couple of articles floating around the web, most notably this one by Russell Brown, and another piece by David Farrar. They talk about how the NZ Herald's online edition seems to be filled more and more with "Daily Mail" type news: usually stories with headlines like "…. And what happened next will amaze you!" or "See what made *B Grade Celebrity here* cry". On top of that, I've begun to notice that many stories listed online at the Herald are simply scraped/copy-pasted articles from the Associated Press or another online newspaper, essentially making our national newspaper syndicated garbage.

At the bottom of every article you can usually find the "source" of the article. It looks a bit like this:


It got me thinking. Because every article has a tag on where it came from, it should be easy to do a quick scrape of the website and tell us just how much of the Herald’s content is actually theirs, and how much is syndicated. I did think about doing a massive crawl all over the website, but it seemed easier just to pick all the front page stories and check them. So I quickly whipped up a tool to do just that. And the results will shock you! (har har)
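I haven't published the tool itself, but the core of the idea (pull each front-page article, find its source marker, and tally the results) can be sketched roughly like this in Python. The `class="source"` markup here is purely an assumption for illustration, not the Herald's actual HTML:

```python
import re
from collections import Counter

def extract_source(article_html):
    """Pull the syndication marker out of one article's HTML.
    Assumes a tag roughly like <span class="source">- Daily Mail</span>."""
    match = re.search(r'class="source">\s*-?\s*([^<]+)<', article_html)
    return match.group(1).strip() if match else "(blank)"

def tally_sources(article_pages):
    """Count how many front-page articles come from each source."""
    return Counter(extract_source(page) for page in article_pages)
```

Run something like that over every story linked from the front page and you get a table of totals.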

After running my app I ended up with a set of totals that looked like this. Note that (blank) means there was no syndication marker. I believe these are from the Herald (Possibly online only stories).

Source : Count
NZ Herald : 62
Associated Press : 36
Daily Mail : 8
Bay Of Plenty Times : 4
Northern Advocate : 4
(blank) : 4
Canvas : 2
Daily Telegraph UK : 2
Hawkes Bay Today : 2
Washington Post : 2
Herald On Sunday : 1
Christchurch Star : 1
The Country : 1
Wanganui Chronicle : 1

When we group "types" of sources together, we end up with something like this.

Source : Count
Herald Sources : 69
Local Sources : 10
Other Sources : 54

So we can see that the Herald itself only makes up about half of its own content; the rest comes from either local sources or "other sources" such as the Daily Mail, the AP feed, or other overseas partners.

What is clear is that the Herald loves using the Associated Press. I could be wrong, but the entire latest-news section on the Herald looks like a straight feed from AP with no editorial done on it whatsoever. So all of these stories are un-edited, straight syndication.


It's kinda interesting to me because a while back, "autoblogs" used to be a big thing: you set up a blog and simply have it republish 10 different feeds, not even editing the articles along the way. But Google got tired of the same content being in multiple places, so it started to detect the "original", if you will, and only rank that one. So I'm interested in how Google feels about the fact that all these places are posting the exact same article for clicks/views/whatever.

I took an article and searched for the exact title in Google to see how many places it showed up. As of this post, there are 3,500+ exact copies of this article floating around, probably all posted verbatim from the AP feed.


As I guessed, the “original” article on AP is the top result in Google. Because of this, it makes me question what exactly is the point in re-posting the feed as is on the Herald. Although I will never know, I wonder how many people are actually reading these stories there, or whether they are just “bloat” designed to make it look like the Herald is always up to date with the latest news, even if it isn’t theirs.

If you are interested in the actual data, I've uploaded the Excel spreadsheet I used to pull the data here. As always, I love graphs/data comparisons in the comments!


Equity Crowdfunding in New Zealand – How’s it going?

In September 2013, the Financial Markets Conduct Act 2013 was passed, allowing companies to seek crowdfunding to raise capital. That is, use the "Kickstarter" method, but give away equity instead of gifts/prizes/rewards etc. Since then, X companies have sprung up offering a platform for equity crowdfunding. At first it was a bit of a rush through the door to find which types of companies suit crowdfunding. Understandably, given this is New Zealand, beer crowdfunding seemed to be a massive hit, but since then the market has slowed considerably and there have been many "failures" by companies to raise capital. I thought a better way to look at how it's gone over the past couple of years was to use my extremely poor visual statistics skills and draw some pretty graphs.

The platforms I used to gather these stats were:

Snowball Effect
My Angel Investment

Ready. Set. Here we go!


The first thing to work out is, given all the crowdfunding floating around, how much of it actually met the target, and furthermore, how much met the "cap". Even though you may be looking to raise $100k, you may allow more investors to jump on up to $200k until you say OK, no more. The actual breakdown ends up something like this:

Failed : 15
Hit Target : 20
Hit Cap : 9

Or in visual terms.


That's surprisingly good. If only 34% of companies looking for crowdfunding fail, that's a remarkably high hit rate when compared to other funding avenues.
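That 34% figure comes straight from the three buckets above; a quick sanity check:

```python
results = {"Failed": 15, "Hit Target": 20, "Hit Cap": 9}

total = sum(results.values())               # 44 offers overall
fail_rate = results["Failed"] / total       # 15 / 44
print(f"{fail_rate:.0%} of offers failed")  # prints "34% of offers failed"
```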


The next thing to look at is how each platform performs, and which ones seem to be doing the most funding. It's a hard one to gauge because, just from personal experience, sites like Snowball Effect, while they may raise plenty of capital, tend to host more "high end" companies that really could get funding through regular avenues but have chosen crowdfunding. PledgeMe tends to have more early-stage companies, with a few not-so-great ones thrown in. We can see this when we look at the offer caps of successful companies only (we do this because if someone puts a ridiculous offer cap on a company, that shouldn't be held against the platform).


In terms of total opportunities available, Pledgeme leads the way with 20 with Snowball Effect close behind.


Or if pie charts are more your thing:


However when we look at the success rate of companies then PledgeMe falls a small way behind.


Crowdcube looks to be number 1, but it's only had 2 opportunities in its lifetime so far (both successful). Equitise has had a few more at 6, with 4 of them successful. With those sorts of numbers it's quite hard to gauge where you should go if you are looking to raise capital, because there is so little data out there. But it's all we have for now.

Cap vs Success

One final thing I wanted to take a look at is what sort of success different-sized companies are seeing with crowdfunding. That is, when companies are put into buckets based on the funding cap they are looking to raise, how many are successful and how many fall flat? Is there anything we can take away from companies that are maybe asking for too much?


The chart is a little hard to read, but essentially it lumps companies together based on the cap they put up for funding: 0 – 500k, 500k – 1 million, etc. Realistically, it doesn't seem to make that much of a difference. We definitely get a good hit rate around that 500k to 1 million mark, but other than that we don't see large-scale failures quite yet.
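For anyone wanting to reproduce the bucketing, the grouping logic is simple enough. A rough Python sketch (the $500k bucket width matches the chart; the offer data below is illustrative):

```python
def bucket(cap):
    """Assign an offer to a $500k-wide bucket based on its funding cap."""
    low = int(cap // 500_000) * 500_000
    return (low, low + 500_000)

def success_by_bucket(offers):
    """offers: list of (cap, succeeded) pairs.
    Returns {bucket: (successes, total)} so a hit rate can be read off."""
    stats = {}
    for cap, ok in offers:
        b = bucket(cap)
        successes, total = stats.get(b, (0, 0))
        stats[b] = (successes + int(ok), total + 1)
    return stats
```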


Bulk Delete Local Git Branches

I usually write about dumb statistics that I've found with free data on the web, but today I got incredibly frustrated with a particular feature of Git, so I thought I would change it up a bit. If you aren't a developer, you can probably remove this post from your reading list; otherwise, read on!

One of the most popular Git "branching strategies" right now in the programming world is GitFlow: basically a series of very small branches for features, each branch lasting a day or so (sometimes less). The usual process is that you create a branch for your work, push it to remote, create a pull request, and then create a new branch off development to start the next feature while you wait for a code review. Once the code review is completed, you can merge the remote branch into development on Github all nice and easy. But what happens to your local branch? Most of the time it just sits there until you end up with this:


Essentially hundreds of branches left stranded on your local machine, with no way to get rid of them in one nice operation.

I searched around for a tool that would let me bulk-select branches and delete them all in one go, rather than having to delete each branch one by one. I found command line scripts that would go through each branch in turn and let me type "Y" to delete it, but that still seemed cumbersome. With a bit of spare time I created an extremely simple tool with a tree view that allows you to delete a whole handful of branches in one go. It's a bit rough around the edges since I got it to do what I wanted and then stopped, but it works!
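My tool is C# with a tree view, but the underlying idea is just a couple of git commands: list the local branches that are already merged, filter out anything you want to keep, and delete the rest. A rough sketch in Python (the protected branch names are examples only):

```python
import subprocess

def deletable_branches(branch_output, keep=("master", "development")):
    """Parse the output of `git branch --merged` into branch names that are
    safe to delete. Skips the current branch (marked with *) and any
    protected names."""
    names = []
    for line in branch_output.splitlines():
        name = line.strip()
        if not name or name.startswith("*") or name in keep:
            continue
        names.append(name)
    return names

def bulk_delete(keep=("master", "development")):
    """Delete every local branch already merged into the current branch."""
    out = subprocess.run(["git", "branch", "--merged"],
                         capture_output=True, text=True, check=True).stdout
    for name in deletable_branches(out, keep):
        subprocess.run(["git", "branch", "-d", name], check=True)
```

The parsing step is split out so it's easy to test, and `git branch -d` (lowercase d) refuses to delete unmerged branches, which is a nice safety net.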


I’ve uploaded the full source to Github here :

As always, Pull requests welcome.


Tracking Reddit Users From Political Subreddits (And Sort Of Failing)

It all started with a Facebook message from a friend. And it all ended with me going "f- this" and calling it a day on the project.

The idea was simple. I go to three political subreddits on Reddit. SandersForPresident, HillaryClinton, and The_Donald, and I pull a mixture of random users from these subs. I then go and look at their comment history to see where else they are commenting. Hopefully, we could see some nice data on what types of people congregate in each.

What actually happened was that the Reddit API was intolerably slow, and had a limit of 1 API call every 2 seconds. That severely limited my ability to pull in the data; grabbing anything meaningful would have meant running the app for days. This seemed OK at first, but (maybe wrongly of me) I just assumed that at some point during that time Reddit would go down and I would have to start from scratch.
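If you do want to stay under a cap like that, the usual trick is a tiny throttle that remembers when the last call went out and sleeps for the remainder of the interval. A minimal sketch (the 2-second figure is just the limit mentioned above):

```python
import time

class Throttle:
    """Enforce a minimum gap between successive API calls."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so behaviour can be tested
        self.sleep = sleep    # without actually waiting
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

Call `throttle.wait()` before each request and you never fire more than one call per interval.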

Instead I ran the app across just 100 randomly picked users from each subreddit. Because of the small range of data I did manage to grab, it’s hard to draw any huge conclusions. So, instead I drew some pretty graphs and called it a day (Note, they aren’t that pretty, it’s all I could do with Excel, but you can download the data at the end of this post if you want to have a go yourself!)

Users Involved In Other Political Subs
So the first set of graphs shows, given a random commentator in, say, "SandersForPresident", what the chances are that the user also comments in, say, "The_Donald". Either because they are just very politically minded, or because they want to cause a bit of mischief. Let's take a look.


Hopefully the legend is self-explanatory. The first letter is where we found the original user (S = SandersForPresident, for example), and the next letter is the subreddit we checked for that user's comments. What we can see is that both Donald Trump and Hillary commentators also comment heavily in the Bernie Sanders subreddit. Draw from that what you will.

Hillary Supporters Camp Out /r/PoliticalDiscussion
While much of the data is spread out across hundreds of subreddits rather evenly, one thing that sticks out like a sore thumb is the fact that a large percentage of HillaryClinton commentators also comment in /r/PoliticalDiscussion. See the graph below.


This is even more pronounced because if we look at something like /r/politics it’s much more evenly distributed.


Hillary Supporters Also Have An Anti Sanders Subreddit
Unsurprisingly, the subreddit /r/enoughsandersspam is inhabited exclusively by HillaryClinton supporters (there were zero instances of either Sanders or Trump supporters commenting there). No graph for this one since there isn't much to show. But the numbers are that out of 100 randomly picked HillaryClinton commentators, 20 had commented on /r/enoughsandersspam.

Really, I could pull things out of this data for days about what I think it tells me. In all honesty, I walked into this with a pretty empty mind; I didn't have any agenda whatsoever. But the more I stare at the data, the more I see that HillaryClinton commentators have really weird patterns around what they are commenting on. I think the easiest way to describe this is to leave you with a graph of how each of the political subreddit commentators comment on some fairly innocuous subreddits. These are subreddits that are popular in their own right and are not (usually) political in nature.


Hopefully the graph is big enough to see what I'm talking about. HillaryClinton commentators have MUCH less overall engagement with the rest of reddit. What does that mean? I'm really not too sure. Hypotheses in the comments are more than welcome 🙂

Small note about how I obtained the data for anyone that cares 🙂 I went to each political subreddit, and took the top 25 posts from the past month. Inside these posts, I went in and took 100 commentators, ordered by newest, but they had to have a score more than 1. I should have ended up with a little less than 2500 users (Give or take since we remove duplicates). I then shuffled all of these users and grabbed a random 100. From there, I went and grabbed their comment history ordered by newest. From their comments I grabbed all the subreddits they are commenting on and uniqued them all (So if they commented twice in /r/politics, that was still only one “point” for /r/politics). I then wrote out the resulting data to a CSV file which you can get below.
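In code terms, the sampling and the one-point-per-subreddit rule from that paragraph look something like this (a simplified Python sketch, not the original app):

```python
import random
from collections import Counter

def sample_users(users, n=100, seed=None):
    """Dedupe the collected usernames, shuffle, and take a random n."""
    pool = list(dict.fromkeys(users))  # remove duplicates, keep order
    random.Random(seed).shuffle(pool)
    return pool[:n]

def subreddit_points(comment_histories):
    """comment_histories: {user: [subreddit, ...]} with possible repeats.
    Each subreddit earns at most one 'point' per user, so commenting twice
    in /r/politics still only counts once."""
    points = Counter()
    for user, subs in comment_histories.items():
        for sub in set(subs):
            points[sub] += 1
    return points
```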

You can download the complete CSV data here! Link Back/Comment below if you use it so I can see what you made!


UFC Performance Of The Night Losers

Ultimate Fighting Championship (UFC) is an MMA organization that has been running since late 1993. Since that time, it's gone through many changes in rulesets, fighters, and payouts. In the early days, you had better believe fighters did it for the love of it rather than for some massive payday. Times have changed and fighters are now getting paid for putting their bodies on the line (some are, at least…). One of the changes over the years is that the UFC introduced "Knockout of the Night" and "Submission of the Night" bonuses, paid to the fighter on the night who had the most impressive KO or submission. Recently these have been changed to "Performance of the Night", rewarding a fighter for a standout performance. Usually these go to fighters who would have won the old "KO of the Night" award, but not always, e.g. if there is no KO win on the card at all, or if someone really fought out of their skin and put on a show.

The list of fighters who have won these awards is easy to find on the net. In fact, here is a handy Wikipedia page that lists them all. But it got me thinking: while this shows the fighters who have won the awards, what about those who have been on the receiving end of an absolute shellacking? I set out to find out.

My method was simple. I used the Wikipedia API to pull the UFC bonus list. From there, I went to each event page and checked who the opponent was, and saved them all into one big file (you can find this file at the end of the post if you wish to do your own stats!). It's not really perfect, for one big reason: anyone who is on the receiving end of a complete ass kicking several times in a row is likely to be cut from the UFC and go fight elsewhere. But I thought the results would be interesting nonetheless.
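Once the (event, winner, opponent) rows are scraped, finding the biggest "losers" is just a frequency count over the opponent column. A sketch of that last step, with made-up example rows:

```python
from collections import Counter

def loss_counts(bonus_records):
    """bonus_records: list of (event, bonus_winner, opponent) tuples scraped
    from the event pages. Returns opponents ranked by how often they were on
    the wrong end of a bonus-winning performance."""
    return Counter(opponent for _, _, opponent in bonus_records).most_common()
```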

Here are our winners (or losers, depending on which way you look at it), and it's actually a 4-way tie!

Pat Barry – 5
UFC 161 against Shawn Jordan
UFC on Fox 3 against Lavar Johnson
UFC on Versus 6 against Stefan Struve
UFC on Versus 4 against Cheick Kongo
UFC 115 against Mirko Cro Cop

It's probably not that surprising that Pat Barry tops our list if you are an MMA fan. Barry never played it safe and went in there to finish fights (at times with an extreme size disadvantage…). He's also on the receiving end of one of the most ridiculous comebacks in the history of MMA, as seen below.

Matt Hughes – 5
UFC 65 against Georges St Pierre
UFC 79 against Georges St Pierre
UFC 85 against Thiago Alves
UFC 123 against BJ Penn
UFC 135 against Josh Koscheck

I feel a little bad for Matt Hughes being in this list. The two losses against GSP are likely there because they were huge moments for Georges in his career (Winning the belt), rather than devastating wins. The losses against BJ Penn and Josh Koscheck were when Matt was really over the hill too.

Melvin Guillard – 5
UFC Fight Night 9 against Joe Stevenson
UFC Fight Night 19 against Nate Diaz
UFC on FX 1 against Jim Miller
UFC 136 against Joe Lauzon
UFC 150 against Donald Cerrone

Poor Melvin. Guillard has a bad habit of letting himself get choked out (4 of the 5 are submission losses). Guillard has also been on the winning side of Performance of the Night bonuses 3 times, so it's not all bad.

Sam Stout – 5
TUF 3 Finale against Kenny Florian
TUF Nations Finale against KJ Noons
UFC 161 against James Krause
UFC 185 against Ross Pearson
UFC Fight Night 74 against Frankie Perez

Sam Stout unfortunately makes the list right at the tail end of his career. His last 3 fights were all brutal losses (And if you extend it, 4 out of his last 5 fights are the ones above).

So that’s it! If you want to have a play around with the list yourself, I’ve uploaded the CSV file here and you can pull your own stats.

If you're interested in a bit of the technical detail, I've uploaded a GitHub C# Gist with the code as I wrote it. I had one eye on my other screen trying to finish off Narcos (I can't believe it's taken me this long to watch this show…), so it's not that clean. You will need to NuGet the packages "Linq2Wiki" and "HtmlAgilityPack" to really get it working, but it's more there if you want to be a bit nosey, not if you want to run it. It got a little messy towards the end, as the HTML on Wikipedia sometimes gets a bit hectic, and rather than going for an elegant solution this time around, I just wanted to finish the damn thing 🙂 I should also note that the program doesn't manage to pull 100% of the data; I had to clean it up manually at the end, mostly because of names that aren't always spelled the same on Wikipedia (e.g. Georges St-Pierre or Georges St.Pierre). But it will get you 99% of the way there!


Visualizing Auckland Public Transport

For some time now, I've been looking to do something with the transport data from Auckland Transport. They provide all bus routes and times via a set of CSVs that are available for download on their website. It's just been about finding the time to really sit down and make something meaningful.


I decided to go with a visualization of Auckland traffic; that is, creating live maps that show in "realtime" how buses are moving across Auckland. Above is a sample of what I created using a live map in my browser and moving pins around. I tried to be accurate with the speed and time of when buses were moving. I didn't get it 100% correct, but I got pretty close. Because it runs in the browser, it's really hard to see all of Auckland at once; too much animation essentially kills the browser in its tracks, but I've got a few working pretty swish!

Below is a set of live visualizations of how AT transport moves. If you're just interested in the cool images, check them out! Below that I've written a bit more about how I built them. Caution: most of these are LARGE. I do not recommend opening them on mobile, especially not on mobile data. I should also note that on some browsers/computers, it does start lagging when there are a lot of buses on the move.

Dominion Road/Mt Eden Road/Sandringham Road

Waiheke Island

St Heliers

Now onto the more geeky stuff.

After downloading the data from Maxx, I had to join up all the files. The general gist is that you have routes, which do many trips (so bus number 335 may do 5 trips a day). On those trips, they stop at a set number of bus stops at the same time each day.

I decided early on I wanted to do something with a live map. The easiest way I found was using Leaflet with animated pins. This got me started, but there were a few things I had to fiddle with to get right.

I had to output a JSON object that could be read into javascript. Not too hard: from C# I could serialize out a list of trips and their stops etc. But doing every bus trip in a day outputted a gigantic file that no browser would be able to load. In the end I had to specify latitude and longitude boundaries for which stops I wanted to include. Because of this, the visualizations above are centered around a particular area. It would theoretically be possible to output all of Auckland, but your browser couldn't handle it.
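The bounding-box filter is the simplest part: keep a stop only if its latitude and longitude both fall inside the chosen ranges. Sketched in Python (the coordinates in the usage below are illustrative):

```python
def in_bounds(stop, lat_range, lon_range):
    """stop: (lat, lon) tuple. True if the stop sits inside the box."""
    lat, lon = stop
    return (lat_range[0] <= lat <= lat_range[1]
            and lon_range[0] <= lon <= lon_range[1])

def filter_trip(stops, lat_range, lon_range):
    """Keep only the stops of a trip that fall inside the bounding box."""
    return [s for s in stops if in_bounds(s, lat_range, lon_range)]
```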

The speed of the animated pins was a big problem. I had to take each stop along the way and judge the distance between it and the stop before it using their latitude/longitude values. From there I could get the total distance traveled. I could then take the first and last stop, and work out how long the trip took overall. This gave me an average speed to use. Ideally, I would have worked out the distance and time between each pair of bus stops, but the animated pins were a real pain to get working when you try to give individual speeds for each point on the journey.
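For the distance step, the standard tool is the haversine formula, which gives the great-circle distance between two lat/long points. A sketch of the distance and average-speed calculation (this mirrors the approach described above, not the original C#):

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km = Earth radius

def average_speed_kmh(stops, start_s, end_s):
    """stops: ordered (lat, lon) points along the trip.
    start_s/end_s: first and last stop times in seconds of the day."""
    dist = sum(haversine_km(stops[i], stops[i + 1])
               for i in range(len(stops) - 1))
    return dist / ((end_s - start_s) / 3600)
```

One degree of longitude at the equator comes out at roughly 111 km, which is a handy sanity check.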

Another issue was start times, although that ended up being relatively easy to fix. With each trip, I output the total number of seconds that corresponds to that time of day. I then have a timeout that fires once a second on the webpage. Each time it fires, I check whether any buses that haven't started yet should be starting (i.e. their start time is less than the current time in the simulation); if so, I place the pin and start moving it.

Because I was hurriedly coding everything, it is still a bit of a mess and still kind of tailored to do what I wanted to do, but I’ve put the code up on Github for anyone that wants a play around. Repository is here : Again, it’s basically just a proof of concept so it’s not amazingly well coded, but feel free to take a look and build your own visualization.

Let me know what you think in the comments below!


Analyzing New Zealand Politician’s Tweets

My last post took a look at USA presidential nominees' tweets, and threw them into a word cloud to see if they were staying on message. More so, the whole post started with the assumption that American presidential candidates can spin any sort of question/answer into something about their own policy. The results were mixed. Democratic nominees tended to stay on message, while their Republican counterparts would rather slag off the Democrats (e.g. the number one thing on every Republican's mind seemed to be Hillary Clinton).

It got me thinking: how do New Zealand's politicians fare? I think here in Godzone we tend to think our politics are a lot cleaner (although… Dirty Politics, anyone?), and so it's doubtful that any minister would be sitting on Twitter constantly sending out attacks against the opposition. It's not really election time (I think I'll redo this in the run-up to the election), so I don't expect people to be heavy on policy, but let's see if that holds true.

Same as last time. I took the last 200 tweets of the leaders of the various parties in New Zealand (Not including retweets or replies), and then removed common words (Such as “The” or “A”), and put them into a word cloud to give you a visual representation of their tweets. Here’s what we got.
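The word-cloud preprocessing is the same both times: lowercase everything, strip punctuation, drop the common words, and count what's left. A minimal Python sketch (the stop-word list here is a tiny illustrative subset of what I actually filtered):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def word_frequencies(tweets):
    """Lowercase each tweet, pull out the words, drop stop words, and count
    the rest. The resulting counts drive the word cloud sizing."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts
```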

John Key (National Party Leader/Prime Minister of New Zealand)


Key has some key policy areas within his tweets. TPPA/trade is talked about a lot, as is Christchurch. Tourism (a portfolio he currently holds) also gets quite a few mentions. But the thing that interested me the most was the fact that he spells Vietnam as two words, "Viet Nam". Not sure if that's correct or not.

Andrew Little (Labour Party Leader)


There isn’t really any policy in here. Lots of talk about rugby however. Typical of where Labour is at now, there is lots of talk about the “future” and “vision”. To me, the big question is where is the talk about TPP? It’s what’s on everyone’s mind right now and Little is avoiding it like the plague on his Twitter.

James Shaw (Green Party Co-Leader)


If you didn’t know what James Shaw stood for before, you do now. It’s the most common word on his Twitter, “Climate”. The massive “Paris” word there may not make sense at first, but it’s in reference to the Paris Agreement (A UN convention on climate change). Shaw is definitely on message on his Twitter.

Metiria Turei (Green Party Co-Leader)


Again, similar to James Shaw, lots of talk about Climate Change which is what the Greens are all about. Poverty and families make plenty of appearances which is something that Turei has really been campaigning on for some time. Overall, pretty good at staying on message.

Winston Peters (NZ First Party Leader)


Winston takes his representation of Northland seriously. It’s almost all he talks about. He’s also tweeting about the flag debate, and the TPPA. Plenty of talk/attacks/tweets against National, which is what we probably have come to expect from Peters.

David Seymour (Act Party Leader)


David is the MP for Epsom, so it’s good he is talking about it a lot. Other than that, we have talk about Tax, Dying (Assisted Dying) and the TPP. I like the fact that “choice” is also very prominent on David’s twitter. Even though I may not agree with many of Act’s policies (Read : any of them), they always campaign on libertarian values of “choice”.

So let’s wrap up.

It's interesting because I'm not sure what to make of these results so far. Between Turei and Peters, they talk about National an awful lot. Reading their Twitter streams, it's definitely not as vicious as American politics, but all the same, they are still spending their time tweeting out something against National, rather than something of their own. But then again, that is the role of the opposition: to hold the government of the day to account.

I think I was most surprised about Andrew Little’s Twitter. Very little policy going up on there. He could be going for that “everyday bloke” type vibe where he isn’t pushing policy, he’s pushing himself that he’s your mate to have a beer with. I can’t blame him, Labour have arguably had more “policy” or “promises” (right or wrong), than National in the previous elections, but have still lost.

I have a feeling it’s not a great comparison to the American Presidential Candidate’s tweets, because over there, it’s the runup to the election. Here, it’s all opening schools and photo ops for a while. In the runup to the next general election, I’ll redo this post and see how things change.


Tweets of Presidential Nominees

Sitting from afar in New Zealand, I only get news of the USA presidential elections in dribs and drabs. I see highlights of the various debates, and I get news articles (mostly pro-Bernie Sanders) blasted in my face on Reddit. The thing that most intrigued me watching clips of the debates, was how well every candidate can spin essentially any question into a WWE style promo for their policies. Sanders can spin almost any question into a rant about the working class, Trump can take any problem in the world and blame it on an ethnicity other than White American and Clinton can take a question on her email scandal and somehow spin it to pander to women voters.

It got me thinking about the candidates' Twitter accounts: do they stay just as well on message? Is Sanders all about health care? Is Hillary Clinton all about trying to get the women's vote? Is Trump tweeting non-stop about immigration? I set out to find out.

I wrote a quick app to download each candidate's last 200 tweets, remove stop words (words like "and" or "the"), and create a word cloud for each candidate. Here are the results…

Bernie Sanders (D) :


Yep. You bet your bananas that health care is talked about often. But what also gets me about Bernie's Twitter (and as you will see below) is that he is very straightforward in his offerings compared to other candidates. He talks Jobs, Wages, Social, People, Climate, Social Security, Minimum Wage. It's all here. You'll notice there are Spanish words in Bernie's cloud, and that's because he tweets in Spanish quite often.

Hillary Clinton (D) :


Hillary is also all about that healthcare. She also spends a lot of time talking about republicans, Trump and Obama. Again, she also sends out tweets in Spanish.

Martin O’Malley (D) :


O'Malley talks a lot about guns, refugees, energy, and apparently leadership. He is a sure-fire chance to drop out of the race after a few primary votes, so I didn't expect to see too much policy.

Donald Trump (R) :


Whatever the Don is tweeting about, it ain't policy, that's for sure. He spends a lot of time talking about Ted Cruz though. You can also see he talks about Jeb Bush, Rubio and Clinton an awful lot. I thought for sure the number one term would be something to do with immigration, but it seems Trump saves that for the debates.

Ted Cruz (R) :


Cruz is all over the place. Mostly he uses Twitter to thank people. He does talk Tax, Isis and women in terms of policy. But that’s about it.

Marco Rubio (R) :


Rubio won’t stop tweeting about Hillary. In terms of policy, he talks about Isis and Iran a lot, and football too. Can’t forget about football!

So let's wrap up. Overall it was an interesting exercise. It seems that the Democrats do talk a lot more policy on Twitter than the Republicans. But I would point out that's not necessarily a good thing in itself: an account that just pumps out policy slogans all day long isn't great either.

I might in the future take a look at other things to do with presidential Twitter accounts. e.g. What was Clinton tweeting in the 2008 race against Obama? What did Trump tweet out before he entered the race?


Analyzing The Rock Playlist

In New Zealand, there is a radio station named "The Rock" that vows to never play the same song twice in a single day (between 9 – 5). They call it "The Rock No Repeat Workday". Snappy. It sometimes changes into a competition where they will intentionally play the same song twice in a day and you can call up to win a prize etc.

One of the main criticisms of The Rock is that even if it doesn't play the same song between 9 – 5, it still plays the same songs every day, often at the same time. To be fair to them, it's probably no different to the criticism hurled at any popular radio station. Anecdotally, I used to listen to the radio as I was getting up in the morning, and I would swear that for weeks on end, I was getting up to the same song.

Rather than live with the idea in my head that they "may" be playing the same songs, I sought to see if they really were. What a brilliant use of my free time, I thought (/s)! But really, I had some time to kill as work was settling down for the xmas holidays, so let's do this!

I had this crazy idea that I could stream the radio to my computer and run some sort of Shazam-type API to work out what song they were playing. But as it turned out, it's much easier than that. The radio station keeps an updated page that lists the songs they have played that day. It's not completely real time, but it's close enough that each day I can download the songs and store them somewhere for later analysis.

Weirdly enough, when I started trying to rip the contents of the page, I noticed that hidden in it was the ENTIRE setlist for that day, not just what it showed on the page. Even songs that were played at 2am were actually in the source code, and then they ran this amazing piece of code to decide whether to show each one or not.

What that’s basically saying is: if today isn’t a Saturday or a Sunday, and it’s between 9 and 6, then show the song. Otherwise, even though the song is written to the webpage anyway, just don’t show it to the user. If you ever wondered why The Rock website is so f-ing slow, this is the reason. The webpage is over 4000 lines of code long, but most of it is repeated junk just to output 20 songs on a page. Crazy stuff.
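The page’s actual check lives in its own templating, but the logic it describes boils down to something like this Python sketch (function name and the 9-to-6 window are taken from the description above; everything else is illustrative):

```python
from datetime import datetime

def should_show_song(played_at: datetime) -> bool:
    """Mirror of the page's visibility gate: only display songs played
    on a weekday between 9am and 6pm, even though every song for the
    whole day is already embedded in the HTML."""
    is_weekend = played_at.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
    in_window = 9 <= played_at.hour < 18
    return (not is_weekend) and in_window
```

The absurd part, of course, is that the full setlist ships to the browser regardless; the check only controls what gets displayed.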

Anyway, I whipped up a quick app that would start at 11:45PM every night on my computer and hit up the page to download the list of songs. It would then save them down into a CSV, and that would be it. Unfortunately, it seems The Rock removes some entries every now and again, so I didn’t always get complete days. But still, I recorded 1300 songs played by the radio station.
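The saving side of that nightly job is only a few lines. The original tool was C#, but a Python equivalent looks roughly like this (column layout is my assumption; schedule it for 11:45PM with Task Scheduler or cron):

```python
import csv
from datetime import date

def save_playlist(songs, path):
    """Append the day's plays to a CSV, tagging each row with today's
    date so days can be compared later. `songs` is an iterable of
    (time_played, artist, title) tuples scraped from the playlist page."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for played_at, artist, title in songs:
            writer.writerow([date.today().isoformat(), played_at, artist, title])
```

Appending (rather than overwriting) means one file accumulates the whole history, which also papers over the days where entries vanish from the site.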

Awards time!

The award for most replayed song goes to… Mountain At My Gates – Foals. Played 18 times so 1.3% of all songs played.

The award for most played band goes to… Red Hot Chili Peppers. Played 34 times so 2.6% of all songs played. (They were closely followed by Foo Fighters with 33 plays).

And some other numbers I found rather interesting: in total, there were 610 unique songs in the list, and 242 unique bands. I would say the true totals are probably somewhat lower, because I didn’t bother cleaning the list of “feat.” artists.

If you want the CSV file to download for yourself, you can grab it here. Let me know what other cool numbers you come up with in the comments! Initially I wanted to work out if the same band was played roughly the same time per day, (e.g. Foo Fighters in the morning, Pearl Jam for your evening), but my Excel Fu is really not that great.
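If Excel isn’t your thing either, the award tallies fall out of a few lines of Python once you’ve loaded the CSV (the `(artist, title)` tuple layout here is an assumption, matching the columns sketched above):

```python
from collections import Counter

def tally(plays):
    """Count plays per song and per artist.
    `plays` is a list of (artist, title) tuples, one per play."""
    songs = Counter(plays)
    artists = Counter(artist for artist, _ in plays)
    return songs, artists

# Usage (assumed column names):
# import csv
# with open("the_rock_playlist.csv") as f:
#     plays = [(row["artist"], row["title"]) for row in csv.DictReader(f)]
# songs, artists = tally(plays)
# print(songs.most_common(1), artists.most_common(1))
```

The same `Counter` trick answers the time-of-day question too: count `(artist, hour)` pairs instead and look for artists whose plays pile up in one slot.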

For those that may be interested (Not many), here is the very simple C# code I wrote to download the list.


Blocking Google Analytics Referral Spam

Let’s start with the shitlist so far…

And the list goes on. If you have seen any of these in your Google Analytics referral list, you are the subject of GA referral spam. It’s been going on for years, but only recently has it reached a point where the bulk of your traffic showing up in reports is complete junk.

So how does it work? Many initially believed it was bots actually visiting your site and triggering GA to record a visitor. This is not the case, however; these bots never actually come to your site at all. You see, on your site you will have a GA tracking code that looks similar to this: “UA-12345678-1”. All these spammers do is roll through each GA code, increasing the ID by one each time, and spam a few hits to each.
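This works because Google exposes a public Measurement Protocol endpoint that accepts hits directly, no browser or website required. A sketch of the kind of URL a spammer generates (illustrative only; the parameter names are the real Measurement Protocol ones, the values are made up):

```python
from urllib.parse import urlencode

def ghost_hit_url(tracking_id, spam_domain):
    """Build a GA Measurement Protocol pageview hit. Spammers fire these
    straight at Google's collect endpoint, so no request ever touches
    the spammed site itself."""
    params = {
        "v": "1",                         # protocol version
        "tid": tracking_id,               # e.g. UA-12345678-1, simply incremented
        "cid": "555",                     # arbitrary anonymous client id
        "t": "pageview",
        "dr": f"http://{spam_domain}/",   # fake referrer that shows up in your reports
        "dp": "/",
    }
    return "https://www.google-analytics.com/collect?" + urlencode(params)
```

Run that URL against a few million incremented tracking IDs and you’ve spammed a few million GA accounts without sending a single packet to any of their servers.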

So how can you stop them if they aren’t actually coming to your site? Really, Google should step up to the plate and do something about it, since it’s making their service close to useless. But support of existing products is not the Google way of thinking. Luckily there is a simple way in GA to block them, and although it’s manual and only blocks them going forward, it’s better than nothing.

1. Once logged into GA, at the top of the window should be an Admin tab. Click it!


2. On the far left column, select your account from the top dropdown. Then select “All Filters”.


3. Click the “Add Filter” button. Then fill it out similar to below where the “Filter Pattern” field below is each of the domains you wish to block, separated by a | character, and with a \ character before every full stop.


4. Further down the page you should “select” the views you wish to apply this to. If you have no idea what this means, you should only see one named “All Website Data”, and you want to select this one.

Voilà! You should now be spam free!

So you’re probably saying to yourself that it seems like a lot of work to manually join up all these domains, put a backslash in front of the full stops, etc. It’s madness, you say! Well, I’ve created a Github repo of all the spam domains I’ve come across, and a small tool to join them all together. I hope that in the future more people will join in so we have a collaborative blocklist of spam domains. You do need Node installed to run the tool, but hopefully in the future I’ll get around to making it a bit easier for non-technical people.
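The joining tool itself is tiny. Mine is Node, but the whole trick fits in one line of any language: escape every full stop with a backslash, then glue the domains together with `|`. A Python sketch of the same idea (example domains are illustrative):

```python
def build_filter_pattern(domains):
    """Join spam domains into a single GA filter pattern: escape each
    full stop with a backslash, separate entries with |."""
    return "|".join(domain.replace(".", r"\.") for domain in domains)

# build_filter_pattern(["semalt.com", "buttons-for-website.com"])
# gives: semalt\.com|buttons-for-website\.com
```

The output string is exactly what goes into the “Filter Pattern” field in step 3 above.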

You can check out the Github repo here :

If you are unsure what the heck you are doing on Github, then the easier way is to just grab the output list here : . Take the second line in that file, and slam it into the block list in GA. And voilà, you have a pretty dang comprehensive spam list. You should check that file often to see if anything new has been added, and just take the entire line again. It makes it a lot easier than having to maintain your own list!

And of course, for those interested in helping, please create a pull request to add any spam domains to the list so others can benefit. The general thought is that even if you haven’t been hit by a particular spam domain yet, you still may be in the future, so it’s better to max out your spam list while you can.
