Last week I published a post about a quick investigation into what percentage of stories on NZHerald came from syndicated sources. I threw the post up on Twitter and Reddit and got some interesting feedback. While I had only intended to take a quick look at the homepage and then move on, there were some great points made on social media about ways to improve the data. It really boiled down to two main points.
Had I used enough data? My original intention was just to do a quick scrape, but people felt that investigating 100-odd stories wasn’t enough.
And the second question: was syndication OK in some sections, but not others? People felt that syndication was, in some sense, OK when reporting on world news. It isn’t cheap to fly someone halfway around the world for a single story. So syndication in these cases might be OK? (I end that with a question mark because it really is one of those things I’m not sure about.)
Both these issues were simple enough to fix. So I made a couple of changes to my app and away we went.
First, I found that NZHerald has “archive” pages that contain about six days’ worth of articles for each category. Some days it had less, but it at least had a few hundred posts per category on there. So instead of going to the homepage I could just grab these pages and follow all the links to the stories. From there it was easy to record which category each story was in so we could try to do some grouping with the results.
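For anyone wanting to try something similar, here is a minimal Python sketch of the "grab the archive page and follow the links" step. It is not the app's actual code. The `/story/` URL marker, the example base URL, and the function and class names are all assumptions made for illustration, and the real archive pages may be structured quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical marker: assume story URLs contain this path fragment.
STORY_MARKER = "/story/"


class ArchiveLinkParser(HTMLParser):
    """Collects the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the archive page URL.
            self.links.append(urljoin(self.base_url, href))


def story_links(archive_html, base_url, category):
    """Return (category, url) pairs for each unique story link on the page."""
    parser = ArchiveLinkParser(base_url)
    parser.feed(archive_html)
    seen = set()
    results = []
    for link in parser.links:
        if STORY_MARKER in link and link not in seen:
            seen.add(link)
            results.append((category, link))
    return results
```

Tagging each link with its category at this point is what makes the per-category grouping possible later on, since the category is known from the archive page rather than guessed from the article itself.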
And as a small change from the first test, I would split out AP Wire posts from regular AP attributions. I’m still not 100% sure what a regular attribution counts for (Was it a verbatim post? Were large sections taken from the original? Was it just a source?), but we could be sure that AP Wire posts were automatically posted without any editing from the Associated Press feed. We know this because it says so right there on the page.
One note. For the majority of these stats, when we talk about “syndicated” posts we are talking about overseas syndication sources. Local NZME sources (local newspapers and the like) don’t count. Why? Because I felt it wasn’t in the “spirit” of the discussion. This whole thing started because people felt like there was an influx of “Daily Mail” type stories, and I don’t think locally produced stories should really count against the Herald.
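The AP Wire versus regular AP split can be sketched as a small ordered rule list. This is only an illustration: the marker phrases, the labels, and the `classify` function are assumptions, not the wording the Herald actually uses on its pages. The one real design point is that the more specific "AP Wire" rule has to be checked before the general "AP" rule.

```python
import re

# Hypothetical attribution markers, checked in order (most specific first).
RULES = [
    ("AP Wire", re.compile(r"\bap wire\b")),
    ("AP", re.compile(r"\b(ap|associated press)\b")),
    ("Daily Mail", re.compile(r"\bdaily mail\b")),
    ("Local NZME", re.compile(r"\bnzme\b")),
]

# Labels treated as overseas syndication; local sources are left out.
OVERSEAS = {"AP Wire", "AP", "Daily Mail"}


def classify(attribution):
    """Map an attribution line to a label, or 'Unclassified' if no rule hits."""
    text = attribution.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "Unclassified"


def is_overseas(attribution):
    """True when the attribution matches one of the overseas labels."""
    return classify(attribution) in OVERSEAS
```

The word boundaries (`\b`) stop "ap" from matching inside ordinary words such as "newspaper".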
So first let’s just group by their syndication “types” and see what we get.
| Article Count | Syndication Type |
OK, so right away it tells us that there is quite a bit of overseas syndicated content on the Herald. But let’s go further, let’s try and break it down by category so we can compare.
So this starts making sense now. We see that for National issues, almost all content is from the Herald itself or from local sources (Other NZME sites). And when we get to World news, it’s mostly overseas content.
Let’s take a closer look at World to begin with. As mentioned earlier, there are really two types of syndication at play here: AP Wire posts, which are unedited articles straight from Associated Press, and regular syndication. Let’s break it down.
What does this tell us? Well, it tells us that there really isn’t any reason to read the Herald for world news over an overseas paper, because none of its content is its own. In fact most of it is posted directly from the AP wire.
But let’s dive into what there seems to be universal hatred for: Daily Mail syndication. Let’s check, by percentage, which category is worst hit by it.
Take note of the Y-axis. It only goes up to 16, that being 16%. So even in the worst-hit categories, we are “only” seeing 16% of the content come direct from the Daily Mail. I put “only” in quotes because that’s still roughly 1 in 6 articles being out-and-out clickbait.
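The percentage behind a chart like this is just the Daily Mail count divided by the total count for each category. A minimal sketch follows, assuming each article has already been reduced to a `(category, source)` pair. The function name and the `"Daily Mail"` label are assumptions for illustration.

```python
from collections import Counter


def daily_mail_share(rows):
    """Return {category: percent of that category's articles from Daily Mail}."""
    totals = Counter()
    daily_mail = Counter()
    for category, source in rows:
        totals[category] += 1
        if source == "Daily Mail":
            daily_mail[category] += 1
    # Every category in totals has at least one article, so no divide-by-zero.
    return {c: 100.0 * daily_mail[c] / totals[c] for c in totals}
```

Working per category matters here: a category with few articles can have a high percentage from only a handful of Daily Mail stories, so the raw counts are worth keeping alongside the percentages.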
How might I take this further? I think it goes back to my original post. Looking deep into a category is great, but it really depends on how the Herald is pushing it. If there are Daily Mail articles, but they are deep within the site and never surface on the homepage (not the case here, but just saying), then who really cares? But what I tended to see while collecting this data was that it was the homepage real estate that seemed to be clogged with clickbait titles (after all, that’s what clickbait is for). So even if there is only one “and you won’t believe what happened next” title per day, if that article is a “lead” story and gets prime position on the homepage, that’s what I think is most irritating. So maybe that takes us to a part 3 in the future.
The data I used is available in CSV format here. I personally threw the data into a SQL database because my Excel skills are extremely poor. I’m no data scientist, so as always, other insights into the data in the comments are more than welcome!
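If you want to poke at the CSV the same way, here is a rough sketch of loading it into an in-memory SQLite database and grouping by syndication type. The column names `category` and `syndication_type` are assumptions, so check the CSV's actual header row and adjust them to match.

```python
import csv
import sqlite3


def load_articles(csv_file):
    """Load rows from an open CSV file into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (category TEXT, syndication_type TEXT)")
    reader = csv.DictReader(csv_file)
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        # Hypothetical column names; rename to match the real CSV header.
        ((row["category"], row["syndication_type"]) for row in reader),
    )
    conn.commit()
    return conn


def counts_by_type(conn):
    """Article count per syndication type, largest first."""
    return conn.execute(
        "SELECT syndication_type, COUNT(*) FROM articles "
        "GROUP BY syndication_type "
        "ORDER BY COUNT(*) DESC, syndication_type"
    ).fetchall()
```

Usage would look like `with open("articles.csv", newline="") as f: conn = load_articles(f)`, where `articles.csv` is a placeholder filename. Adding `category` to the `GROUP BY` gives the per-category breakdown.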