Hacker News Item Lifetime

Stages in an on-line community:
  • Limited but in-depth discussions
  • Grow gently, and get more points of view,
    • ... but everyone understands what the group is about
  • Significant growth, with newcomers starting to out-number the old hands.
  • The newcomers mean the interests and topics get diluted
    • ... and the old hands complain that the quality has gone to hell in a handcart
  • Evaporative cooling - the highest quality participants start to drift away
  • The community loses direction entirely,
    • ... and the people everyone originally joined to engage with have gone
I've got a real interest in on-line communities - the opportunities they afford, and the problems they face. It seems that every community goes through the same phases. These are documented and discussed in more detail elsewhere by people more qualified than I, and I'm not going to analyse them here.

I'm trying to understand this in more detail, and as a part of that I'm looking at Hacker News - http://news.ycombinator.com/news - a community that's been going for a few years, and which many claim still has great conversations and discussions.

As a small part of my attempt to understand the complex phenomena involved, I've been looking at rates of submission and at how long a submission stays on the "New" page: http://news.ycombinator.com/newest

I've been taking snapshots of the "Newest" page for a while, and I've been looking for patterns in how old the 30th item on that page is. The younger it is, the faster items are being submitted. If items are being submitted slowly, then the 30th item will be quite old.
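As a rough rule of thumb, and assuming the newest item on the page is only moments old, all 30 items arrived within the age of the 30th item, so the average submission rate is about 30 divided by that age. The sketch below just turns that reasoning into arithmetic; the function name is hypothetical.

```python
def submissions_per_hour(age_30th_minutes: float, items_on_page: int = 30) -> float:
    """Rough submission rate implied by the age of the last item on the page.

    Assumes the newest item is roughly brand new, so `items_on_page` items
    arrived within `age_30th_minutes` minutes.
    """
    if age_30th_minutes <= 0:
        raise ValueError("age must be positive")
    return items_on_page * 60.0 / age_30th_minutes


# Older 30th item -> slower submission rate.
for age in (200, 100, 30):
    print(f"30th item {age:>3} min old -> ~{submissions_per_hour(age):.0f} submissions/hour")
```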

Figure key: X = date, Y = time of day, colour = age of the 30th item (RED = very short life, WHITE = 100 minutes, GREEN = 200 minutes)

So I've got a bunch of data points. For each snapshot I've got the date, the time of day, and the (extrapolated) age of the 30th item on the "Newest" page.

There is a problem: once an item is old enough, its age is given only in hours. To account for that I've extrapolated, taking the last item whose age is given in minutes and assuming the submission rate is constant, so that age grows linearly down the page. Thus if the 20th item is 58 minutes old, we assume the 30th item is 58/2*3 = 87 minutes old. This extrapolation is, in fact, pretty poor. There are better things that could be done, and might be, if this investigation shows real promise. I've then discarded any estimate greater than 200 minutes; I'll show a more complete data set later.
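A minimal sketch of that extrapolation, assuming the ages are scraped as strings such as "58 minutes ago" or "3 hours ago" (an assumption about the page format). The function names and the regular expression are hypothetical, for illustration only.

```python
import re
from typing import Optional, Sequence

# Assumed age-string format, e.g. "1 minute ago", "58 minutes ago", "3 hours ago".
AGE_RE = re.compile(r"(\d+)\s+(minute|hour|day)s?\s+ago")


def age_in_minutes(text: str) -> Optional[int]:
    """Return the age in minutes if the page gives it in minutes, else None."""
    m = AGE_RE.search(text)
    if m is None or m.group(2) != "minute":
        return None  # hours/days are too coarse to use directly
    return int(m.group(1))


def extrapolated_age_of_last(ages: Sequence[str], cutoff: float = 200.0) -> Optional[float]:
    """Estimate the age of the last item from the last one given in minutes.

    Assumes a constant submission rate, so age grows linearly with position:
    if item k (1-based) is a minutes old, item n is a * n / k minutes old.
    Returns None if no age is given in minutes, or if the estimate exceeds
    `cutoff` minutes (those samples are discarded).
    """
    n = len(ages)
    for k in range(n, 0, -1):  # search from the bottom of the page upwards
        a = age_in_minutes(ages[k - 1])
        if a is not None:
            estimate = a * n / k
            return estimate if estimate <= cutoff else None
    return None


# Made-up page: items 1..19 in minutes, the 20th is 58 minutes old,
# and items 21..30 are only given in hours.
ages = [f"{i * 3} minutes ago" for i in range(1, 20)] + ["58 minutes ago"] + ["1 hour ago"] * 10
print(extrapolated_age_of_last(ages))  # 87.0
```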

At right is a plot of the data. In this we have date running horizontally, time of day running vertically, and the (extrapolated) age of the 30th item shown in colour. Data collection started in April 2009, so there are about 580 days of samples.

The striping is caused by the regular spacing of the snapshots, whose times drift throughout the day.
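A plot like this can be built by grouping the samples into (date, time-of-day bin) cells and colouring each cell by its mean age. The sketch below assumes the samples are (timestamp, age-in-minutes) pairs, uses a hypothetical 30-minute bin width, and follows the colour key: red at 0, white at 100 minutes, green at 200 minutes. The sample values are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def heatmap_cells(samples, bin_minutes=30):
    """Group (timestamp, age_minutes) samples into (date, time-of-day bin) cells.

    Returns {(date, bin_index): mean_age}; bin 0 starts at 00:00 GMT.
    Naive timestamps are assumed to already be in GMT.
    """
    cells = defaultdict(list)
    for ts, age in samples:
        if ts.tzinfo is not None:
            ts = ts.astimezone(timezone.utc)
        minute_of_day = ts.hour * 60 + ts.minute
        cells[(ts.date(), minute_of_day // bin_minutes)].append(age)
    return {key: mean(ages) for key, ages in cells.items()}


def age_colour(age, mid=100.0, top=200.0):
    """Map an age in minutes to RGB: red at 0, white at `mid`, green at `top`."""
    age = max(0.0, min(age, top))
    if age <= mid:
        t = age / mid  # red (1,0,0) -> white (1,1,1)
        return (1.0, t, t)
    t = (age - mid) / (top - mid)  # white (1,1,1) -> green (0,1,0)
    return (1.0 - t, 1.0, 1.0 - t)


samples = [
    (datetime(2010, 11, 1, 6, 10, tzinfo=timezone.utc), 150),
    (datetime(2010, 11, 1, 6, 20, tzinfo=timezone.utc), 170),
    (datetime(2010, 11, 1, 15, 5, tzinfo=timezone.utc), 20),
]
for (day, b), age in sorted(heatmap_cells(samples).items()):
    print(day, f"{b * 30 // 60:02d}:{b * 30 % 60:02d}", age, age_colour(age))
```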

We can see pretty clearly that the plot used to be mostly green: items used to stay on the "New" page for longer than they do now. This is no surprise, because as the community has grown, so have the number and rate of submissions. This pushes items off the "New" page more rapidly.

We can see that even now items remain on the "New" page for a significant time if submitted between 06:00 and 12:00. (Time runs from bottom to top: 00:00 is at the bottom, rising through 06:00 to 12:00 midday in the middle, evening is towards the top, and the top line is immediately before midnight. All times are GMT.)

We can clearly see that the top right quadrant is mostly red. This shows that, recently, items submitted in the afternoon (GMT) have the shortest lifetime on the "New" page. Likewise we can see that most recently, items submitted between 06:00 and 12:00 (GMT) survive the longest.

Of course, this could, and most likely will, vary between mid-week and the weekend. Here we can see the data for Fridays. It's sparser, of course, but the pattern is the same.


(Plots: Mid-week, Weekend)

In the plots above we have the data for mid-week, for Friday, and for the weekend. The same pattern appears, although the colour range varies because of outliers. The weekend chart in particular has just a few red points, which have shifted the range. Removing them would improve the range, but that's not really worth the time.
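Splitting the samples for these plots needs only the day of the week. A minimal sketch, assuming timestamps are in GMT and taking "mid-week" to mean Monday to Thursday (an assumption for illustration); the function names are hypothetical.

```python
from datetime import datetime


def day_group(ts: datetime) -> str:
    """Classify a GMT timestamp as 'mid-week', 'friday' or 'weekend'.

    Assumption: 'mid-week' means Monday to Thursday.
    """
    weekday = ts.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 4:
        return "friday"
    if weekday >= 5:
        return "weekend"
    return "mid-week"


def split_by_day_group(samples):
    """Partition (timestamp, age) samples into the three groups."""
    groups = {"mid-week": [], "friday": [], "weekend": []}
    for ts, age in samples:
        groups[day_group(ts)].append((ts, age))
    return groups
```

Each group can then be plotted separately. Using a fixed 0 to 200 minute colour scale for all three, rather than each plot's own range, would keep a handful of outliers from shifting the colours.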

The plot at right shows the lifetime as a function of date. Clearly the lifetime of a submission on the "Newest" page is declining over time, although the spread is substantial.

The colours are vaguely stratified, showing the effect of time-of-day, but it's not entirely clear-cut.

The notch in the middle is the slow-down over Christmas.


X is date, Y is lifetime, colour is time of day.
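One way to put a number on the decline is an ordinary least-squares fit of lifetime against date. This is a sketch, assuming (timestamp, lifetime-in-minutes) pairs; given the large spread, the slope only summarises the central trend, and a more robust fit might be preferable. The function name and the sample values are made up.

```python
from datetime import datetime, timedelta


def lifetime_trend(samples):
    """Least-squares fit of lifetime (minutes) against date (days since first sample).

    Returns (slope in minutes per day, intercept in minutes).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = min(ts for ts, _ in samples)
    xs = [(ts - t0).total_seconds() / 86400.0 for ts, _ in samples]
    ys = [float(age) for _, age in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up points lying exactly on a line that falls by half a minute per day.
start = datetime(2009, 4, 1)
made_up = [(start + timedelta(days=d), 200 - 0.5 * d) for d in (0, 10, 20)]
slope, intercept = lifetime_trend(made_up)
print(f"{slope:+.2f} minutes/day")  # -0.50 minutes/day
```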

Finally, at left we show the lifetime plotted as a function of time of day. We can see the effect noted earlier: items submitted between 06:00 and 12:00 (GMT) survive longer. The colouring clearly shows that items used to live longer, with red for the oldest data, white for intermediate, and green for the newest.
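The time-of-day effect can be summarised by taking the median lifetime for each hour. A minimal sketch, assuming timestamps are already in GMT; the median is used because it is less affected by outliers than the mean. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_by_hour(samples):
    """Median lifetime (minutes) for each hour of the day, keyed 0..23 (GMT).

    Assumes timestamps are already in GMT; hours with no samples are omitted.
    """
    by_hour = defaultdict(list)
    for ts, age in samples:
        by_hour[ts.hour].append(age)
    return {hour: median(ages) for hour, ages in sorted(by_hour.items())}


samples = [
    (datetime(2010, 11, 1, 7, 0), 120),
    (datetime(2010, 11, 2, 7, 30), 180),
    (datetime(2010, 11, 1, 16, 0), 40),
]
print(median_by_hour(samples))
```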


The data are available for anyone else who wants to play, and I'll make my code available to anyone on request.