
Measuring the intangible: lessons for science from the DRS?

The final Ashes test of this summer has just started, a welcome distraction, no doubt, for some of those academics holed up preparing REF submissions (see Athene Donald’s recent post to get a feel for how time-consuming this is, and the comments under it for a very thoughtful discussion of the issues I’m covering here). It also provides the perfect excuse for me to release another convoluted analogy, this time regarding the approaches taken in test match cricket and in academic science to measuring the intangible.

Anyone who follows sport to any extent will know how armchair and professional pundits alike love to stir up controversy, and much of that generated in this current series has revolved around the Decision Review System (DRS) - the use of television technology to review umpires’ decisions. Of course, TV technology is now used in many sports to check what has actually happened, and that is part of its role in cricket - Did a bat cross a line? Did the ball hit bat or leg? Whilst sports fans will still argue over what these images show, at least they are showing something.

Cricket, however, has taken technology a step further. In particular, one of the ways that a batsman can get out in cricket is LBW (leg before wicket), where (in essence) the umpire judges that, had it not hit the batsman’s leg, the ball would have gone on to hit the wicket. LBWs have been a source of bitter disputes since time immemorial, based as they are on supposition about what might have happened rather than anything that actually did. The application of the DRS was meant to resolve this controversy once and for all, using the ball-tracking technology hawk-eye to predict exactly (or so it seems) the hypothetical trajectory of the ball post-pad.

Of course, sports being sports, the technology has simply aggravated matters, as ex-players and commentators loudly question its accuracy. But, as well as demonstrating how poorly many people grasp uncertainty (not helped by the illusion of precision presented by the TV representation of the ball-tracking), it struck me that there are parallels here with how we measure the quality of scientific output.

First and foremost, in both situations there is no truth. The ball never actually travelled beyond the pad; quality is a nebulous and subjective idea.

But perhaps more subtly, we also see the bastion of expert judgement (the umpire, the REF panel) challenged by the promise of a quick techno-fix (hawk-eye, various metrics).

What has happened in cricket is that hawk-eye has increasingly been seen as ‘the truth’, with umpires judged on how well they agree with technology. I would argue that citations are the equivalent, at least as far as scientific papers go. Metrics or judgements that don’t correlate with citations are considered worthless. For instance, much of the criticism of journal Impact Factors is that they say little about the citation rates of individual papers. This is certainly true, but it also implicitly assumes that citations are a better measure of worth than the expert judgement of editors, reviewers, and authors (in choosing where to submit). Now this may very well be the case (although I have heard the opposite argued); the point is, it’s an assumption, and we can probably all think of papers that we feel ought to have been better (or less well) cited. As a thought experiment, rank your own papers by how good you think they are; I’ll bet there’s a positive correlation with their actual citation rates, but I’ll also bet it’s <1. (You could also do the same with journal IF, if you dare…)
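If you fancy making that thought experiment concrete, here is a minimal Python sketch of the correlation it describes; the quality scores and citation counts are invented placeholders, not data from any real publication list.

```python
# A minimal sketch of the thought experiment above: correlate your own
# assessment of each paper's quality with its citation count.
# All numbers below are invented placeholders, not real data.
from scipy.stats import spearmanr

# Hypothetical self-assessed quality scores (higher = better)...
quality_scores = [9, 7, 8, 5, 4, 6, 3, 2]
# ...and the corresponding citation counts for the same papers.
citations = [120, 45, 60, 30, 25, 40, 10, 5]

rho, p_value = spearmanr(quality_scores, citations)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# The bet in the text: rho comes out positive, but comfortably less than 1.
```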

So, we’re stuck in the situation of trying to measure something that we cannot easily define, or (in the case of predicting future impact) which hasn’t even happened yet, and may never do so. But the important thing is to have some agreement. If everyone agrees to trust hawk-eye, then it becomes truth (for one specific purpose). If everyone agrees to replace expensive, arduous subjective review for the REF with a metric-based approach, that becomes truth too. This is a scary prospect in many ways, but it would at least free up an enormous amount of time, currently spent assessing science, for actually doing it (or at least, for chasing the metrics…)

My own personal Impact Factor

The editor of a well-respected ecological journal told me recently, “I am… very down on analyses that use citation or bibliographic databases as sources of data; I'm actually quite concerned that the statistical rigor most people learn in the context of analysing biological data is thrown out completely in an attempt to show usage of a particular term has been increasing in the literature!” I think he has a point, and in fact I feel the same about much that I read on bibliometrics more generally: there’s some really insightful, thoughtful and well-reasoned text, but as soon as people attempt to bring some data to the party all usual standards of analytical excellence go out the window. I see absolutely no reason to buck that trend here.

So…

The old chestnut of Journal Impact Factors has been doing the rounds again, thanks mainly to a nice post from Stephen Curry which has elicited a load of responses in the comments and on Twitter. To simplify massively: everyone agrees that IFs are a terrible way to assess individual papers (and by inference, researchers), but there’s less agreement on whether they tell you anything useful when comparing journals within a field. Go read Stephen’s post if you want the full debate.

But what’s sparked my post was a response from Peter Coles (@telescoper), called The Impact X-Factor, which proposed an idea I’d had a while back about judging papers against the IF of the journal in which they’re published. Are your papers holding up or weighing down your favourite journal? Let’s be clear from the outset: I don’t think this tells us anything especially interesting, but that needn’t put us off. So I have bitten the bullet, and present to you here my own personal impact factor. (The fact I come out of it OK in no way influenced my decision to go public.)

The IF of a journal, remember, is simply the mean number of citations to papers published in that journal over a two-year period (various fudgings and complications make it rather more opaque than that, but that’s it in essence). So for each of my papers (fortunately there aren’t too many) I’ve simply obtained (from my Google Scholar page, as it’s more open than ISI) the number of citations they accrued in the two years after publication. I’ve then compared this to the relevant journal IF for that period, or as close as I could get. Here are the results:
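For concreteness, here is a minimal sketch of that calculation in Python; the (citations, journal IF) pairs are invented placeholders rather than my actual publication record.

```python
# A sketch of the comparison described above: for each paper, citations
# accrued in the two years after publication, alongside the journal's
# Impact Factor for (roughly) that period. Placeholder numbers only.
papers = [
    # (citations in first two years, journal IF for that period)
    (3, 2.1),
    (12, 4.2),
    (7, 4.5),
    (2, 5.8),
    (6, 9.0),
]

two_year_cites = [cites for cites, _ in papers]
journal_ifs = [jif for _, jif in papers]

# A journal IF is (in essence) a mean of two-year citation counts, so a
# crude "personal IF" is simply the mean of my own two-year counts.
personal_if = sum(two_year_cites) / len(two_year_cites)
print(f"Personal IF (mean two-year citations): {personal_if:.1f}")
```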

OK, bit of explanation. This simply plots the number of citations my papers got in the two years post-publication, against the relevant IF of the journal in which they were published. (The red points are papers published in the last year or so, and I’ve down-weighted IF to take account of this; I’ve excluded a couple of very recently-published papers.) The dashed line is the 1:1 line, so if my papers exactly matched the journal mean they would all fall on this line. Anything above the line is good for me, anything below it bad – the histogram in the bottom right shows the distribution of differences of my papers from this line.
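If you want to reproduce that sort of figure for your own papers, a rough matplotlib sketch is below (scatter of two-year citations against journal IF, a dashed 1:1 line, and an inset histogram of the differences); the data are the same invented placeholders as above, not my actual papers, and the recent-paper down-weighting is left out.

```python
# A rough sketch of the figure described above. Placeholder data only.
import matplotlib.pyplot as plt

journal_ifs = [2.1, 4.2, 4.5, 5.8, 9.0]
two_year_cites = [3, 12, 7, 2, 6]

fig, ax = plt.subplots()
ax.scatter(journal_ifs, two_year_cites)

# Dashed 1:1 line: points above it are papers cited more often than the
# journal average; points below are cited less often.
lim = 1.1 * max(max(journal_ifs), max(two_year_cites))
ax.plot([0, lim], [0, lim], linestyle="--", color="grey")
ax.set_xlabel("Journal Impact Factor")
ax.set_ylabel("Citations in two years post-publication")

# Inset histogram of the differences from the 1:1 line.
inset = fig.add_axes([0.6, 0.2, 0.25, 0.25])
inset.hist([c - jif for c, jif in zip(two_year_cites, journal_ifs)])
inset.set_title("Difference from 1:1 line", fontsize=8)

plt.show()
```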

I’ve fitted a simple Poisson model to the points, with and without the outlier in the top right – neither does an especially good job of explaining citations to my work, so we might as well take a mean, giving me my own personal IF of around 6.
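The model itself is nothing exotic; below is a sketch of how one might fit it with statsmodels, again with placeholder data, and with the top-right outlier simply dropped for the second fit.

```python
# A minimal sketch of the Poisson model described above: two-year
# citation counts regressed on journal IF, fitted with and without the
# outlier. Placeholder data only.
import statsmodels.api as sm

journal_ifs = [2.1, 4.2, 4.5, 5.8, 9.0, 8.5]
two_year_cites = [3, 12, 7, 2, 6, 45]   # the last pair plays the "outlier"

def fit_poisson(x, y):
    """Fit a Poisson GLM of citation counts on journal IF."""
    X = sm.add_constant(x)
    return sm.GLM(y, X, family=sm.families.Poisson()).fit()

with_outlier = fit_poisson(journal_ifs, two_year_cites)
without_outlier = fit_poisson(journal_ifs[:-1], two_year_cites[:-1])

print(with_outlier.summary())
print(without_outlier.summary())
```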

As my editor friend suggested, there’s a whole lot wrong with this analysis. For instance, I haven’t taken account of year of publication, or any other potential contributing factors (coauthors, publicity, etc. etc.). Another obvious caveat is the lack of papers in journals with IF > 10 (I can assure you that this has not been a deliberate strategy). But back in the peloton of points which represent the ecology journals in which I’ve published most regularly, I’m reasonably confident in stating that citations to my work are unrelated to journal IF. Gratifyingly too, the papers that I rate as my best typically fall above the 1:1 line.

So there we have it. My own personal impact factor.

Defining a Field

What does it take to have a real impact on the development of your field? Those charged with assessing UK research have taken the view that a small number of exceptional papers are a better indicator of quality than a mass of ‘lesser’ papers. Now we can quibble (indeed, I have done) about the way that ‘exceptional’ papers are identified (in particular by risk-averse departments and institutions). Furthermore, I’ve argued that setting out with the intention of writing a ‘high impact paper’ is often antithetical to doing good science. However, that’s beside the point. Regardless of the nuts and bolts of measurement, the idea that one should be judged on quality not quantity seems to be reasonably widely accepted. But if we’re taking a retrospective view, is it always the case that you can trace the development of a field back to one or two highly influential papers? I’ve been pondering this since the first meeting last month of the British Ecological Society’s Macroecology Special Interest Group. (Macroecology is ecology at large spatial scales, by the way, and is what I do. There’s a brief but useful wikipedia page here.)

As is the nature of such inaugural meetings, first our committee chair Nick Isaac, then our opening keynote speaker, Ian Owens from the Natural History Museum, provided a potted history of the discipline. In so doing, it is common practice to pick a significant publication and trace its subsequent influence. But in macroecology, that’s tricky...

OK, you could pick the 1989 Science paper by James Brown and Brian Maurer which originally coined the term ‘macroecology’. This paper has accrued a satisfying-but-not-stellar 329 cites, but my suspicion is that it’s not actually been that widely read, and that many of the citations run something like “the term ‘macroecology’ was first coined by Brown & Maurer (1989)...”

Or we could focus on the pioneers of UK macroecology, Kevin Gaston and Tim Blackburn. They have forged a formidable partnership, coauthoring 87 papers over a 20-year period, with a phenomenal burst of productivity in the mid to late 1990s which saw them publish as coauthors around 10 papers a year, papers which provided the foundation for much subsequent macroecological work. (Both have been prolific independently of each other too. Sickening, isn’t it?) The point is, though, that it is difficult to pick a single one of these 50 or so papers as being suitable for the ‘what happened since the publication of x’ rhetorical device. Although their work in aggregate has been well cited – 8 papers from that period 1993-2000 have picked up >100 cites – the maximum number of cites for a single work is <300. Which is good, no doubt, but not spectacular.

So what Nick and Ian both did was to pick as their milestones three books, two by Kevin and Tim (Pattern & Process in Macroecology from 2000, which summarised much of their previous five years of work, and the edited volume Macroecology: Concepts & Consequences) and one by Brown (Macroecology, a single-word title which has amusingly been cited in 20 different ways according to ISI WoK, including the antonymous Microecology!). Rich Grenyer tweeted at the time that this had interesting implications for the high-impact-paper-obsessed REF, but I think it also tells us something about the development of scientific fields more generally.

Of course there are occasions when one or two landmark publications define decades of subsequent research (think Einstein, or Crick & Watson, even the occasional book like Hubbell’s Unified Neutral Theory of Biodiversity). But often it is steady accumulation, the gradual assembly of a body of work which counts. This recognises that it is not always possible – or if possible, not desirable – to force everything of value that you have to say into the strict limits of some of the higher profile journals (and you may not wish to see what you consider to be important analyses buried, probably unread, in supplementary material). This view of science essentially treats the literature as a kind of open notebook – a record of thought processes and incremental progress, rather than a single statement of ultimate truth. And in the case of macroecology, this broad foundation has served us very well.

Perhaps I can make a (non-Olympic) sporting analogy. Cricket exists in several formats, with the extremes being the smash-bang-wallop of Twenty20 (matches last about 3 hours) and the rather more sedate 5 day test matches. A test match batsman will steadily accumulate, and won’t try to hit every ball out of the ground – although if a ball is tempting enough, of course he won’t turn down the opportunity for the big hit. This mix of accumulation and opportunism seems to me to be a much better strategy for ensuring that a field is built on solid foundations than the headline-grabbing, REF-driven, try-to-hit-everything-straight-into-Nature T20 style.

As any cricket fan will tell you, test matches are more substantial and ultimately far more satisfying than any limited overs jamboree. And the occasional 6 is made all the sweeter for its scarcity.