Judgement vs Accountability

What with one thing and another, it’s taken me a while to sit down to write this, and the event that triggered it – the furore over this year’s GCSE results – already seems like old news. But it got me thinking more broadly, and I hope those thoughts are still relevant several news cycles later. So: in a lively Newsnight debate about the GCSEs, someone suggested that exams at 16 were unnecessary, saying something like ‘teachers are professionals, they can use their professional judgement to assess their students at that age without the need for external examining bodies’. I don’t have strong views on this particular topic (although I’m happy my results did not depend on my chemistry teacher, who once graded – apparently without noticing – a pile of French essays that we handed in for a joke), but the underlying issue of the (not always complementary) relationship between professional judgement and rigid accountability seems to me highly relevant to academia, in several ways.

Most obviously, of course, in teaching. In general the days of simply sticking a grade on a paper with no justification have passed, and with them the rumours of dubious practices (the famous ‘chuck a pile of essays down the stairs and rank them by where they fall’). This is surely a good thing, and is the least that students should expect now that they have a more personal sense of what their education is costing.

But, partly as a consequence of increasingly assertive students, I’m getting more and more questions about the marks I give for undergraduate essays. Not disputing the marks, but asking what they would have needed to do to get that 72 rather than 68, or 75 rather than 72… Now, I do try to set out an explicit marking scheme, and to provide ample feedback, but sometimes it’s tempting just to say ‘I just thought it was a solid 2:1’; or ‘What do you need to do to get 80? Just write a fantastic essay!’; or ‘What makes a great essay? Not sure, but I know one when I read one…’ The strict accountability introduced by rigid marking schemes can be your friend when you have 150 exam scripts to process, but when you’re marking half a dozen tutorial essays it can get in the way of a more subjective judgement.

Something similar happens in the peer review process, for both papers and grant proposals. For papers, especially when acting as an editor and rejecting work without sending it for full review, I frequently justify the decision in an accountable fashion, with bits copied and pasted from the journal’s aims and scope. But usually what I’m really saying (except on those occasions when I’m saying ‘this is crap’) is, ‘Nah, sorry, didn’t really float my boat’. Or, to couch the same sentiment in more formal language, ‘In my professional judgement, I don’t think this work merits publication in journal X’. Full stop. I think this has some similarities to a GP’s diagnosis – one hopes that it is founded in a good understanding of the subject, but one need not document every single step taken to rule out all other possible diagnoses.

Finally, in reviewing grant proposals you can be forced to be more prescriptive than perhaps you would like. Certain boxes must be filled in, for instance on what you perceive to be the main strengths and weaknesses of the proposed work, which forces you to break down the proposal in a way which may not match your gut feeling (to use another term for professional judgement). So something that you thought was eminently fundable is scuppered because you happened to list more in the weaknesses column than in the strengths – regardless of your overall impression.

Accountability is of course absolutely essential to the process of science – the audit trail which leads from raw data to published results is arguably more important than the results themselves. But in the assessment of its worth? I’m not so sure.

My own personal Impact Factor

The editor of a well-respected ecological journal told me recently, “I am… very down on analyses that use citation or bibliographic databases as sources of data; I'm actually quite concerned that the statistical rigor most people learn in the context of analysing biological data is thrown out completely in an attempt to show usage of a particular term has been increasing in the literature!” I think he has a point, and in fact I feel the same about much that I read on bibliometrics more generally: there’s some really insightful, thoughtful and well-reasoned text, but as soon as people attempt to bring some data to the party all usual standards of analytical excellence go out the window. I see absolutely no reason to buck that trend here.

So…

The old chestnut of Journal Impact Factors has been doing the rounds again, thanks mainly to a nice post from Stephen Curry which has elicited a load of responses in the comments and on Twitter. To simplify massively: everyone agrees that IFs are a terrible way to assess individual papers (and by inference, researchers), but there’s less agreement on whether they tell you anything useful when comparing journals within a field. Go read Stephen’s post if you want the full debate.

But what sparked this post was a response from Peter Coles (@telescoper), called The Impact X-Factor, which proposed an idea I’d toyed with a while back: judging papers against the IF of the journal in which they’re published. Are your papers holding up or weighing down your favourite journal? Let’s be clear from the outset: I don’t think this tells us anything especially interesting, but that needn’t put us off. So I have bitten the bullet, and present to you here my own personal impact factor. (The fact I come out of it OK in no way influenced my decision to go public.)

The IF of a journal, remember, is simply the mean number of citations to papers published in that journal over a two-year period (various fudgings and complications make it rather more opaque than that, but that’s it in essence). So for each of my papers (fortunately there aren’t too many) I’ve simply obtained (from my Google Scholar page, as it’s more open than ISI) the number of citations they accrued in the two years after publication. I’ve then compared this to the relevant journal IF for that period, or as close to it as I could get.
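The bookkeeping here is simple enough to sketch in a few lines of Python. To be clear, the numbers below are invented purely for illustration – the real ones were transcribed by hand from Google Scholar and the relevant journal reports:

```python
# Bookkeeping behind the 'personal IF' comparison.
# All numbers are invented for illustration; the real ones came from
# Google Scholar (citations) and journal reports (IFs).

papers = [
    # (label, citations in the two years post-publication, journal IF for that period)
    ("paper A", 12, 4.2),
    ("paper B", 3, 5.1),
    ("paper C", 9, 3.0),
]

# A paper 'holds up' its journal if its own two-year citation count
# exceeds the journal's mean (i.e. its IF).
for name, cites, journal_if in papers:
    diff = cites - journal_if
    verdict = "holding up" if diff > 0 else "weighing down"
    print(f"{name}: {cites} cites vs IF {journal_if} -> {diff:+.1f} ({verdict})")

# The 'personal IF' is then just the mean two-year citation count.
personal_if = sum(cites for _, cites, _ in papers) / len(papers)
print(f"Personal IF: {personal_if:.1f}")
```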

Here are the results:

[Figure: citations to each of my papers in the two years post-publication, plotted against the IF of the journal that published them, with a 1:1 line and an inset histogram of differences.]

OK, a bit of explanation. This simply plots the number of citations my papers got in the two years post-publication against the relevant IF of the journal in which they were published. (The red points are papers published in the last year or so, for which I’ve down-weighted the IF to take account of this; I’ve excluded a couple of very recently published papers.) The dashed line is the 1:1 line, so if my papers exactly matched the journal mean they would all fall on it. Anything above the line is good for me, anything below it bad – the histogram in the bottom right shows the distribution of my papers’ differences from this line.
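If you fancy drawing something similar for your own papers, a rough matplotlib sketch is below – illustrative numbers again, and I’ve skipped the down-weighting of IF for recent papers:

```python
import matplotlib.pyplot as plt

# Illustrative data: (two-year post-publication citations, journal IF).
papers = [(12, 4.2), (3, 5.1), (9, 3.0), (25, 6.8), (1, 2.1)]
cites = [c for c, _ in papers]
ifs = [j for _, j in papers]

fig, ax = plt.subplots()
ax.scatter(ifs, cites)

# The 1:1 line: points above it beat the journal mean, points below drag it down.
lim = max(max(cites), max(ifs)) * 1.1
ax.plot([0, lim], [0, lim], linestyle="--", color="grey")
ax.set_xlabel("Journal Impact Factor")
ax.set_ylabel("Citations in two years post-publication")

# Inset histogram (bottom right) of differences from the 1:1 line.
inset = fig.add_axes([0.6, 0.18, 0.25, 0.25])
inset.hist([c - j for c, j in papers])
inset.set_title("cites - IF", fontsize=8)

plt.show()
```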

I’ve fitted a simple Poisson model to the points, with and without the outlier in the top right – neither does an especially good job of explaining citations to my work, so we might as well take a mean, giving me my own personal IF of around 6.
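The post doesn’t pin down the model details, so purely by way of illustration, here is one way such a fit might look in Python with statsmodels – a Poisson GLM of citations on journal IF, fitted with and without the largest point:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data again: two-year citations and journal IFs.
cites = np.array([12, 3, 9, 25, 1])
ifs = np.array([4.2, 5.1, 3.0, 6.8, 2.1])

# Poisson GLM of citations on journal IF (one way the 'simple Poisson
# model' might be fitted; the original analysis details aren't given).
X = sm.add_constant(ifs)
model = sm.GLM(cites, X, family=sm.families.Poisson()).fit()
print(model.summary())

# Refit without the point with the largest citation count (the 'outlier').
keep = cites < cites.max()
model2 = sm.GLM(cites[keep], sm.add_constant(ifs[keep]),
                family=sm.families.Poisson()).fit()
print(model2.summary())

# If neither fit convinces, fall back on the mean: the personal IF.
print(f"Personal IF (mean two-year citations): {cites.mean():.1f}")
```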

As my editor friend suggested, there’s a whole lot wrong with this analysis. For instance, I haven’t taken account of year of publication, or any other potential contributing factors (coauthors, publicity, etc. etc.). Another obvious caveat is the lack of papers in journals with IF > 10 (I can assure you that this has not been a deliberate strategy). But back in the peloton of points which represent the ecology journals in which I’ve published most regularly, I’m reasonably confident in stating that citations to my work are unrelated to journal IF. Gratifyingly too, the papers that I rate as my best typically fall above the 1:1 line.

So there we have it. My own personal impact factor.

Defining a Field

What does it take to have a real impact on the development of your field? Those charged with assessing UK research have taken the view that a small number of exceptional papers are a better indicator of quality than a mass of ‘lesser’ papers. Now we can quibble (indeed, I have done) about the way that ‘exceptional’ papers are identified (in particular by risk-averse departments and institutions). Furthermore, I’ve argued that setting out with the intention of writing a ‘high-impact paper’ is often antithetical to doing good science. However, that’s beside the point. Regardless of the nuts and bolts of measurement, the idea that one should be judged on quality, not quantity, seems to be reasonably widely accepted. But if we’re taking a retrospective view, is it always the case that you can trace the development of a field back to one or two highly influential papers? I’ve been pondering this since the first meeting last month of the British Ecological Society’s Macroecology Special Interest Group. (Macroecology is ecology at large spatial scales, by the way, and is what I do. There’s a brief but useful Wikipedia page here.)

As is the nature of such inaugural meetings, first our committee chair Nick Isaac, then our opening keynote speaker, Ian Owens from the Natural History Museum, provided a potted history of the discipline. In such talks, it is common practice to pick a significant publication and trace its subsequent influence. But in macroecology, that’s tricky...

OK, you could pick the 1989 Science paper by James Brown and Brian Maurer which coined the term ‘macroecology’. This paper has accrued a satisfying-but-not-stellar 329 cites, but my suspicion is that it hasn’t actually been that widely read, and that many of the citations run something like “the term ‘macroecology’ was first coined by Brown & Maurer (1989)...”

Or we could focus on the pioneers of UK macroecology, Kevin Gaston and Tim Blackburn. They have forged a formidable partnership, coauthoring 87 papers over a 20-year period, with a phenomenal burst of productivity in the mid-to-late 1990s which saw them publish around 10 papers a year as coauthors – papers which provided the foundation for much subsequent macroecological work. (Both have been prolific independently of each other too. Sickening, isn’t it?) The point, though, is that it is difficult to pick any single one of these 50 or so papers as suitable for the ‘what happened since the publication of x’ rhetorical device. Although their work in aggregate has been well cited – 8 papers from the period 1993–2000 have picked up >100 cites – the maximum number of cites for a single work is <300. Which is good, no doubt, but not spectacular.

So what Nick and Ian both did was to pick as their milestones three books: two by Kevin and Tim (Pattern & Process in Macroecology from 2000, which summarised much of their previous five years of work, and the edited volume Macroecology: Concepts & Consequences) and one by Brown (Macroecology, a single-word title which has amusingly been cited in 20 different ways according to ISI WoK, including the antonymous Microecology!). Rich Grenyer tweeted at the time that this had interesting implications for the high-impact-paper-obsessed REF, but I think it also tells us something about the development of scientific fields more generally.

Of course there are occasions when one or two landmark publications define decades of subsequent research (think Einstein, or Crick & Watson, or even the occasional book like Hubbell’s Unified Neutral Theory of Biodiversity). But often it is the steady accumulation, the gradual assembly of a body of work, which counts. This recognises that it is not always possible – or, if possible, not desirable – to force everything of value that you have to say into the strict limits of the higher-profile journals (and you may not wish to see what you consider to be important analyses buried, probably unread, in supplementary material). This view of science essentially treats the literature as a kind of open notebook – a record of thought processes and incremental progress, rather than a single statement of ultimate truth. And in the case of macroecology, this broad foundation has served us very well.

Perhaps I can make a (non-Olympic) sporting analogy. Cricket exists in several formats, the extremes being the smash-bang-wallop of Twenty20 (matches last about three hours) and the rather more sedate five-day Test match. A Test batsman will steadily accumulate, and won’t try to hit every ball out of the ground – although if a ball is tempting enough, of course, he won’t turn down the opportunity for the big hit. This mix of accumulation and opportunism seems to me a much better strategy for ensuring that a field is built on solid foundations than the headline-grabbing, REF-driven, try-to-hit-everything-straight-into-Nature T20 style.

As any cricket fan will tell you, Test matches are more substantial and ultimately far more satisfying than any limited-overs jamboree. And the occasional six is made all the sweeter for its scarcity.