Yesterday Nature reported on the launch of the Map of Life project, a new initiative to collate biodiversity records, which allows users to map these, to extract species lists for any area of the planet, and (ultimately) to upload their own data. Limited initially to terrestrial vertebrates and North American freshwater fish, the demo website still looks like a lot of fun. But it also reminded me somewhat of a UK-based project which has been running for a number of years, the National Biodiversity Network (and specifically the data service at the NBN Gateway). This gives me a good excuse to comment on the NBN, which I’ve been meaning to do for a while. Specifically, why hasn’t the NBN Gateway been used more by the research community? Let me first declare some interests. This question was raised around 18 months ago by the British Ecological Society, which convened a scientific working group chaired by Tim Blackburn, of which I was part. And the NBN Trust (@NBNTrust, if you like to Tweet) is actively trying to promote the Gateway’s potential as a research resource, and I’m writing this post partly in response to a request from Mandy Henshall, NBN Trust Information and Communications officer, to spread the word and to find out what it would take to get the data used.
The NBN grew out of the need to standardise and coordinate the many thousands of local, regional or national surveys to provide a national picture of the UK’s biodiversity. The NBN Gateway is simply the portal through which these data can be accessed. And it’s become an extremely impressive dataset: currently >75M records from >700 individual datasets. The Gateway itself is really nicely designed for the general user. You can search on an interactive map, or by site name, or by taxon, and quickly get a list of everything that’s been recorded – fantastic if you’re planning a trip to an RSPB reserve, say, and want to know what birds you’re likely to see; equally good if you’re leading a field trip and want to prime your students about what might be there. (Worth noting too that the NBN encompasses all taxa and habitats, including some limited coverage of marine systems.) As a citizen science / public engagement project, the NBN is absolutely superb, and I urge you to go and have a play.
But does it work as a tool for academic biodiversity research? Some things it does well, for instance the (nontrivial) task of standardising taxonomy across multiple datasets. But we identified several potential shortcomings, most obviously the fact that not all data are publicly available – it can be incredibly frustrating to see a great dataset identified by your search, but not to be able to access it. Of course, the problem of data access is not restricted to the NBN, and they clearly had to make a choice – include everything, accepting that access to some of it will be restricted, or include only the subset of data that can be made completely open. Other initiatives, for instance the Ocean Biogeographic Information System (OBIS), went this second route, the idea being that if sufficient people can be convinced to make their data available, peer pressure will mount on those who won’t. But this discussion of open data is best left for another day.
Other barriers we identified concerned the different ways that scientists like to access and download data, compared to the public. For instance, we often want to be able to access data programmatically, or at least to have an audit trail of specific queries, rather than working through nice friendly GUIs. And often we want to download data as a simple text file for further analysis, with no whistles and bells.
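To make the "audit trail plus plain text file" idea concrete, here is a minimal Python sketch. It does not use any real NBN Gateway interface: the records are hypothetical and held in memory, and the names run_query, to_tsv and audit_log are invented for illustration. The point is only the pattern: every query is logged with its parameters, and the result comes out as plain tab-delimited text.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical records for illustration only; a real workflow would
# obtain these from a data service rather than hard-coding them.
RECORDS = [
    {"taxon": "limpet", "site": "Filey Brigg", "date": "2011-06-04"},
    {"taxon": "limpet", "site": "Filey Brigg", "date": "2011-06-11"},
    {"taxon": "periwinkle", "site": "Another Shore", "date": "2011-06-05"},
]


def run_query(records, audit_log, **filters):
    """Return records matching every filter, and log the query itself."""
    hits = [r for r in records
            if all(r.get(key) == value for key, value in filters.items())]
    # The audit entry stores what was asked, when, and how much came back,
    # so the same query can be rerun and checked later.
    audit_log.append({
        "run_at": datetime.now(timezone.utc).isoformat(),
        "filters": filters,
        "n_records": len(hits),
    })
    return hits


def to_tsv(records, fields=("taxon", "site", "date")):
    """Serialise records as plain tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


audit_log = []
limpets = run_query(RECORDS, audit_log, taxon="limpet", site="Filey Brigg")
print(to_tsv(limpets))
print(json.dumps(audit_log, indent=2))
```

Saving the audit log alongside the text file means an analysis can state exactly which query produced which data, something that is hard to reconstruct after clicking through a map interface.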
Finally, there is the matter of the data itself (and pedants: yes, data is a singular noun). The NBN contains some fantastic systematic scientific survey data, but also a lot of more haphazard observational data, which may be reliable in terms of recording the presence of a given species at a particular site, but which tells us little about absences. Suppose Mr Patel has a fascination with limpets, and has been counting them on Filey Brigg every week for years. His data would give us a fantastic picture of the limpet population, but the absence of records for barnacles or periwinkles doesn’t mean that they’re not there – crucial if you’re interested in the whole community.
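The distinction between "not recorded" and "not there" can be sketched in a few lines of Python. Everything here is hypothetical: the sites, the records, the searched_for lookup (which species each recorder was actually looking for) and the function name detection_table. It is one possible way of keeping the two cases apart, not a description of how the NBN stores its data.

```python
def detection_table(observations, species, searched_for):
    """Classify every site/species pair.

    1    = recorded as present
    0    = searched for but not found (a real absence record)
    None = never searched for, so nothing can be said either way
    """
    table = {}
    for site, targets in searched_for.items():
        for sp in species:
            if (site, sp) in observations:
                table[(site, sp)] = 1
            elif sp in targets:
                table[(site, sp)] = 0
            else:
                table[(site, sp)] = None
    return table


species = ["limpet", "barnacle", "periwinkle"]

# Hypothetical observations: a limpet-only recorder at Filey Brigg, and
# an invented systematic survey ("Transect A") covering all three species.
observations = {
    ("Filey Brigg", "limpet"),
    ("Transect A", "limpet"),
    ("Transect A", "barnacle"),
}
searched_for = {
    "Filey Brigg": {"limpet"},
    "Transect A": {"limpet", "barnacle", "periwinkle"},
}

table = detection_table(observations, species, searched_for)
for (site, sp), status in sorted(table.items()):
    print(site, sp, status)
```

Only the systematic survey yields a genuine zero for periwinkles; at Filey Brigg the barnacle and periwinkle entries stay unknown, which is exactly the information a community-level analysis needs to keep hold of.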
Such limitations suggest that the researcher proceed with caution through the NBN Gateway; but the advantages of such a huge dataset mean that simply to ignore it may be to miss out on a terrific resource. There are already various examples of the NBN being used by students for research projects. The question is, what would it take for wider uptake by the research community?