
Trait databases: the desirable and the possible

Another major traits database has recently come online. This time, all you need to know about the life histories of 21,000+ amniotes (reptiles, birds, mammals), courtesy Nathan Myhrvold, Morgan Ernest and colleagues. I’ve been working with ecological traits for a good while now, and this kind of thing excites me. It also demonstrates the kind of self-interested altruism that typifies the Open Science mentality. As Morgan puts it in her blog post on the paper:

The project started because my collaborator, Nathan Myhrvold, and I both had projects we were interested in that involved comparing life history traits of reptiles, mammals, and birds, and only mammals had easily accessible life history databases with broad taxonomic coverage. So, we decided to work together to fix this. To save others the hassle of redoing what we were doing, we decided to make the dataset available to the scientific community.

In other words, you start by fixing a problem that you yourself have, and then make your solution available to save others the bother. Practical and admirable. The same thing is happening elsewhere, with other kinds of ecological data - take the ‘data rescuing’ example of the PREDICTS project:

https://twitter.com/KatheMathBio/status/676448069890744321

(Compare and contrast with this fascinating, frustrating new book by John William Prothero, The Design of Mammals: A Scaling Approach, another monumental data compilation which includes a multitude of intriguing scaling relationships, calculated from 16,000 records for around 100 response variables, almost none of which are replicable from the subsets of data provided in the Resources section online.)

But Morgan’s blog post becomes really interesting as she muses on what the end game might be for traits databases. She proposes a centralised trait database, with a focus on individual records, that is easy to contribute to and where data are easily accessible. We had a short exchange on Twitter after I read this, but I’ve continued mulling it over and my thoughts have expanded past 140 characters. Hence, this.

Basically, I have been trying to imagine what this kind of meta-dataset might look like. And my difficulty in doing this in part boils down to how we define a ‘trait’.

The simplest definition is pretty broad, with a trait just any measurable property of an organism (noting that some ‘traits’ apply only to populations - e.g. abundance - or even entire species - e.g. range size). And my own work, like Morgan’s, has typically focused on life history and ecological traits - things like size, growth, reproduction, and feeding. In some respects these are some of the simplest traits to describe, but they can still be tough to measure, and (especially) to classify and record.

Part of this difficulty arises because much work on traits involves imposing categories on nature, and nature abhors a category. Then again, individuals of the same species can do quite different things, or the same individual might display different traits at different times or in different places. Some people have tried to get around this by using a ‘fuzzy coding’ approach - for instance, rather than having to classify me as ‘carnivore’ or ‘herbivore’, you could say that my diet is split, say, 15% carnivore to 85% herbivore. In many ways this seems a sensible solution, but it is of course rather subjective, still requires some rather arbitrary categorisation, and, in the context of this post, is very difficult to incorporate into a more generic database.
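To make the fuzzy coding idea concrete, here is a minimal sketch in Python of how one such record might be stored and then flattened for a more generic table. The class name, field names and the one-row-per-category layout are all assumptions made for illustration, not the format of any existing trait database; the 15/85 split is the one from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class FuzzyTrait:
    """One fuzzy-coded trait: each category gets an affinity weight."""
    subject: str       # hypothetical field: the individual or taxon described
    trait: str         # hypothetical field: the trait being coded
    affinities: dict   # category name -> non-negative weight

    def normalised(self):
        """Rescale the weights so that they sum to 1."""
        total = sum(self.affinities.values())
        if total <= 0:
            raise ValueError("affinities must sum to a positive number")
        return {cat: w / total for cat, w in self.affinities.items()}


def to_long_rows(ft):
    """Flatten to one row per (subject, trait, category, proportion)."""
    return [(ft.subject, ft.trait, cat, w)
            for cat, w in sorted(ft.normalised().items())]


# The diet split used as an example in the text.
diet = FuzzyTrait("the author", "diet", {"carnivore": 15, "herbivore": 85})
rows = to_long_rows(diet)
for row in rows:
    print(row)
```

The long layout is one way to squeeze fuzzy codes into a generic table, but it does nothing about the underlying problems: the categories are still arbitrary, and the weights are still a judgement call.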

Other traits may seem simpler. Body size, for example, the so-called ‘master trait’, is surely easy to measure? You just weigh your organism, right? Or perhaps you measure its length. And is that total length, or wingspan, or leg length, or standard length? Oh, you want dry weight? Or mgC? Equivalent Spherical Volume, you say? And so on and on…

Related to body size are other morphological traits. For instance, my colleague Gavin Thomas and his group are busy 3D scanning beaks of all species of bird (you can help, if you like!). Such sophisticated morphological measurements quickly generate individual-level ‘trait’ databases of many dimensions; how might these be incorporated into a more general database? Record each dimension? Or use some agreed (but somewhat arbitrary) composite measure of ‘shape’?

One more example of a different class of traits. On the Marine Ecosystems Research Programme, I’m working quite a bit with ecosystem modellers, and their lists of desired traits are terrifying: Michaelis-Menten half sat. uptake const.; Excreted ingested fraction; Respired fraction - all things that seem a long way from the sorts of life history databases I’m familiar with, or from things that can be easily observed in the field. Many of them will be body size and temperature dependent (at least). And so what might be most appropriate to record are the parameters from some fitted scaling relationship; but this means losing a lot of raw data, which surely we would like to retain?

And so on.

So whilst of course I applaud the efforts of people like Morgan to make large trait databases more useful and accessible, and agree completely that we should both think big and think individual, a complementary angle of attack might be to make linking existing databases easier. Of course, they need to be available, well documented, and appropriately licensed as a first step, and straightforward programmatic access should be designed in. But we can also make more efforts to link to taxonomic standards, to ensure we include accurate geographical and contextual information with individual records. Always ensuring our data are nice and tidy so that others can easily do more interesting things with them.
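As a rough illustration of what linking could mean in practice, the sketch below joins two tidy tables (one row per name, trait and value) through a shared taxon identifier. Everything in it is hypothetical: the species names, the identifiers `T1` and `T2`, the trait values and the function `link_traits` are placeholders, and a real workflow would take its name-to-identifier lookup from a taxonomic standard rather than a hand-written dictionary.

```python
def link_traits(records_a, records_b, name_to_id):
    """Group tidy records from two sources under a shared taxon identifier.

    Names are matched after collapsing whitespace and lower-casing; anything
    that cannot be resolved is returned separately rather than dropped.
    """
    linked = {}
    unmatched = []
    for source, records in (("a", records_a), ("b", records_b)):
        for rec in records:
            key = " ".join(rec["name"].split()).lower()
            taxon_id = name_to_id.get(key)
            if taxon_id is None:
                unmatched.append((source, rec["name"]))
                continue
            linked.setdefault(taxon_id, []).append({**rec, "source": source})
    return linked, unmatched


# Placeholder lookup: two spellings resolve to the same (made-up) identifier.
name_to_id = {
    "examplus primus": "T1",
    "examplus primarius": "T1",   # treated here as a synonym, for illustration
    "examplus secundus": "T2",
}

records_a = [
    {"name": "Examplus primus", "trait": "max_length", "value": 210.0, "unit": "mm"},
]
records_b = [
    {"name": "examplus  primarius", "trait": "diet", "value": "carnivore", "unit": None},
    {"name": "Unknownus taxon", "trait": "diet", "value": "herbivore", "unit": None},
]

linked, unmatched = link_traits(records_a, records_b, name_to_id)
print(linked["T1"])
print(unmatched)
```

Keeping the unmatched names visible matters as much as the join itself, since those are the records where taxonomy, spelling or documentation needs attention.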

An Appreciation of John Steele

When I received the sad news, yesterday, that John Steele had died of the cancer that had afflicted him this last year, my instinct was to share the passing of a scientific hero as widely as possible. I duly tweeted, but given the general lack of response I wondered if perhaps his legacy is not as widely appreciated as I believe it should be. Hence this personal appreciation. I never met John, although I had been corresponding with him over the last couple of months, and was due to speak to him the morning after he was hospitalised for what turned out to be the final time. As an aside - the fact that I was approached, in such a generous manner (His first email to me ended: “This email is a rather long-winded way of saying - welcome; and I look forward to useful and illuminating discussions”) to collaborate with someone whose work, as you’ll see, has been an inspiration to me, is one of those great egalitarian things that happens from time to time in science, and I was thrilled to have this opportunity. But, as a result of this limited personal interaction - just a handful of emails - my appreciation is limited to John’s work, both his publications and this new, unpublished material to which I was contributing, which was buzzing with intriguing and innovative ideas.

Actually, I can’t hope to do justice to John’s wider scholarship here, and cover only really that small part of his work which addressed the issue of differences in temporal and spatial dynamics between marine and terrestrial ecosystems, an issue that has been central to my own research. As I set out in an earlier post, my own journey to the position now where I (reasonably confidently) call myself a marine ecologist has been rambling and convoluted. Along the way, certain of John’s papers stood out like beacons, reassuring me that there was indeed a path to follow, no matter how overgrown.

Mainly, these beacons consisted of a clutch of papers published in the early 1990s, in particular a 1991 paper in the Journal of Theoretical Biology (Can ecological theory cross the land-sea boundary?) and a 1994 Phil Trans paper with Eric Henderson (Coupling between physical and biological scales). Similar ideas were further developed in papers in Ecological Research (Marine Ecosystem Dynamics: Comparison of Scales) and Bioscience (Marine Functional Diversity). All of these, in turn, were building on John’s 1985 review in Nature, A comparison of terrestrial and marine ecological systems.

Key to all of these papers is the idea of scale, both spatial and temporal, and especially how the scale of variability is different in marine than in terrestrial systems. Because the seas act as an enormous thermal buffer, variability is fundamentally different there than on land. I’ve been playing with some data to try to show this (see below), but the concept is simple: if you stand in one place for 24 hours on land, depending where on Earth you are, you might easily experience a temperature range of 20˚C or more. In most places, the temperature of the sea - even at its surface - won’t vary nearly this much in a year. Spatial variation is similar - you will typically find much more variability (along all kinds of axes, not just temperature) in a square kilometre of terrestrial habitat than in a square kilometre of sea. This clearly has impacts on the organisms living there: if you’re a lizard and you’re too hot, you can maybe move a metre or two from full sun into the shade. A marine fish might have to move hundreds of kilometres (or tens of metres deeper) to achieve a similar drop in temperature. So these patterns of environmental variation are clearly important in order to understand species’ responses to climate change, and can explain some of the subtle differences already seen between marine and terrestrial species (see for example recent papers by Sunday et al., Burrows et al.).

 

One of John’s major insights was that physical and biological processes were typically more closely coupled in space and time in marine than in terrestrial systems. This stemmed from his strong background in physical oceanography. Indeed, in our recent correspondence he confessed “I have no systematic training in biology”; rather he epitomised the interdisciplinary nature of fisheries science, in which connections between the physical environment and biological resources have always been recognised in a way that terrestrial ecologists have only relatively recently accepted. Despite this lack of formal training, his ecological insight was astute, as is apparent throughout his 1974 book The Structure of Marine Ecosystems, from which, incidentally, I took the opening quote for the Royal Society Research Fellowship application which currently supports me: “The first impression one forms of any community is usually of the diversity of species present, and of the differences in numbers, with some species abundant and others scarce”. That book has some interesting parallels with Alec MacCall’s later Dynamic Geography of Marine Fish Populations, in that its ecological content (MacCall’s book is, in my view, an excellent primer on macroecology, though the word is not used) was destined to be overlooked by most ecologists because the word ‘marine’ appears in the title.

This has inevitably only scratched the surface of John’s work. What drove him, I think, to return time and again to the marine-terrestrial comparative idea is summed up best in the abstract of the Journal of Theoretical Biology piece: “It is proposed that theories developed in one sector can be tested most critically in the other, with potential for greater generality.” This idea has guided my own research, and was constantly in mind while writing several papers, for example this one which begins with a Steele quote (“I argue that we should attempt to address the question of [ecological] generalizations capable of crossing the land-to-sea boundary”) and, in particular, this opinion piece I published last year. A piece which, I was delighted to discover, John had seen: “I had read your recent TREE paper with interest, and of course, appreciated your references to my cry in the wilderness two decades ago.” I hope that in continuing this search for generality, and performing critical tests of theory, I might send up one or two small flares of my own which - even if they don’t light the path all that brightly - might at least lead others to John’s more illuminating beacons.

Natural history and desk-based ecology

The recent Intecol meeting in London, celebrating the British Ecological Society’s centenary, was perhaps the most Twitter-active (Twinteractive?) conference I’ve been to, with Twitter-only questions at plenaries and plenty of discussion across multiple parallel sessions. One such discussion I dipped into (#ecologyNH) concerned the extent to which a 21st Century ecologist needs to know natural history, a question I’ve been pondering for a while, and one which surfaced again only yesterday in an exchange triggered by Matt Hill (@InsectEcology) and also drawing in Mark Bertness (@mbertness), Ethan White (@ethanwhite) and others. Now the answer to this of course depends on your particular specialism. If you’re a field ecologist then reliably being able to identify your (perhaps many) study species is clearly critical, and many ecological careers outside of academia require very good identification skills in order to assess habitats, prioritise conservation areas, and so on. But ecology’s a broad field, too broad for any one of us to master all of its subdisciplines, and there are skills other than natural history that are equally useful. In particular, an increasing number of us do a kind of ecology which involves sitting in front of a computer screen and playing with other people’s data. In my case, this is macroecology, trying to understand what determines the distribution and abundance of large groups of species over regional to global scales. Is it really necessary for me to be able to put a face to every species name in my dataset in order to extract the kind of general patterns that interest me?

My view is that the answer to this depends on how we define ‘natural history’. As I’ve posted before, I don’t consider myself much of a natural historian, under the rather narrow definition of being able to key out a large number of species; and I don’t believe this holds me back as an ecologist. But on the other hand, I do think that a ‘feel’ for natural history is important. By this, I mean that understanding in general terms the kinds of organisms you work on, and the sorts of ways in which they interact with each other and with their environment, is likely to enhance your understanding of any dataset, and thus will point you in the direction of interesting questions (and away from silly ones). In the same way, I don’t see why a fisheries minister, for example, should be expected to be able to identify every fish on a fishmonger’s slab in order to make sensible policy decisions; but having some general understanding of fish and fisheries above and beyond numbers on a balance sheet seems important to me.

That’s my general thesis, but if you want some specifics, I believe there are some real practical advantages to be gained from a macroecologist taking the time to learn a bit about the natural history of their system, too. First, we all know how easy it is to introduce errors into a large dataset; being able to relate a species name to a mental image of the kind of organism it represents provides an efficient way to spot obvious errors. This is really just an extension of basic quality control of your data - simple plots to identify outliers and so on. But errors need not be outliers - for instance, if you’re looking at the distribution of body size across a very wide range of species, an obvious mistake, like a 50g cetacean or a 50kg sprat, may not be immediately apparent. One such error was only picked up at the proof stage in this paper, when my coauthor Simon Jennings noticed that one of the figures labelled a 440mm scaldfish, which he told me was ‘unrealistically big’, in fact over twice the likely maximum length. He was quite right, as a better knowledge of Irish Sea fish would have told me at the outset; fortunately this time we caught the error in time, and it didn’t affect our conclusions at all. We corrected the figure and did the quick check on all the other species that we should have done at the outset.

Of course, there are more formal ways to check data against known limits, but the point is that a bit of expert knowledge - a basic understanding of the range of feasible values for a feature of interest - goes a long way. Having worked on many different taxa, not all of which I have personal experience of, my approach to this is to work with some kind of (preferably colourful) field guide near at hand that I can dip into to remind myself that points on a graph = organisms in an environment.
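For what it’s worth, the ‘more formal’ version of this check can be very simple. The sketch below, in Python, flags records that fall outside a feasible range set per species, which catches errors that would never stand out in a plot of the whole dataset. The function name, the record layout and especially the 200mm upper limit are assumptions for illustration: the limit is a placeholder chosen only to be consistent with 440mm being ‘over twice the likely maximum length’, not a value taken from any reference.

```python
def flag_implausible(records, limits_mm):
    """Return the records whose length lies outside the feasible range
    given for their species. Species without limits are skipped, not flagged."""
    flagged = []
    for rec in records:
        bounds = limits_mm.get(rec["species"])
        if bounds is None:
            continue  # no expert limit available: leave for manual review
        low, high = bounds
        if not low <= rec["length_mm"] <= high:
            flagged.append(rec)
    return flagged


# Placeholder limits (min, max) in mm; the scaldfish maximum is an assumption.
limits_mm = {"scaldfish": (0.0, 200.0)}

records = [
    {"species": "scaldfish", "length_mm": 440.0},   # the error from the text
    {"species": "scaldfish", "length_mm": 150.0},   # made-up plausible value
    {"species": "unlisted species", "length_mm": 9999.0},
]

suspect = flag_implausible(records, limits_mm)
print(suspect)
```

The hard part is of course not the code but the table of limits, which is exactly where the field guide (or a coauthor who knows their fish) comes in.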

Some outliers, of course, remain stubbornly resistant to quality control, and you eventually have to accept that they are real. Here again, a bit of natural history can help you to interpret them and to suggest additional factors that may be important. For instance, I have worked quite a bit on the relationship between the local abundance and regional distribution of species. Such ‘abundance-occupancy’ relationships (AORs) are typically positive, such that locally common species are also regionally widespread. I put it like this: if you drove through Britain, you’d tend to see the same common birds everywhere on your journey, but the rare ones would vary much more from place to place. However, although AORs are well-established as a macroecological generality, there are often outlying species, for instance species with very high local densities but small distributions. Identifying such points (‘Oh, they’re gannets’) and knowing something about them (‘of course, they nest colonially’) can help to explain these anomalies.

Such simple observations - ‘gannets don’t fit the general AOR’ - can then lead to more general predictions - ‘AORs will be different in species that breed colonially’ - that can influence future research directions. In my experience, observations of natural history will frequently suggest new explanations for known patterns, or will lead you to seek out study systems meeting particular criteria in order to test a hunch. A fascination with natural history may lead you to learn about a new ecosystem - deep sea hydrothermal vents, say - which you then start to think may be perfect for testing theories of island biogeography or latitudinal diversity gradients.

You might also start to question models that gloss over natural historical details. On a winter walk in the Peak District I made the very obvious observation that the north-facing side of the steep valley was deeply frosted while the other, only a hundred metres or so distant but south-facing, was really quite pleasantly warm. This got me thinking about how the availability of such microclimates would not be captured in most of the (kilometre scale) GIS climate layers people use in species distribution modelling, yet could be crucial in determining where a species occurs. This is unlikely to have been an original thought, and is not one I’ve followed up, but it emphasises how real world observation can colour your interpretation of computational results.

More generally, real world observation - ‘going one-on-one with a limpet’, as Bob Paine puts it in a nice interview on BioDiverse Perspectives - gives you a sense of the set of plausible explanations for the phenomena that emerge from datasets at scales too large for one person to experience. This in turn leads to a healthy scepticism of hypotheses that fall outside that set. To paraphrase an earlier post of mine, simply plucking patterns from data with no feel for context and contingency is unlikely to lead to the understanding that we crave.

That said, however, there are benefits to be had from putting aside one’s personal experience and being guided, from time to time, by the data. I guess I’m influenced here by working on marine systems, where the human perspective is not a good guide to how organisms perceive their environment. We simply can’t sense the fine structure of many marine habitats, or how dispersal can be limited in what looks like a barrier-less environment. Bob Paine admits as much: directly after the limpet quote, he says “How do you do that with a great white shark or blue whale? There’s this barrier to what I would call natural history.” He goes on to talk about the problems with relying on personal experience when working on systems such as terrestrial forests with very slow dynamics. These long-term, large-scale, hard-to-access systems are, I would argue, exactly where the methods of macroecology and other computational branches of our science come to the fore. It is also, dare I say it, where coordinated observational programmes like NEON can make a real contribution.

But let me finish with perhaps the most important justification for spicing up computer-based ecology with a bit of natural history. We’re supposed to be enjoying ourselves, and for most ecologists surely that means getting out into the field, in whatever capacity - for work or for fun - and wherever it may be, from our back gardens to the back of beyond. My personal view is that doing this whenever you can will make you a better ecologist. But even if I’m wrong, it ought to make you a happier ecologist, and that’s important too.