In lab meeting a couple of years ago, we
discussed whether making government-funded ecology research publicly available
would actually benefit science. The general consensus was that while the public
should have the right to access research its tax dollars have paid for, making
data open would not really benefit the public or science. My labmates argued that there is already
too much data and too few people with the knowledge necessary to make sense of it. Furthermore, they argued, grants that require data to be made publicly available on a frequent schedule would force researchers to take time away from science during peak field season in order to enter and upload the data. And I followed along.
However, attitudes are shifting. Recently there has been a
flurry of papers and blog posts on open data and what it means for ecology. For
example, in a really nice article in Frontiers in Ecology and the
Environment, Hampton and colleagues argue that if ecologists are to survive,
they must both share and use shared data.
Yet in a survey, the authors found that fewer than half of the papers produced
using NSF funds had also published some or all of the data used to write the
paper. As another incentive to "open" data, the authors argue that there are instances, such as when rapid responses to environmental crises are needed, when open data is used more extensively
than what they refer to as "dark data". Thus worries about data
overload and lack of relevance appear to be unfounded; the government needs
bang for its buck, not tree-hugging.
Joern Fischer, a professor at Leuphana
University, responded to this paper on his blog, stating that while he believes sharing
is a nice idea, in practice there is no shortage of data, and allowing
people not intimate with the sites from which the data was collected to analyze it is
dangerous. Ecology is apparently a touchy-feely science which cannot be reduced
to data points that can be used to look for larger global patterns, a point
which the Hampton paper also brings up.
But I would argue that 1. getting too intimate
with your site is dangerous (you start seeing patterns which aren't there, so
you MAKE them there when you do statistical analyses), and 2. we really just
need more complete metadata, including many pictures of research sites
throughout the seasons. For example, there have been fires in various plots at the Boston Area Climate Experiment, and they have been logged in the online shared lab notebook. However, to my knowledge, this information is only accessible to people working at the site. "Hidden" metadata like this must be made
available to anyone reading papers and using the associated data to
complete a meta-analysis of climate warming effects themselves.
Another point that Joern brings up is that field
ecologists will do the hard work collecting data and have to publish in
smaller, regional, less-prestigious journals while the modelers sit at their
desks, distant from the field, and compile all this data into articles the top
journals are begging for. I have a number of gripes with this statement. First, if you are doing ecology to get
publicity, you are in the wrong field. That applies to all desk-, lab-, and field-bound types. Second, this separation between writers and doers
is ancient - how many techs do biomedical labs have, and yet PIs write the
paper with no input from the technicians about what funky things happened along
the way? Third, having gone from an almost exclusively field-based position to
an almost exclusively computer-based one, I would do anything to be spending my
summer outside looking at nature's pixels; working at a computer is not some
lazy-ass bliss. Nothing is. Fourth, most ecological data collection can be done
by minimally trained volunteers (Earthwatch actually requires that projects it funds use
volunteer data collectors extensively); I reckon the future of ecology will be
a PI with a model to test or a question to ask going to public data,
identifying a hole, and involving the public to collect the missing data and possibly
analyze it. It seems like a grant writer's dream given the current funding
requirements.
So what are we really worried about? The idea of
more work? Being responsible for a broader array of literature? Isn't it our
job to understand the world? Ecologists don't write grants which say "I
want to understand exactly what happens in the four 6m*6m plots I will be
studying", but rather "I will design a study using four 6m*6m plots
superficially representative of the broader environment with the hope of understanding
patterns and processes in ecology which can be extended to larger spatial
scales".
But to scale up in this day and age, we have a responsibility
not just to conjecture that our results extend, but to actually test whether they do. If nobody is asking the same
question (or if it has been asked, but the data has been
analyzed inappropriately), and we only have published results to go on,
how will we do this? We can ask people for their raw data, but emailing busy
professors who have to dig up datasets not necessarily formatted for
sharing is a time-consuming process.
It's time to look beyond the up-front cost of taking the
time now to put your data in a clear format for others (and for you a few years
down the line) to access, and to think long-term. That is not to say that I
think all data should be analyzed blindly without respect to site intricacies;
we don't know what factors are important in ecological data, and how they may
differ with time and space. However, looking over larger landscapes allows us
to examine broader patterns and identify best practices for land management in
the absence of finer resolution data, and if the metadata we have does not
predict responses of interest at a broader scale, we have a reason to apply for
more funding to do field work and ask why.
For a field so obsessed with statistics, such
aversion to testing the effect of increasing sample size
seems ridiculous.
For a more positive spin on open data, Chris Lortie of York University has made available a pre-print on the role of open data in meta-analyses.