barcoding, citizen science, species interactions) that can be linked to species occurrence records will increase. Such efforts unlock previously inaccessible data and expand their availability to researchers around the world. In general, taxonomy papers check only issues related to nomenclature and identification. The overall goal of this paper is to determine how researchers use open-access data in published work, focusing on the past decade, when growth of online biodiversity databases has been most rapid. Data papers and papers describing a new database have increased over time (Fig 2), likely as a result of the introduction and expansion of many data journals [69,85], online platforms for reporting species occurrence observations such as iNaturalist [86] and eBird [3,87], and efforts over the past decade to digitize specimen records [1,13]. We identify the most commonly cited databases, the most-studied taxa, the number of taxa addressed, the most common research uses, the types of data most often linked to species occurrence records, and the aspects of data quality addressed in these papers. Several direct non-target effects of GMOs on beneficial and native organisms have been reported. environmental variables, and 4.) Papers with the highest mean number of citations per year involved more applied studies: disease ecology (mean = 18, SD = 33), public health (mean = 8, SD = 7), documenting extinctions (mean = 7, SD = 7), developing a new analytical method to deal with species occurrence data (mean = 7, SD = 8), and citizen science (mean = 7, SD = 6; Table 2). Of the 6.2 million catalogued molluscan lots in U.S. and Canadian collections, 4.5 million have undergone some form of data digitization. The most common data quality issues addressed will be checks for correct taxonomic nomenclature and georeferences, which can often be assessed with readily available online resources.
Data contributors who have submitted data to aggregators are not getting credit for the significant work spent on data management, standardization, and quality control. We also characterize studies that exclude certain inappropriate records, remove records with high georeferencing uncertainty, remove outliers, and address collection effort; see S1 Table. A higher percentage of data papers, taxonomy papers, and barcoding papers involved invertebrates (Fig 4), reflecting in part the high taxonomic diversity of this group and the need for more data. We then randomly sorted papers into four separate sets of around 500 to allow subsampling of the dataset. The full dataset is published and openly accessible [58]. Stakeholders have invested considerable resources to contribute to online databases of species occurrences. The terms included: species occurrence database (8,800 results), natural history collection database (634 results), herbarium database (16,500 results), biodiversity database (3,350 results), primary biodiversity data database (483 results), museum collection database (4,480 results), digital accessible information database (10 results), and digital accessible knowledge database (52 results); note that quotation marks are used as part of the search terms where specific phrases are needed in whole. [32]), a task that is labor-intensive [33]. We characterize a variety of ways in which researchers are using species occurrence records by assessing the prevalence of individual tags corresponding to topics of interest. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species.
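One of the quality steps named above, removing records with high georeferencing uncertainty, can be illustrated with a minimal sketch. It is not the procedure used by any of the reviewed studies. The records and the 1,000 m threshold are made up for illustration, and the field name assumes the data follow the Darwin Core term coordinateUncertaintyInMeters.

```python
# Hypothetical occurrence records; values are invented for illustration.
# The key assumes the Darwin Core term coordinateUncertaintyInMeters.
records = [
    {"id": "a", "coordinateUncertaintyInMeters": 30.0},
    {"id": "b", "coordinateUncertaintyInMeters": 25000.0},
    {"id": "c", "coordinateUncertaintyInMeters": None},  # uncertainty not reported
]


def filter_by_uncertainty(records, max_uncertainty_m, keep_missing=False):
    """Keep records whose georeferencing uncertainty is at or below a threshold.

    Records with no reported uncertainty are dropped unless keep_missing is True.
    The threshold is a study-specific choice, not a standard value.
    """
    kept = []
    for rec in records:
        uncertainty = rec.get("coordinateUncertaintyInMeters")
        if uncertainty is None:
            if keep_missing:
                kept.append(rec)
            continue
        if uncertainty <= max_uncertainty_m:
            kept.append(rec)
    return kept


usable = filter_by_uncertainty(records, max_uncertainty_m=1000.0)
lenient = filter_by_uncertainty(records, max_uncertainty_m=1000.0, keep_missing=True)
```

Whether to keep records that report no uncertainty at all is a judgment call that depends on the research use, which is why the sketch exposes it as a parameter rather than fixing it.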
Biotechnological methods can lead to the identification of plant material with an important pharmaceutical use. Newer statistical and modeling approaches to deal with biases in biodiversity data have also been developed [43,48,53,54]. Many records are also missing important information or lose information over time, particularly geographic coordinates and associated uncertainty estimates [31]. The databases used may include specimen and/or observation-based records from biodiversity data aggregators, online natural history collection databases, websites devoted to capturing citizen science observation records, or newly compiled data that are made available in online databases. Furthermore, neutral theory makes less intuitive assumptions than niche theory and does not consider trophic interactions. We further categorized taxa addressed in each paper by adding one or more tag(s) for more specific taxonomic classifications (e.g. For example, micropropagation and the consequent production of identical clones discourage the perpetuation of genetic diversity through evolutionary adaptations. Once a country attains the capacity to manage its genetic resources, it will be able to produce novel products from its own biodiversity. Biodiversity theories can inform important conservation actions such as assessments of species richness and extinction or habitat loss and fragmentation. Marco Sciaini analyzed the data, authored or reviewed drafts of the paper, approved the final draft, and conducted the systematic literature search. Environmental data used in conjunction with online biodiversity records are often applied in studies of species distribution. Data link tags fall under four general categories of data types, including 1.)
data papers, n = 117), taxonomy (n = 95), conservation (n = 68), data quality (n = 68), invasive species (n = 61), and papers that described a new database (n = 60, Fig 1); see S1 Table for full descriptions of each category of research use. preference for endangered species, charismatic taxa, avoiding common species or pests [47]), and environmental bias (e.g. [75]). The most commonly studied taxa were plants (n = 232 papers, 46%), followed by invertebrates (n = 125, 25%), vertebrates (n = 124, 25%), all taxa (n = 40, 8%), fungi (n = 16, 3%), and paleontological specimens (n = 14, 3%; Table 3). https://doi.org/10.1371/journal.pone.0215794. The increasingly available collections, genetic, and phylogenetic data are highly relevant in taxonomy-related studies and data papers, which increased over time (Fig 2). The funders had no role in the study design. Adverse biological effects on non-target populations and ecological and evolutionary disruption may be either the direct result of the introduced transgene(s) or the indirect result of socioeconomic conditions related to the application of recombinant DNA technologies.
Our dataset includes 165 papers that involve compiling and publishing data online (117 data papers and 60 papers that describe a new database; some of these papers overlap). There are around 60,000 species of vertebrates, an estimated 400,000 plants, and an estimated 5 to 6 million species of insects; only about one million insect species are currently described, which highlights the need for more taxonomic work in this group [20,94]. However, it is possible that many studies simply use available data and may not appropriately evaluate data quality. Moreover, some models based on neutral theory subdivide space into local community and metacommunity, which reflects concepts commonly used in conservation science. More journals accept papers on, or even focus on publishing, high-quality data and recognize this as an important part of the scientific process [74,84,88,89]. However, only a subset of these have associated uncertainty radii. Further, large-scale phylogenetic resources, such as Open Tree of Life [115], which launched in 2015, have made it easier than ever before to link phylogenies with other species data. As one illustration of that growth, the Global Biodiversity Information Facility (GBIF) has grown from provisioning just over 200 million records in 2010 to over 1.08 billion records today, a greater than fivefold increase [10]. The third major topic for this work was to determine how often different taxonomic groups are represented in papers utilizing biodiversity databases. Data quality papers tend to focus evenly on the two most easily corrected issues (spatial and taxonomic, each 40% of data quality papers), followed by accounting for spatial bias (29% of data quality papers), effort (25%), and correcting specimen identification (18%).
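Two of the figures quoted in this section can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative. Because the text says "just over 200 million" and "over 1.08 billion", the growth ratio is approximate.

```python
# Overlap between data papers and new-database papers, by inclusion-exclusion:
# |A or B| = |A| + |B| - |A and B|
data_papers = 117
new_database_papers = 60
papers_in_either = 165
overlap = data_papers + new_database_papers - papers_in_either

# Approximate growth in GBIF records, using the rounded figures from the text.
gbif_records_2010 = 200_000_000
gbif_records_now = 1_080_000_000
growth_factor = gbif_records_now / gbif_records_2010

print(f"Papers counted in both categories: {overlap}")
print(f"Approximate growth factor: {growth_factor:.1f}")
```

On these figures, 12 papers fall in both categories, and the record count grew by a factor of about 5.4, consistent with the "greater than fivefold" wording.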
This may be one limiting factor holding back studies that utilize all data currently held within biodiversity databases and studies that address very large numbers of taxa within clades. Taxonomic nomenclature was the most commonly checked data quality issue for all other top uses, ranging from 40% of papers (conservation and data quality uses) to 56% (taxonomy). Finally, we have a detection tag to represent use of statistical methods to estimate detection probability [53]. One major problem is that many papers using biodiversity data have obtained data from an aggregator, such as GBIF, which has potentially drawn from thousands of original data sources. Other forms of bias were rarely addressed (in only 12% of papers) and include temporal bias (usually seasonal bias for certain times of year, or bias for certain years where specialists are active), taxonomic bias (e.g. However, we are also utilizing the dataset of tagged papers to address additional questions regarding author connectedness and collaboration across institutions, countries, and disciplines. While vertebrates have more data, those data are by no means complete [102]; less-studied vertebrates (i.e. Our goals here were to characterize the most commonly studied taxonomic groups and the number of taxa addressed, and to determine uses associated with the three most common organismal groupings (plants, vertebrates, and invertebrates). Data quality improvements on a large scale will require additional investment in data enhancements (e.g. Department of Environmental Biology, Universidad de Navarra, Pamplona, Spain. Copyright: 2019 Ball-Damerow et al.
The most common data uses associated with the major taxonomic groups reflect the general maturity of data products associated with the respective group. In contrast, highly diverse invertebrates are more likely to be the subject of foundational biodiversity studies, such as taxonomy, barcoding, and data papers. (2019) Research applications of primary biodiversity databases in the digital age. Given the speed of taxonomic concept changes [126], the lack of updated resources is a significant impediment to proper data integration. Up to this point, researchers have most often cited GBIF in this case (usually in-text, not in the reference section) and neglected to credit original data sources [77]. We searched for papers that use online and openly accessible primary occurrence records or add data to an online database. The biodiversity community is still in an active stage of compiling existing biodiversity data and dealing with issues of data quality. Regardless of how data are published, primary biodiversity data should also be integrated into an aggregate system with similar data, such as GBIF, OBIS, VertNet, iDigBio, or BOLDSystems [74]. However, models based on neutral theory proved to be useful in some biodiversity hotspots. Applications of biodiversity theories in conservation. Department of Ecosystem Modelling, Georg-August-Universität Göttingen. https://doi.org/10.7287/peerj.preprints.27054v1. This research was supported in part through a Bass Postdoctoral fellowship to J. Ball-Damerow at the Field Museum of Natural History (Chicago, USA), under the mentorship of P. Sierwald and R. Bieler, and by the Negaunee Foundation. competitive, consumptive, symbiotic, or pathogenic relationships), and 30 studies overall involved species interactions.
Three of the top five data types linked to online occurrence records were other types of occurrence data: literature-based occurrence data, surveys, and specimen data from natural history collections (used in n = 189, n = 145, and n = 135 papers, respectively). Continued efforts in data preservation and promoting best practices in data citation are essential for advancing scientific reproducibility, sustaining data resources, and encouraging publication of high-quality biodiversity data. We assess the average number of quality tags associated with papers overall, and the most common data quality issues addressed within each of the top uses. Both taxonomy and data papers used collection data most frequently in addition to data already available in online databases. Large species often receive more research and conservation funding, and very few conservation assessments exist for invertebrate taxa; most insect species are classified as data deficient (e.g. It is possible that people use data for these purposes, but do not necessarily publish papers on the topic or may not cite databases for this work [84]. The combined data of massive authority file efforts spanning multiple taxon groups, such as those covered by WoRMS, allow for novel approaches to data analysis [127]. Improving automated solutions to flag errors, and developing efficient mechanisms to report and correct data quality issues, are critical in advancing the relevance and broadest use of this type of biodiversity data. Conservation-focused studies most often linked occurrence records to conservation status, habitat, literature, and climatic data. Some expected trends include the following: We identify 347 primary biodiversity databases used in papers from our dataset (S2 Table), the URL for each database, and the scale (institution, regional, global, taxa) and regional or taxonomic focus (e.g.
Some examples of variables contributing to bias include socioeconomic factors [44,45], the exclusion of common species in favor of rare and flashy ones [46-48], the selection of large and attractive specimens [49], seasonal bias [50], problematic distinction between living and dead-collected specimens and associated post-mortem transportation [51,52], and discarding worn specimens, which results in phenological bias or elimination of specimens with signs of disease [8]. Some exceptions were that a relatively large number of survey respondents claimed that they use biodiversity data for ecology/evolution studies, natural resources management, life history/phenology studies, and education/outreach, but relatively few published studies in our dataset used occurrence data for these purposes. Data quality issues are often dictated by the specific use. Many taxa and regions are still highly under-sampled or completely unrepresented (e.g. Online taxonomic catalogues and tools to check records against updated catalogues are available for correcting taxonomic nomenclature [118,119]. Applications of these primary biodiversity data are varied; such data have historically helped determine harmful effects of pesticides, document the spread of infectious disease and invasive species, monitor environmental change, and much more [49]. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management.
Birds in particular have relatively good data available, in part because of online citizen science efforts and associated open data platforms, such as eBird [3]. The overall prevalence of plants in this work corroborated a recent bibliometric study, which found that 56% of biodiversity-related papers addressed plants, compared to 29% for vertebrates and 23% for invertebrates [90]. Specific environmental parameters used to predict distribution should be informed by expert knowledge of the requirements of a given species. The digitization of natural history specimens [1,2] and development of online platforms for citizen science [3] have driven a steady accumulation of species occurrence records over the past decade. The authors declare that they have no competing interests. The prevalence of most uses did not change from 2010 to 2016, with the exception of data papers and taxonomy-related studies, which both increased (Fig 2); taxonomy studies usually involved developing regional species checklists. What data quality issues tend to be addressed for the top uses? Data types fall within one of four categories, including 1.) While several previous studies have reviewed uses of natural history collections data [4,6,8,55], and one study has analyzed field-specific usage for the GBIF index [56], to our knowledge no other study has quantitatively reviewed trends in how species occurrence databases are utilized in published research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. After removing duplicates across search terms, the final database included 2,460 papers.
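The deduplication step, together with the random sorting of papers into sets described in the methods, might look like the following minimal sketch. It is an assumption-laden illustration, not the authors' actual workflow: the titles are invented, duplicates are matched on a normalized title (a hypothetical matching rule), and the seed is arbitrary.

```python
import random


def deduplicate(results_by_term):
    """Merge search results from several terms, keeping the first copy of each title.

    Matching on a lowercased, whitespace-normalized title is an assumed rule;
    a real workflow might match on DOI or another identifier instead.
    """
    seen = set()
    unique = []
    for titles in results_by_term.values():
        for title in titles:
            key = " ".join(title.lower().split())
            if key not in seen:
                seen.add(key)
                unique.append(title)
    return unique


def split_into_sets(papers, n_sets, seed=0):
    """Shuffle papers reproducibly and deal them into n_sets near-equal sets."""
    shuffled = list(papers)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_sets] for i in range(n_sets)]


# Invented example results for two of the search terms.
results = {
    "species occurrence database": ["Paper A", "Paper B"],
    "herbarium database": ["paper a ", "Paper C"],
}
unique_papers = deduplicate(results)
paper_sets = split_into_sets(unique_papers, n_sets=2)
```

Fixing the seed makes the random assignment repeatable, so the same subsample can be drawn again later.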
Taxonomy-related uses of online species occurrence databases sometimes involve describing new species, but more commonly involve compilation of regional species checklists. Even with correct identification, names in species occurrence repositories may still be incorrect and need validation [36]. Taxonomic nomenclature, species identification, spatial, and temporal data quality tags represent adjustments to the dataset used in a study that at least partially correct the associated errors (see S1 Table). Indirect impacts of biotechnology are immense and highly relevant to people in developing countries who rely directly on biodiversity for their sustenance. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. Specimen images, while not always useful for diagnosis, can often help, particularly when they meet the criteria for taxonomic-grade imaging. In addition, we determine the prevalence of these tags over time to assess positive or negative trends. Biodiversity theories are not often explicitly consulted in conservation practice, but many conservation decisions rely implicitly on theory. Sometimes the compiled data eventually make it into online data aggregators, such as GBIF, and sometimes they do not. Most papers focused on numbers of species in the single or double digits (Table 4). How often are major data quality issues addressed? Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets.
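Summary figures of this kind (the share of papers addressing at least one quality issue, the average number of quality tags per paper, and the prevalence of each tag) can be computed from a paper-by-tag table. The sketch below is hypothetical: the paper identifiers and tag assignments are invented, and the tag names only loosely follow those mentioned in the text.

```python
from collections import Counter

# Invented example: each paper maps to the set of data quality tags it received.
tagged_papers = {
    "p1": {"nomenclature", "spatial"},
    "p2": set(),  # no data quality issue addressed
    "p3": {"nomenclature"},
    "p4": {"spatial bias", "effort", "nomenclature"},
}


def quality_summary(papers):
    """Return (share with any tag, mean tags per paper, prevalence per tag)."""
    n = len(papers)
    share_with_any = sum(1 for tags in papers.values() if tags) / n
    mean_tags = sum(len(tags) for tags in papers.values()) / n
    counts = Counter(tag for tags in papers.values() for tag in tags)
    prevalence = {tag: count / n for tag, count in counts.items()}
    return share_with_any, mean_tags, prevalence


share, mean_tags, prevalence = quality_summary(tagged_papers)
```

Because a paper can carry several tags, per-tag prevalences need not sum to 100%, which is also why the percentages reported for data quality papers add up to more than 100.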