Abstract Detail

Systematics Section/ASPT

Gottschalk, Stephen [1], Watson, Kimberly [1], Tulig, Melissa [1], Thiers, Barbara [1].

Developing a Semi-Automated Workflow for Specimen Record Completion.

The New York Botanical Garden (NYBG) Herbarium holds more than 200 years' worth of plant collections from all over the globe, amounting to over 7 million herbarium specimens. Optical character recognition (OCR) software has increased the rate at which specimen label data are captured, and much progress has been made on how best to incorporate OCR-generated text into a label transcription workflow. However, given the diversity of collectors, collection label types, languages, etc. represented in the NYBG collection, a fully automated "one size fits all" approach to specimen data capture through OCR and natural language processing is unlikely to succeed. Instead, the focus is on grouping label images based on their OCR text, enabling rapid capture of key label data elements (e.g., collection number, collector, country). Records are then completed from grouped sets of label images with corresponding OCR text, leveraging where possible any digitized collector field book records and existing complete data from all available sources (e.g., GBIF, project partners). Furthermore, this grouping allows records to be routed to different methods of completion, including crowdsourcing legible labels to citizen scientists, sending difficult labels to a specialist, and targeting fully typed labels for natural language processing. Further integration and automation of these methods will lead to more efficient data extraction from physical herbarium specimens.
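
As an illustration only, the grouping step could be sketched as below. This is not the workflow described in the abstract, which does not specify a grouping method. The sketch assumes a token-overlap (Jaccard) similarity with a greedy single pass, and the function names, the 0.5 threshold, and the sample label text are all hypothetical.

```python
import re


def tokens(ocr_text):
    """Return the set of lowercase alphanumeric tokens of length >= 3."""
    return {t for t in re.findall(r"[a-z0-9]+", ocr_text.lower()) if len(t) >= 3}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_labels(ocr_by_image, threshold=0.5):
    """Greedily group image ids whose OCR text resembles a group's first member.

    The threshold is an assumed value, not one taken from the abstract.
    """
    groups = []  # each entry: (token set of first member, list of image ids)
    for image_id, text in ocr_by_image.items():
        toks = tokens(text)
        for representative, members in groups:
            if jaccard(toks, representative) >= threshold:
                members.append(image_id)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append((toks, [image_id]))
    return [members for _, members in groups]


# Hypothetical OCR output for three label images.
labels = {
    "img1": "Flora of Brazil. Coll. J. Smith No. 1021 Bahia",
    "img2": "Flora of Brazil. Coll. J. Smith No. 1022 Bahia",
    "img3": "Plants of Idaho, Boise County. Leg. A. Jones 55",
}
grouped = group_labels(labels)
```

Labels that share a collector and printed header land in the same group, so a transcriber could fill in the shared fields once and then enter only the fields that differ, such as the collection number. Real OCR output is noisier than this sample, so a production approach would likely need fuzzy matching rather than exact token overlap.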

1 - New York Botanical Garden, William and Lynda Steere Herbarium, 2900 Southern Blvd., Bronx, New York, 10458, United States

specimen digitization
data management
data analysis

Presentation Type: Oral Paper: Papers for Sections
Session: 4
Location: Payette/Boise Centre
Date: Monday, July 28th, 2014
Time: 9:00 AM
Number: 4005
Abstract ID: 418
Candidate for Awards: None

Copyright 2000-2013, Botanical Society of America. All rights reserved