Metadata
Abstract
Metadata are crucial for the discoverability and dissemination of published outputs. To make sure that their content will be indexed by search engines, aggregators and other services, and that it will reach the intended audience, publishers and service providers should follow a set of standards and guidelines determining how metadata should be structured, curated and distributed.
Main Text
In the context of open access scholarly publishing, metadata are digital pieces of information that describe published outputs (articles, books, journals, etc.). Metadata are normally structured according to a metadata model that relies on a metadata standard. This ensures that the information provided is meaningful for both humans and machines and sufficient for the unambiguous identification of published outputs (Avanço, 2023a; 2023b).
The most common metadata for published outputs include:
- Title(s) (of an article / contribution and of the source publication).
- Full names and institutional affiliations of authors.
- Abstracts.
- Keywords (controlled or free-text).
- Publisher name.
- Publication date.
- International standard numbers (ISSN, eISSN, ISBN, ISMN, etc.).
- Persistent identifiers for the publication (DOI), authors and contributors (ORCID), author affiliations (ROR), and funding organisations (ROR), as well as other relevant persistent identifiers.
Metadata can also include author roles (e.g. according to the CRediT taxonomy), funding information (e.g. the name of the funder and the grant ID), copyright and licensing information, conflict-of-interest statements and bibliographic references. In the case of journal articles or book chapters, metadata can also include volume or issue information and pagination.
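As an illustration only, the elements listed above can be modelled as a simple record structure. The sketch below is not a metadata standard: the class names, field names and the choice of "core" fields are assumptions made for this example, and all values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Contributor:
    full_name: str
    orcid: Optional[str] = None            # persistent identifier for the person
    affiliation: Optional[str] = None
    affiliation_ror: Optional[str] = None  # persistent identifier for the institution
    roles: List[str] = field(default_factory=list)  # e.g. CRediT roles


@dataclass
class PublicationRecord:
    title: str
    source_title: str        # journal or book title
    publisher: str
    publication_date: str    # ISO 8601 date, e.g. "2024-03-15"
    doi: Optional[str] = None
    issn: Optional[str] = None
    abstract: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    contributors: List[Contributor] = field(default_factory=list)
    licence_url: Optional[str] = None
    references: List[str] = field(default_factory=list)

    def missing_core_fields(self) -> List[str]:
        """Return the names of core fields that are empty (hypothetical rule)."""
        core = ("title", "publisher", "publication_date", "doi", "contributors")
        return [name for name in core if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialise the record, including nested contributors, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values, not a real publication.
record = PublicationRecord(
    title="An Example Article",
    source_title="Journal of Examples",
    publisher="Example Press",
    publication_date="2024-03-15",
    doi="10.1234/example.5678",
    contributors=[Contributor(full_name="Jane Doe", roles=["Conceptualization"])],
)
incomplete = PublicationRecord(
    title="Untitled draft", source_title="", publisher="", publication_date=""
)
```

A check such as `missing_core_fields()` could be run before metadata are exported or deposited, so that incomplete records are caught early.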
Standard numbers and persistent identifiers (PIDs) are particularly helpful in identifying published outputs because they are registered in curated registries, along with other metadata describing the publication. As a result, other relevant metadata can be retrieved from a standard number or PID. For example, a known DOI makes it possible to retrieve the publication type, title, publisher and publication date and, for a journal article, the journal title and ISSN.
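A minimal sketch of such a lookup is shown below, assuming the DOI is registered with Crossref and using the Crossref REST API endpoint `https://api.crossref.org/works/{doi}`. To keep the example runnable offline, it parses a hand-written sample response with placeholder values rather than calling the API; the helper names `crossref_url` and `summarise` are hypothetical.

```python
import json
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a single DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def summarise(response_text: str) -> dict:
    """Extract a few descriptive fields from a Crossref 'works' JSON response."""
    message = json.loads(response_text)["message"]

    def first(key):
        # Crossref returns titles and container titles as lists.
        values = message.get(key) or []
        return values[0] if values else None

    date_parts = (message.get("issued", {}).get("date-parts") or [[]])[0]
    issued = "-".join(f"{part:02d}" for part in date_parts if part is not None)
    return {
        "type": message.get("type"),
        "title": first("title"),
        "publisher": message.get("publisher"),
        "journal": first("container-title"),
        "issn": message.get("ISSN", []),
        "issued": issued or None,
    }


# Hand-written sample in the shape of a Crossref response (placeholder values).
SAMPLE_RESPONSE = json.dumps({
    "status": "ok",
    "message": {
        "DOI": "10.1234/example.5678",
        "type": "journal-article",
        "title": ["An Example Article"],
        "publisher": "Example Press",
        "container-title": ["Journal of Examples"],
        "ISSN": ["0000-0000"],
        "issued": {"date-parts": [[2024, 3, 15]]},
    },
})

summary = summarise(SAMPLE_RESPONSE)
```

For a live lookup, the text returned by `urllib.request.urlopen(crossref_url(doi))` could be passed to `summarise` in place of the sample.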
When displaying metadata on online publishing platforms, publishers should make it easy for users to find relevant information about a specific output (e.g. a journal article) on a single page, without having to search for it elsewhere. To achieve this, and in line with the recommendations of general search engines such as Google, each published item (article, chapter, book, etc.) should have a dedicated landing page with a unique URL that shows the above-mentioned metadata and a link to the full text.
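One common way to make a landing page readable by search engines is to embed the metadata as HTML `<meta>` tags. The sketch below generates a few of the `citation_*` tags described in the Google Scholar inclusion guidelines; the function name, the dictionary keys and the sample values are assumptions made for this example, and platforms such as Open Journal Systems generate comparable tags automatically.

```python
from html import escape


def scholar_meta_tags(record: dict) -> str:
    """Render a metadata record as citation_* <meta> tags for a landing page."""
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, in publication order.
    tags += [("citation_author", author) for author in record.get("authors", [])]
    optional = [
        ("citation_publication_date", "publication_date"),
        ("citation_journal_title", "journal_title"),
        ("citation_pdf_url", "pdf_url"),
    ]
    for tag_name, key in optional:
        if record.get(key):
            tags.append((tag_name, record[key]))
    # Escape values so that characters such as & or " cannot break the HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


# Placeholder values, not a real publication.
html_head = scholar_meta_tags({
    "title": "Metadata & Discoverability: an example",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "publication_date": "2024/03/15",
    "journal_title": "Journal of Examples",
    "pdf_url": "https://journal.example.org/article/1/pdf",
})
```

The resulting lines would be placed inside the `<head>` element of the article's landing page, alongside the human-readable display of the same metadata.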
Along with making this information available to human users, publishers should ensure that it is also accessible to search engines and aggregators, which can increase the visibility and use of the published content. Search engines and aggregators require that metadata be exposed in a specific format (e.g. XML, JSON, HTML, CSV) via an appropriate metadata exchange protocol, such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a REST API or plain HTTPS. Machine-readable metadata are also relevant for human users who want to export them for analysis or import them into reference managers. Metadata should also be included in the full text in human- and machine-readable formats (e.g. embedded in PDF or in JATS XML).
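To give a sense of what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`) and parses a response. The base URL, the sample record and the function names are assumptions made for this example; the sample XML is hand-written and trimmed so that the code runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title, creators and date from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [el.text for el in dc.findall("dc:creator", NS)],
            "date": dc.findtext("dc:date", namespaces=NS),
        })
    return records


# Hand-written, trimmed sample response (placeholder values).
SAMPLE_XML = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:journal.example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2024-03-15</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

request_url = list_records_url("https://journal.example.org/oai")
harvested = parse_records(SAMPLE_XML)
```

Real endpoints split large result sets into pages and return a `resumptionToken` that the harvester sends back to request the next page; that loop is omitted here for brevity.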
Setting up machine-readable metadata requires technical expertise but, fortunately, publishers do not have to do everything from scratch, as free and open-source software solutions are available. Platforms such as Open Journal Systems or Janeway come with ready-made solutions for displaying metadata on landing pages for humans and exposing them for machines in several formats. For books, a similar approach is being implemented by the open-source and community-led metadata management system Thoth Open Metadata.
Apart from ensuring that technical requirements for metadata sharing are met, publishers should also enable the seamless flow of metadata by releasing them into the public domain (e.g. by using the Creative Commons CC0 Public Domain Dedication). By doing this, they enable aggregators to harvest and disseminate their metadata without having to ask permission or deal with complicated licensing issues. Public-domain metadata play a key role in building non-commercial discovery platforms (e.g. OpenAIRE Explore, GoTriple, OpenAlex) and citation indexes (Peroni & Shotton, 2020). This is yet another reason why publishers should always deposit complete metadata about publications, including abstracts and bibliographic references, with a registration agency (e.g. Crossref, DataCite), in line with the recommendations of the Initiative for Open Citations (I4OC) and the Initiative for Open Abstracts (I4OA).
Related Toolsuite Articles
- Software and interoperability
- Content formats and preservation
- Open Science Practices
- Visibility, Indexation, Communication, Marketing and Impact
References
- Avanço, K. (2023a). What is metadata for publication and how is it used? Part 1: introduction to metadata. The Road to FAIR (blog). https://roadtofair.hypotheses.org/499
- Avanço, K. (2023b). What is metadata for publication and how is it used? Part 2: metadata standards. The Road to FAIR (blog). https://roadtofair.hypotheses.org/696
- Crossref Metadata Search. https://search.crossref.org/
- DataCite. https://schema.datacite.org/
- DOI Foundation. Digital Object Identifier (DOI). https://www.doi.org/
- GoTriple. https://www.gotriple.eu/
- Initiative for Open Citations (I4OC). https://i4oc.org/
- Initiative for Open Abstracts (I4OA). https://i4oa.org/
- Janeway. https://janeway.systems/
- NISO. (n.d.). Contributor Roles Taxonomy (CRediT). https://credit.niso.org/
- NISO. (n.d.). Journal Article Tag Suite (JATS XML). https://jats.nlm.nih.gov/index.html
- OpenAlex. https://openalex.org/
- OpenAIRE Explore. https://explore.openaire.eu/
- Open Archives Initiative. (n.d.). Protocol for Metadata Harvesting. https://www.openarchives.org/pmh/
- Open Journal Systems (OJS). https://pkp.sfu.ca/software/ojs/
- ORCID. https://orcid.org/
- Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies 1(1): 428–444. https://doi.org/10.1162/qss_a_00023
- Research Organization Registry (ROR). https://ror.org/
- Thoth Open Metadata. https://thoth.pub
Further reading
- Edmunds, J. (2023). Metadata and Libraries. https://openaccessbooksnetwork.hcommons.org/2023/11/16/open-metadata-and-libraries/
- van Gerven Oei, V. W. J. (2020). Open metadata in Thoth. https://doi.org/10.21428/785a6451.eb0d86e8
Frequently Asked Questions
- What are the basic functionalities that a publishing infrastructure should have?
- How can I assign persistent identifiers to published content?
- What are the common metadata formats used for exporting publication metadata?
- What are the standard protocols for retrieving metadata from publishing infrastructures?
- What are the best practices for documenting and preserving content and metadata over time?
Licensing
This document is licensed under a Creative Commons Attribution 4.0 International License.