
Metadata

Abstract

Metadata are crucial for the discoverability and dissemination of published outputs. To make sure that their content will be indexed by search engines, aggregators and other services, and that it will reach the intended audience, publishers and service providers should follow a set of standards and guidelines determining how metadata should be structured, curated and distributed.

Main Text

In the context of open access scholarly publishing, metadata are digital pieces of information that describe published outputs (articles, books, journals, etc.). Metadata are normally structured according to a metadata model that relies on a metadata standard. This ensures that the information provided is meaningful for both humans and machines and sufficient for the unambiguous identification of published outputs (Avanço, 2023a; 2023b).

The most common metadata for published outputs include:

  • Title(s) (of an article / contribution and of the source publication).
  • Full names and institutional affiliations of authors.
  • Abstracts.
  • Keywords (controlled or free-text).
  • Publisher name.
  • Publication date.
  • International standard numbers (ISSN, eISSN, ISBN, ISMN, etc.).
  • Persistent identifiers for the publication (DOI), authors and contributors (ORCID), author affiliations (ROR), and funding organisations (ROR), as well as other relevant persistent identifiers.


Metadata can also include author roles (e.g. according to the CRediT taxonomy), funding information (e.g. the name of the funder and the grant ID), copyright and licensing information, conflict-of-interest statements and bibliographic references. In the case of journal articles or book chapters, metadata can also include volume or issue information and pagination.
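As an illustration only, the elements listed above could be held in a simple structured record. The sketch below is not a metadata standard: the class names, field names and sample values (including the DOI, ORCID and ROR identifiers) are hypothetical placeholders chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Contributor:
    full_name: str
    affiliation: str = ""
    orcid: str = ""  # persistent identifier for the person
    ror: str = ""    # persistent identifier for the affiliation
    roles: list = field(default_factory=list)  # e.g. CRediT roles


@dataclass
class ArticleRecord:
    title: str
    journal_title: str
    publisher: str
    publication_date: str  # ISO 8601 date, e.g. "2024-03-15"
    doi: str = ""
    issn: str = ""
    abstract: str = ""
    keywords: list = field(default_factory=list)
    contributors: list = field(default_factory=list)
    licence_url: str = ""
    references: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record, including nested contributors, to JSON."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    def missing_fields(self) -> list:
        """Names of top-level fields that are still empty."""
        return [name for name, value in asdict(self).items() if value in ("", [])]


# Placeholder values for illustration; none of these identifiers is real.
example = ArticleRecord(
    title="An example article",
    journal_title="Journal of Examples",
    publisher="Example Press",
    publication_date="2024-03-15",
    doi="10.1234/example.2024.001",
    contributors=[
        Contributor(
            full_name="Jane Doe",
            affiliation="Example University",
            orcid="https://orcid.org/0000-0000-0000-0000",
            ror="https://ror.org/000000000",
            roles=["Conceptualization", "Writing - original draft"],
        )
    ],
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)

if __name__ == "__main__":
    print(example.to_json())
    print("Still missing:", example.missing_fields())
```

A check such as `missing_fields()` is one simple way to spot incomplete records (for example, a missing abstract or reference list) before the metadata are shared.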

Standard numbers and persistent identifiers (PIDs) for publications are particularly helpful in identifying published outputs because they are recorded in curated registries, along with other metadata describing a publication. As a result, other relevant metadata can be retrieved using a standard number or PID alone. For example, if the DOI of a journal article is known, it can be used to retrieve the publication type, title, publisher and publication date, as well as the ISSN and title of the journal, among other details.
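The DOI lookup described above can be sketched as follows, assuming the DOI is registered with Crossref and using Crossref's public REST API (`https://api.crossref.org/works/{doi}`). The helper names and the sample DOI in the usage line are hypothetical. DOIs registered with other agencies (e.g. DataCite) would need a different endpoint and response format.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def summarise(message: dict) -> dict:
    """Pick a few descriptive fields out of a Crossref 'message' object."""

    def first(key):
        # Crossref returns titles and container titles as lists.
        values = message.get(key) or []
        return values[0] if values else None

    issued = message.get("issued") or {}
    date_parts = (issued.get("date-parts") or [[]])[0]
    parts = [part for part in date_parts if part is not None]
    return {
        "type": message.get("type"),
        "title": first("title"),
        "publisher": message.get("publisher"),
        "published": "-".join(f"{part:02d}" for part in parts) or None,
        "journal": first("container-title"),
        "issn": message.get("ISSN", []),
    }


def fetch(doi: str) -> dict:
    """Look up a DOI over the network and return the summary."""
    with urlopen(crossref_url(doi), timeout=30) as response:
        return summarise(json.load(response)["message"])


if __name__ == "__main__":
    # Hypothetical DOI; replace it with a real one before running.
    print(fetch("10.1234/example.2024.001"))
```

Separating `summarise` from `fetch` keeps the parsing logic usable on records that have already been downloaded or harvested in bulk.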

When displaying metadata on online publishing platforms, publishers should make it easy for users to find relevant information about a specific output (e.g. a journal article) on a single page, without having to search for it elsewhere. To achieve this, and in line with the recommendations of general search engines such as Google, each published item (article, chapter, book, etc.) should have a dedicated landing page (with a unique URL) showing the above-mentioned metadata and a link to the full text.
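One widely used way of making landing-page metadata readable by crawlers is to repeat them as `<meta>` tags in the page header. The sketch below generates Highwire Press-style tags of the kind described in Google Scholar's inclusion guidelines (`citation_title`, `citation_author`, etc.). The function name and sample values are hypothetical, and other indexing services may expect different or additional tags.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date,
                      journal_title="", doi="", pdf_url=""):
    """Return <meta> tags for the <head> of an article landing page."""
    pairs = [("citation_title", title)]
    # One citation_author tag per author, in the order they are given.
    pairs += [("citation_author", author) for author in authors]
    pairs += [
        ("citation_publication_date", publication_date),  # e.g. "2024/03/15"
        ("citation_journal_title", journal_title),
        ("citation_doi", doi),
        ("citation_pdf_url", pdf_url),
    ]
    # Skip empty values and escape the rest so they are safe in HTML attributes.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
        if value
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(scholar_meta_tags(
        title="An example article",
        authors=["Doe, Jane", "Roe, Richard"],
        publication_date="2024/03/15",
        journal_title="Journal of Examples",
        doi="10.1234/example.2024.001",
        pdf_url="https://journal.example.org/article/1/pdf",
    ))
```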

Along with making this information available to human users, publishers should ensure that it is also accessible to search engines and aggregators, which can increase the visibility and use of the published content. Search engines and aggregators require that metadata be exposed in a specific format (e.g. XML, JSON, HTML, CSV) via an appropriate metadata exchange protocol (Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH; REST API; HTTPS; etc.). Machine-readable metadata are also relevant for human users who want to export them for analysis or import them into reference managers. Metadata should also be included in the full text in human- and machine-readable formats (e.g. embedded in PDF or in JATS XML).
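To illustrate what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`) and extracts a few fields from a response. The base URL, the function names and the sample record are hypothetical. A real harvester would also need to handle resumption tokens, protocol errors and deleted records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, titles, creators and dates from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
            "dates": [e.text for e in record.iterfind(".//dc:date", NS)],
        })
    return records


# A hand-written sample response, used here instead of a live request.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:journal.example.org:article/1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2024-03-15</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://journal.example.org/oai"))
    print(parse_records(SAMPLE))
```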

Setting up machine-readable metadata requires technical expertise but, fortunately, publishers do not have to do everything from scratch, as free and open-source software solutions are available. Platforms such as Open Journal Systems or Janeway come with ready-made solutions for displaying metadata on landing pages for humans and exposing them for machines in several formats. For books, a similar approach is being implemented by the open-source and community-led metadata management system Thoth Open Metadata.

Apart from ensuring that the technical requirements for metadata sharing are met, publishers should also enable the seamless flow of metadata by releasing them into the public domain (e.g. under the Creative Commons CC0 Public Domain Dedication). By doing this, they enable aggregators to harvest and disseminate their metadata without having to ask permission or deal with complicated licensing issues. Public-domain metadata play a key role in building non-commercial discovery platforms (e.g. OpenAIRE Explore, GoTriple, OpenAlex) and citation indexes (Peroni & Shotton, 2020). This is yet another reason why publishers should always deposit complete metadata about publications, including bibliographic references, with a registration agency (e.g. Crossref, DataCite), in line with the recommendations of the Initiative for Open Citations (I4OC) and the Initiative for Open Abstracts (I4OA).


Licensing

This document is licensed under a Creative Commons Attribution 4.0 International License.

