Preservation and content formats
Abstract
Digital preservation ensures continued access to content if it becomes unavailable on the publisher’s platform. Because preservation is a major challenge in non-profit open access publishing, several community-led initiatives have been launched to support publishers. To facilitate preservation, interoperability and accessibility, publishers should rely on open standards when choosing file formats.
Main Text
In the pre-digital age, the preservation of publications was ensured through libraries: by legal deposit, or by distributing physical copies to many libraries. Similar approaches are used in the digital age: legal deposit has been extended to digital publications, while digital preservation services keep copies on a distributed network of servers, as the names of two widely used services indicate: LOCKSS (Lots of Copies Keep Stuff Safe) and CLOCKSS (Controlled LOCKSS). LOCKSS and CLOCKSS, as well as another notable service, Portico, are so-called ‘dark archives’: backup databases of born-digital or digitised content, hosted on multiple secured servers, which preserve the authentic original version of the content. If the content becomes unavailable on the publisher’s platform, these preservation services can deliver it to users, ensuring seamless access (Digital Preservation Coalition, 2015; Shah & Gul, 2019).
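The idea of keeping many copies on separate servers can be illustrated with a small sketch. It is an illustration only and does not reproduce the actual LOCKSS, CLOCKSS or Portico protocols: it assumes three hypothetical servers (`server-a`, `server-b`, `server-c`) holding copies of the same file, compares SHA-256 checksums, and reports any copy that differs from the majority.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a copy as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_divergent_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of servers whose copy differs from the majority."""
    digests = {server: sha256_hex(content) for server, content in copies.items()}
    # The checksum held by most servers is treated as the reference version.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(s for s, digest in digests.items() if digest != majority_digest)


# Hypothetical copies of one article; server names and content are placeholders.
copies = {
    "server-a": b"<article>...</article>",
    "server-b": b"<article>...</article>",
    "server-c": b"<article>..,</article>",  # simulated damage in one copy
}

print(find_divergent_copies(copies))
```

In this sketch a damaged copy is only detected, not repaired; a real preservation service would also replace it with a good copy.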
Although preservation should have a high priority in scholarly publishing, many publishers still fail to ensure digital preservation of their outputs (Laakso, Matthias, & Jahn, 2021). One of the reasons is the cost of the service. Several recent initiatives support open access publishers, who are particularly vulnerable when it comes to digital preservation: the PKP Preservation Network (Sprout & Jordan, 2018) supports publishers using the Open Journal Systems software; Project JASPER (JournAlS are Preserved forevER) focuses on Diamond OA journals indexed in the Directory of Open Access Journals; and the Thoth Archiving Network is a community initiative that seeks to help small and scholar-led book publishers (Cole, Barnes, & Steiner, 2023).
Digital preservation services can also convert content to newer formats if the file formats used by the publisher become obsolete. This is one of the reasons why publishers should use open and well-documented file formats with a solid user base, which are more likely to get migration support in case of obsolescence.
Most publications are still provided in the camera-ready PDF format. However, this format is not convenient for reading on small screens, nor is it optimal for searching or for text and data mining. Tagging full-text content in JATS XML or an equivalent format (e.g. TEI) significantly improves machine readability and interoperability. Providing content in multiple digital formats (PDF, HTML, XML, EPUB, etc.) improves user experience and content usability on various devices and browsers.
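To show why tagged full text is easier for machines to process than a PDF, here is a minimal sketch in Python. The XML fragment is a hypothetical, heavily simplified JATS-style document (it uses real JATS element names such as `article-title` and `sec`, but it is not a complete or validated JATS file), and the function name `extract_structure` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified JATS-style fragment; not a complete JATS document.
JATS_SAMPLE = """\
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph of the introduction.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>How the study was carried out.</p>
    </sec>
  </body>
</article>
"""


def extract_structure(xml_text):
    """Return the article title and its top-level sections as plain Python data."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    sections = []
    for sec in root.findall("body/sec"):
        # itertext() also collects text nested in inline elements such as <italic>.
        paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
        sections.append({"heading": sec.findtext("title"), "paragraphs": paragraphs})
    return {"title": title, "sections": sections}


structure = extract_structure(JATS_SAMPLE)
print(structure["title"])
for section in structure["sections"]:
    print("-", section["heading"], f"({len(section['paragraphs'])} paragraph(s))")
```

Because the title, section headings and paragraphs are explicitly tagged, a few lines of standard-library code can retrieve them reliably; extracting the same structure from a PDF would generally require layout heuristics.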
The fonts used on the publishing platform and in the full text should support Unicode and be open-source, suitable for cross-platform use and accessible. Images should be in high resolution and accompanied by descriptive captions, whereas tables should be well-constructed, annotated and easy to read and interpret. Links to data, code, and other research outputs that underlie the publications and are available in external repositories should be provided both on the outputs’ landing pages and in the full text.
Machine-readable metadata exposed through metadata exchange protocols and on outputs’ landing pages should be provided in widely used formats that are also suitable for digital preservation (e.g. XML, CSV).
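As a rough illustration, the same metadata record can be written out in both CSV and XML using only the Python standard library. The record below is entirely hypothetical (including the placeholder DOI), the field names follow Dublin Core element names as one possible choice, and the wrapper element `record` and the helper names `to_csv` and `to_dc_xml` are assumptions made for this sketch rather than part of any particular platform.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical record; every value here is a placeholder.
record = {
    "title": "An Example Article",
    "creator": "Doe, Jane",
    "date": "2024-01-15",
    "identifier": "https://doi.org/10.1234/example",
    "rights": "https://creativecommons.org/licenses/by/4.0/",
}


def to_csv(records):
    """Serialise a list of records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_dc_xml(rec):
    """Serialise one record as XML with Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name chosen for this sketch
    for field, value in rec.items():
        ET.SubElement(root, f"{{{DC_NS}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv([record])
xml_text = to_dc_xml(record)
print(csv_text)
print(xml_text)
```

Both outputs are plain text in openly documented formats, so they can be read back without the software that produced them.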
Related Toolsuite Articles
- Software and interoperability
- Metadata
- Open Science Practices
- Equity, Diversity, Inclusion and Belonging (EDIB)
- Visibility, Indexation, Communication, Marketing and Impact
Related Guidelines
- Metadata formats and export, identifiers, CRediT tags, bibliographic references, JATS XML or equivalent
References
- CLOCKSS (Controlled LOCKSS). https://clockss.org/
- Cole, G., Barnes, M., & Steiner, T. (2023). ‘Thoth archiving network: supporting small and Scholar-Led Publishers with repository-led preservation of OA books’. Septentrio Conference Series (1). https://doi.org/10.7557/5.7140
- Digital Preservation Coalition (2015). ‘E-Journals’. In: Digital Preservation Handbook, 2nd Edition. https://www.dpconline.org/handbook/content-specific-preservation/e-journals.
- DOAJ. https://doaj.org/
- DOAJ. (2024). Project JASPER. https://doaj.org/preservation/
- Laakso, M., Matthias, L., & Jahn, N. (2021). Open is not forever: a study of vanished open access journals. Journal of the Association for Information Science and Technology 72(9): 1099–1112. https://doi.org/10.1002/asi.24460.
- Open Journal Systems (OJS). https://pkp.sfu.ca/software/ojs/
- Portico. https://www.portico.org/
- PKP. (n.d.) PKP Preservation Network. https://docs.pkp.sfu.ca/pkp-pn/en/
- Shah, U.U., & Gul, S. (2019). LOCKSS, CLOCKSS and PORTICO: a look into digital preservation policies. Library philosophy and practice. 2481. https://digitalcommons.unl.edu/libphilprac/2481.
- Sprout, B., & Jordan, M. (2018). ‘Distributed digital preservation: preserving Open Journal Systems content in the PKP PN’. Digital library perspectives. 34(4): 246–61. https://doi.org/10.1108/DLP-11-2017-0043. A postprint is available in OA: http://hdl.handle.net/2429/70055
- Lots Of Copies Keep Stuff Safe (LOCKSS). https://www.lockss.org/
Further reading
- Armengou, C., Edig, X. van, Laakso, M., & Umerle, T. (2023). CRAFT-OA Deliverable 3.1 report on standards for best publishing practices and basic technical requirements in the light of FAIR principles (Draft). Zenodo. https://doi.org/10.5281/zenodo.8112662
- Barnes, M., Cole, G., Fry, J., Gatti, R., & Higman, R. (2023). 'Good, Better, Best': practices in archiving and preserving open access monographs (1.0). Zenodo. https://doi.org/10.5281/zenodo.7876048
- Barthonnat, C., Blotière, E., Gingold, A., Mas, F.-X., Stanić, N., Pierno, A., Szulińska, A., Armando, L., Pochet, B., de Santis, L., MacGregor, J., Pozzo, R., & Pogačnik, A. (2021). OPERAS SIG on Tools for Open Scholarly Communication: White Paper 2021. Zenodo. https://doi.org/10.5281/zenodo.5654319
Licensing
This document is licensed under a Creative Commons Attribution 4.0 International License