Ciência & Cultura — What is Software Heritage, and why has UNESCO supported it since 2017 as a key infrastructure for preserving software as part of the world’s heritage?
Roberto di Cosmo — Software is a fundamental pillar of modern knowledge. It drives scientific discoveries, powers critical infrastructure, and shapes technological evolution. Yet, unlike scientific articles and datasets, software remains at risk of disappearing due to obsolescence, lack of proper archiving, or organizational changes. Software Heritage was created to address this challenge. It is a universal, long-term archive for software source code, ensuring that all publicly available code is collected, preserved, and made accessible for future generations. By providing an open, replicated and standardized infrastructure, Software Heritage safeguards not only the code itself but also the knowledge embedded within it. Since it started operating, Software Heritage has collected over 340 million software projects, totaling over 22 billion unique files. Recognizing the importance of software in the scientific and cultural ecosystem, UNESCO has partnered with Software Heritage as part of its global strategy for preserving digital knowledge. Mr. Tawfik Jelassi, Assistant Director-General of the UNESCO Communication and Information Sector, put this nicely in his address to the 2023 Annual Software Heritage Symposium:
“Software source code represents unique knowledge of humanity’s recent history. It is crucial to work collectively so that the knowledge embedded in software source code is properly preserved, valued, and shared with all.”
The UNESCO Recommendation on Open Science explicitly identifies open-source software as a key component of open scientific knowledge, stating:
“Open scientific knowledge refers to open access to scientific publications, research data, metadata, open educational resources, software, and source code and hardware that are available in the public domain or under copyright and licensed under an open license that allows access, re-use, repurpose, adaptation and distribution […].”
It also highlights the need to preserve and make available not only the code that a researcher writes, but also all its dependencies, which are crucial for reproducibility:
“In the context of open science, when open source code is a component of a research process, enabling reuse and replication generally requires that it be accompanied by open data and open specifications of the environment required to compile and run it.”
This aligns perfectly with Software Heritage’s mission to preserve all publicly available software source code — including its dependencies — supporting the goal of making research results reproducible, verifiable, and reusable over time.
Figure 2. In January 2025, the Software Heritage archive contained over 22.5 billion source code files and almost 5 billion commits from 345 million projects, organized in almost 18 billion directories, with 86 million software authors.
(Source: Software Heritage. Reproduction)
C&C — How can we increase the perception, among scientists, of software as a first-class citizen in the Open Science ecosystem?
RC — To fully embrace Open Science, the research community needs to acknowledge software as a core research output, just like articles and datasets. To foster this increased recognition, there are several concrete actions that can be taken:
- Archiving and preserving research software systematically to ensure accessibility and reproducibility. Software Heritage makes this process straightforward, providing browser extensions that enable archival in one click, and webhook integration with the most popular code hosting platforms, such as GitHub, GitLab, Gitea, and Bitbucket.
- Citing software properly using intrinsic, persistent identifiers (SWHIDs) that allow precise referencing at different levels (files, revisions, releases). Software Heritage recently unveiled a “citation” functionality that produces exact citation information for all software that contains appropriate metadata.
- Encouraging academic recognition of software contributions, by establishing awards for exemplary research software projects. This approach has been pioneered in France since 2022, with a national open science award for software that is described in detail in the Blanc Catala et al. article.
- Integrating software in research workflows by making its use, modification, and evolution traceable.
Software Heritage plays a key role in this transformation. It collaborates with major scholarly infrastructures, allowing seamless archiving and metadata submission of research software. Leading publishers and repositories such as Zenodo, HAL, eLife, Episciences, and Dagstuhl already integrate Software Heritage into their workflows, ensuring that research software is permanently preserved and properly referenced. Moreover, initiatives like ReplicabilityStamp.org for Computer Graphics leverage Software Heritage to ensure computational reproducibility, reinforcing the role of software as a foundational element in research. By embedding software preservation and citation practices into the scholarly ecosystem, we can elevate software to a recognized and valued research output.
C&C — How has Software Heritage contributed to advancing this perception? How can Brazilian research institutions participate in this major initiative?
RC — Software Heritage actively collaborates with universities, research institutions, and infrastructures worldwide to ensure that software is recognized, archived, and properly cited. Key contributions include:
- Providing an Open Archive: Software Heritage is a proactive harvester of publicly available code. Many researchers may already find their software preserved in the archive, even if they were unaware of it. The archive ensures that research software remains accessible without requiring complex registration mechanisms.
- Enabling Software Citation & Reproducibility: The Software Hash Identifier (SWHID) enables researchers to reference their code with persistent, verifiable, and ISO-standardized identifiers. These identifiers are already in use by major publishers, reinforcing their role in scientific reproducibility.
- Integrating with Key Research Infrastructures: Platforms like Zenodo and HAL now systematically transfer software submissions to Software Heritage, ensuring long-term preservation and citation.
- Launching the Open Science Membership Program: Research institutions, libraries, and universities can actively support the archive while gaining access to training, collaboration opportunities, and advanced functionalities. Brazilian institutions have an opportunity to join this network, helping shape global best practices in research software preservation.
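To illustrate what "intrinsic" and "verifiable" mean for a SWHID, here is a minimal sketch for the simplest case, a single file (a content object). It assumes the published SWHID scheme, in which a content identifier is built from the same SHA-1 that Git computes for a blob; the function name `content_swhid` is hypothetical and is not part of any Software Heritage tool. Identifiers for directories, revisions, and releases follow other rules and are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (hypothetical helper name).

    Assumption: for content objects, the identifier is the Git blob hash,
    i.e. the SHA-1 of the header "blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # Anyone holding the same bytes obtains the same identifier,
    # without asking a registry or a central authority.
    print(content_swhid(b"hello world\n"))
```

Because the identifier is derived from the bytes themselves, a reader who retrieves a file cited in a paper can recompute it and confirm that the content is exactly what the authors referenced.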
C&C — How can Brazilian research institutions get involved?
RC — Brazilian universities and research centers can contribute to this movement by:
- Ensuring that research software is systematically archived in Software Heritage.
- Encouraging researchers to use SWHIDs to reference their software, improving recognition and traceability.
- Collaborating on Open Science initiatives by integrating Software Heritage into national research infrastructures and funding programs.
- Joining the Software Heritage Open Science Membership Program, which provides direct engagement opportunities with a global network of institutions working on research software, through the Archives and Libraries Interest Group.
By participating in Software Heritage, Brazil can lead the way in Open Science and digital knowledge preservation, ensuring that its research software remains a valuable and accessible resource for future generations. Software Heritage is not just an archive—it is a global infrastructure for Open Science. Its integration with major publishers, research repositories, and scholarly initiatives makes it an essential tool for preserving, referencing, and recognizing research software. Brazilian researchers and institutions now have a unique opportunity to leverage Software Heritage to strengthen Open Science and ensure that software remains a trusted and accessible component of scientific knowledge.

Figure 2. Roberto di Cosmo
(Photo: https://dicosmo.org. Reproduction)