
Interview with Roberto di Cosmo

Ciência & Cultura — What is Software Heritage, and why has UNESCO supported it since 2017 as a key infrastructure to preserve software as part of the world’s heritage?

Roberto di Cosmo — Software is a fundamental pillar of modern knowledge. It drives scientific discoveries, powers critical infrastructure, and shapes technological evolution. Yet, unlike scientific articles and datasets, software remains at risk of disappearing due to obsolescence, lack of proper archiving, or organizational changes. Software Heritage was created to address this challenge. It is a universal, long-term archive for software source code, ensuring that all publicly available code is collected, preserved, and made accessible for future generations. By providing an open, replicated, and standardized infrastructure, Software Heritage safeguards not only the code itself but also the knowledge embedded within it. Since it started operating, Software Heritage has collected over 340 million software projects, totaling over 22 billion unique files. Recognizing the importance of software in the scientific and cultural ecosystem, UNESCO has partnered with Software Heritage as part of its global strategy for preserving digital knowledge. Mr. Tawfik Jelassi, Assistant Director-General of the UNESCO Communication and Information Sector, put this nicely in his address to the 2023 Annual Software Heritage Symposium:

 

“Software source code represents unique knowledge of humanity’s recent history. It is crucial to work collectively so that the knowledge embedded in software source code is properly preserved, valued, and shared with all.”

 

The UNESCO Recommendation on Open Science explicitly highlights the importance of open-source software as a key component of open scientific knowledge, stating:

“Open scientific knowledge refers to open access to scientific publications, research data, metadata, open educational resources, software, and source code and hardware that are available in the public domain or under copyright and licensed under an open license that allows access, re-use, repurpose, adaptation and distribution […].”

 

It also highlights the need to preserve and make available not only the code that a researcher writes, but also all of its dependencies, which are crucial for reproducibility:

 

“In the context of open science, when open source code is a component of a research process, enabling reuse and replication generally requires that it be accompanied by open data and open specifications of the environment required to compile and run it.”

 

This aligns perfectly with Software Heritage’s mission to preserve all publicly available software source code — including its dependencies — supporting the goal of making research results reproducible, verifiable, and reusable over time.

Figure 1. In January 2025, the Software Heritage archive contained over 22.5 billion source code files and almost 5 billion commits, corresponding to 345 million projects organized in almost 18 billion directories, with 86 million software authors.
(Source: Software Heritage. Reproduction)

 

C&C — How can we increase the perception, among scientists, of software as a first-class citizen in the Open Science ecosystem?

RC — To fully embrace Open Science, the research community needs to acknowledge software as a core research output, just like articles and datasets. To foster this increased recognition, there are several concrete actions that can be taken:

 

  • Archiving and preserving research software systematically to ensure accessibility and reproducibility. Software Heritage makes this process straightforward, providing browser extensions that enable one-click archival and webhook integration with popular code hosting platforms such as GitHub, GitLab, Gitea, or Bitbucket.
  • Citing software properly using intrinsic, persistent identifiers (SWHIDs) that allow precise referencing at different levels (files, revisions, releases). Software Heritage recently unveiled a “citation” functionality that produces exact citation information for all software that contains appropriate metadata.
  • Encouraging academic recognition of software contributions, by establishing awards for exemplary research software projects. This approach has been pioneered in France since 2022, with a national open science award for software that is described in detail in the Blanc Catala et al. article.
  • Integrating software in research workflows by making its use, modification, and evolution traceable.
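
To make the idea of an intrinsic identifier concrete, here is a minimal sketch of how a SWHID for a single file's content can be derived from the bytes alone. It assumes the content-level scheme, in which the hash is the same one Git computes for a blob (SHA-1 over a `blob <length>\0` header followed by the content). The function name `content_swhid` is illustrative only and is not part of any Software Heritage library.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a content-level SWHID (object type 'cnt') for raw file bytes.

    Assumption: the hash is Git's blob hash, i.e. SHA-1 over the header
    b"blob <length>\\0" followed by the content itself.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    # Prefix layout: scheme "swh", version 1, object type "cnt".
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The same bytes always yield the same identifier, on any machine.
    print(content_swhid(b"hello world\n"))
```

Because the identifier is computed from the content itself, anyone holding the same bytes can recompute it and check that a cited file has not changed, without consulting a central registry.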

 

Software Heritage plays a key role in this transformation. It collaborates with major scholarly infrastructures, allowing seamless archiving and metadata submission of research software. Leading publishers and repositories such as Zenodo, HAL, eLife, Episciences, and Dagstuhl already integrate Software Heritage into their workflows, ensuring that research software is permanently preserved and properly referenced. Moreover, initiatives like ReplicabilityStamp.org for Computer Graphics leverage Software Heritage to ensure computational reproducibility, reinforcing the role of software as a foundational element in research. By embedding software preservation and citation practices into the scholarly ecosystem, we can elevate software to a recognized and valued research output.

 


 

 

C&C — How has Software Heritage contributed to advancing this perception? How can Brazilian research institutions participate in this major initiative?

RC — Software Heritage actively collaborates with universities, research institutions, and infrastructures worldwide to ensure that software is recognized, archived, and properly cited. Key contributions include:

 

  • Providing an Open Archive: Software Heritage is a proactive harvester of publicly available code. Many researchers may already find their software preserved in the archive, even if they were unaware of it. The archive ensures that research software remains accessible without requiring complex registration mechanisms.
  • Enabling Software Citation & Reproducibility: The Software Hash Identifier (SWHID) enables researchers to reference their code with persistent, verifiable, and ISO-standardized identifiers. These identifiers are already in use by major publishers, reinforcing their role in scientific reproducibility.
  • Integrating with Key Research Infrastructures: Platforms like Zenodo and HAL now systematically transfer software submissions to Software Heritage, ensuring long-term preservation and citation.
  • Launching the Open Science Membership Program: Research institutions, libraries, and universities can actively support the archive while gaining access to training, collaboration opportunities, and advanced functionalities. Brazilian institutions have an opportunity to join this network, helping shape global best practices in research software preservation.

 


 

 

C&C — How can Brazilian research institutions get involved?

RC — Brazilian universities and research centers can contribute to this movement by:

 

  • Ensuring that research software is systematically archived in Software Heritage.
  • Encouraging researchers to use SWHIDs to reference their software, improving recognition and traceability.
  • Collaborating on Open Science initiatives by integrating Software Heritage into national research infrastructures and funding programs.
  • Joining the Software Heritage Open Science Membership Program, which provides direct engagement opportunities with a global network of institutions working on research software, through the Archives and Libraries Interest Group.

 


 

 

By participating in Software Heritage, Brazil can lead the way in Open Science and digital knowledge preservation, ensuring that its research software remains a valuable and accessible resource for future generations. Software Heritage is not just an archive—it is a global infrastructure for Open Science. Its integration with major publishers, research repositories, and scholarly initiatives makes it an essential tool for preserving, referencing, and recognizing research software. Brazilian researchers and institutions now have a unique opportunity to leverage Software Heritage to strengthen Open Science and ensure that software remains a trusted and accessible component of scientific knowledge.

 


Figure 2. Roberto di Cosmo
(Photo: https://dicosmo.org. Reproduction)

 


Roberto di Cosmo is a Computer Science professor at the University of Paris Diderot and a researcher at Inria. An expert in open-source software and open science, he founded Software Heritage and led initiatives like the Mancoosi project and IRILL. He chairs IMDEA Software and France’s National Committee for Open Science.
Fernanda Antonia da Fonseca Sobral is an emeritus professor at UnB and an emeritus researcher at CNPq, collaborating with the Graduate Program in Sociology at UnB. She served as vice president of SBPC for two terms (2019-2021 and 2021-2023) and is currently a Director at SBPC.
Claudia Bauzer Medeiros is a full professor at the Institute of Computing, University of Campinas (Unicamp), a fellow of the Brazilian Academy of Sciences, and a member of the FAPESP Coordination Program eScience and Data Science.

Cover. Software Heritage aims to build the largest universal repository of open source code available — an essential global infrastructure for Open Science.
(Photo: Freepik.com. Reproduction)