Integrations: Zenodo

Cranko supports safe, automatic software DOI registration through the Zenodo service operated by CERN in collaboration with other scientific organizations.

Orientation: Software DOIs

While most people think of DOIs as associated with scholarly publications, more and more DOIs are being associated with other forms of digital academic output. And, of course, software is more and more becoming an important form of digital academic output! While it is beyond the scope of this documentation to explain software DOIs in depth, it is worth mentioning the distinction between a version DOI and a concept DOI.

Version DOIs are perhaps more familiar. Just like each release of a software package is assigned a unique version number, each release of a software package can be assigned a unique DOI corresponding to that version. If you want to know which specific version of a piece of software that someone was using, either the exact version number or the exact version DOI will tell you that.

If all you care about is knowing what software someone was running, then version DOIs don't add anything new that version numbers don't already provide. However, unlike version numbers, DOIs are first-class items in the scholarly publishing information ecosystem. When you give software a DOI, it can be integrated into that ecosystem in way that isn't possible otherwise. Probably the most important aspect of this is that software DOIs can be associated with author lists and ORCID iDs using standard scholarly metadata systems, so that researchers can get personal credit when their software is used and cited!

Because we want to be able to know exactly what piece of code a person was running, we absolutely want to create a new DOI for each release of a software package. But if that package has a whole bunch of releases, we have a whole bunch of different DOIs, which is going to make it really tedious to quantify the usage of the package overall. This is where concept DOIs come in. Concept DOIs don’t really carry any information on their own, but they can be used in the DOI metadata framework to link together different releases of the same software package in a machine-understandable way. While the DOI 10.5281/zenodo.6963051 is a machine-usable way to talk about “version 4.21.1 of the transformers” package, the concept DOI 10.5281/zenodo.3385997 is a machine-usable way to refer to the thing that is “the transformers package” overall.

Workflow Overview

Cranko’s support for Zenodo “deposition” involves a multi-stage process. It follows the principles of the just-in-time versioning approach where release metadata only ever appear in tested release artifacts.

  • During the beginning of CI/CD processing, a new Zenodo deposit is preregistered, and the DOIs that will be created are obtained. These can be inserted into the source files of your software, so that it can print out its own DOI. This step can be run during pull-request processing: but instead of doing anything with the Zenodo API, fake DOIs are generated and used.
  • Once CI/CD tests have all passed, you can upload artifacts if so desired, then actually publish the release. Zenodo will actually register the DOIs.
  • Because Zenodo deposits are associated with version numbers, each deposit process is associated with a specific cranko project. In a monorepo scenario, you can run multiple deposits for multiple projects as you see fit.

Getting Started

To start using the Zenodo integration, you need to create a Zenodo metadata file somewhere in your repository. This file is traditionally called zenodo.json5 and can be stored anywhere you feel like.

While you should see the Zenodo Metadata Files page for the full details of the file format, the short version is that it has two main fields. The first, "metadata", contains the metadata that will describe your Zenodo deposition. See the Zenodo developer documentation for a precise definition of all of the fields that can be used here, or check out Cranko’s own version of the file for inspiration. The contents of this file are things you need to decide for yourself, including, most importantly, the author list that you want to associate with your project.

The second field, "conceptrecid", will be used to ensure that successive releases of your project are all tied together with the same concept DOI. When creating the first Zenodo release of your software, you should set this to the special value "new-for:$version", where $version is the planned next version of the project being released. For instance, you might put "new-for:0.12.0" at first. If the preregistration process runs for a different version, it will error out. This precaution helps make sure that you don’t forget to update your metadata file once the concept DOI has been created.

If you're using a monorepo, you can make as many Zenodo releases as you like during CI processing. Just run the relevant commands as many times as needed, and create a different Zenodo configuration file for each project that gets assigned DOIs.

Rewrites

The cranko zenodo preregister command can insert the DOIs that will be registered into your source code. You can use this functionality to create software releases that know their own DOIs.

We suggest that you include commands in your software to print out these DOIs, along the lines of cranko show cranko-version-doi and cranko show cranko-concept-doi. This way, there is an easy way for users to get the precise DOIs relating to the software that they're running. You might also want to insert these DOIs into logs or metadata associated with the files that your software creates, although in many cases the version number is going to be more understandable to users.

This insertion happens during the cranko zenodo preregister command, which will rewrite any files whose paths you pass to it on the command line. The following rewrite rules are followed:

  • The text xx.xxxx/dev-build.$project.version, where $project is the name of the Cranko project being released, is replaced with the version DOI. To be explicit, for Cranko itself the template to be replaced would be xx.xxxx/dev-build.cranko.version.
  • The text xx.xxxx/dev-build.$project.concept, where $project is the name of the Cranko project being released, is replaced with the concept DOI.

If you’re feeling extra-clever, you can include these templates in your CHANGELOG.md entry, and your final changelog will include the DOIs of the release that it describes. (If you do this, you’ll need to pass the path to CHANGELOG.md as an argument to cranko zenodo preregister.)

If you’re building out of source control, these replacements won't happen, of course. If a pull request or other non-release build is being processed, or if you’re in a monorepo and the package in question isn’t being released, fake DOIs with similar forms will be substituted in. You can add checks in your code to see whether the DOIs start with the universal DOI prefix, "10.", to know whether your DOIs are real or fake.

CI/CD Workflow

Zenodo publication operations require you to have a Zenodo API token, which you can create in the Zenodo Account Tokens page. You need to get this token into the environment variable ZENODO_TOKEN for the Zenodo workflow to work.

The cranko zenodo preregister command(s) should be run at the beginning of your CI/CD workflow, before cranko release-workflow commit. As described above, the command inserts placeholders for non-release builds, so you can run it in all of your workflows without worrying about needing to detect whether the current build is for a project release. If you’re using a monorepo with multiple projects that get Zenodo deposits, run the command as many times as needed. After all invocations are done, you should git add your modified files to make sure they get included in the release commit.

At the end of your CI/CD workflow, if you are actually making real releases, you should run cranko zenodo upload-artifacts as needed, then finally cranko zenodo publish to publish your new deposits. Once again, in a monorepo scenario, these commands should be run as many times as needed — with filters in place to only execute them if the projects in question have actually been released. This can be accomplished with cranko show if-released --exit-code.

Continued Releases

After your first successful Zenodo deposit, you should update your zenodo.json5 file and replace the special "conceptrecid" field with the Zenodo record ID corresponding to the “concept” of your software package. This is easily findable in the concept DOI, and is also printed by cranko zenodo preregister.

Going forward, you should review the zenodo.json5 file periodically and update as needed — in particular, you should be attentive to the author list. As with any academic product, the choice of who goes on an author list, and what order that list is in, is not something that can be automated — you have to decide how you want to handle it.