.about.yml background

Some background on the .about.yml project metadata format prompted by an unexpected inquiry from the TODO Group.

04 Nov 2016 - Alexandria
Tags: 18F, programming, technical

This morning I received an email from Will Norris of the TODO Group, which I wasn’t aware of before. Turns out he’s been trying to launch a discussion about storing project metadata in a portable fashion, and is interested in the .about.yml format that I and some colleagues were developing at 18F.

What follows is a brain dump spurred by Will’s questions in an offline thread. While I’m no longer at 18F or in the government, I remain willing to help cultivate the .about.yml format however I might.

Overview
History
Note on code.gov
Conclusion

Overview

Here’s the collection of .about.yml-related repos (and a direct link to the existing schema):

This was part of my attempt to recreate the kind of internal visibility experience that I had at Google (Googlers: think PDB+Moma, specifically), fit to the 18F environment and largely depending upon repo-based metadata rather than a centralized database (which could, of course, be automatically constructed and queried by a front end). The idea was that this would then feed into static (primarily Jekyll) sites, keeping costs and security concerns at a bare minimum, given the presence of a suitably technical employee at an agency.

I also hacked a little Node.js server to automate feeding data from .about.yml files and other sources into the 18F Dashboard, though it was a little janky:

Still, as a fragile proof-of-concept, it did mostly work.

The idea of .about.yml was somewhat popular within 18F; neither I nor the other folks actively developing it had to poke people too hard to try it. It was basically an evolution of the monolithic projects.yml that used to exist as part of the 18F Dashboard repo, so the YAML-based format was familiar and noncontroversial. John Jediny and Phil Ashlock of the General Services Administration’s Data.gov team were also keen on helping develop it.

There were a lot of open questions and discussion around parts of the schema, even when I left 18F in March. It wasn’t so much contentious as inconclusive, given the greenfield nature of the project and the limited resources available.

History

We didn’t so much create a new format as extract one from existing practice within the team. As mentioned above, the 18F Dashboard was a Jekyll site that had a monolithic _data/projects.yml. I started playing around with that as a data source for my earliest 18F project (the Hub), and eventually the one file became many, one file per project. What started as a site-specific data source was becoming a more general repository of team metadata, and I wanted to use the Hub to surface more than what appeared on the Dashboard. (The idea being, the Dashboard would be a higher-level view for an external audience, and the Hub a detailed view for internal folks and those with a deeper interest.)

One day, when I politely asked project managers over Slack to please update their data, one particularly obstinate person pushed back, saying it was basically too much work to ask them to keep this data up-to-date. Somehow my frustration over this reaction (how is it not in a PM’s wheelhouse to maintain an accurate repository of project metadata?) led to the insight that if the metadata lived in a project’s repository, it shouldn’t be too much of a context switch for the PMs and devs on the project to keep it up-to-date. Then I could hack something to automatically harvest it and join it with other metadata sources. Hence the Team API, which exposed the metadata as a fully-connected graph represented as JSON. (I was on Instant Indexing my last two years at Google, so I love joining things; I even created hash-joiner and lambda_map_reduce as separate, reusable components.)
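As a rough illustration of the harvest-and-join idea, here is a minimal Python sketch that cross-links project records with team member records and emits the result as JSON. It is not the Team API's actual code; the function name, the `github`, `team`, and `projects` keys, and the sample data are all assumptions made up for the example.

```python
import json


def join_projects_with_team(projects, team):
    """Cross-link project and team records by username (hypothetical keys)."""
    # Index team members by username; each gets an empty list of projects.
    members = {m["github"]: dict(m, projects=[]) for m in team}
    joined_projects = []
    for project in projects:
        known = []
        for username in project.get("team", []):
            member = members.get(username)
            if member is None:
                continue  # no team record for this username; skip it
            member["projects"].append(project["name"])
            known.append(username)
        joined_projects.append(dict(project, team=known))
    return {"projects": joined_projects, "team": list(members.values())}


# Sample data standing in for harvested metadata and a team roster.
projects = [
    {"name": "hub", "team": ["alice", "bob"]},
    {"name": "dashboard", "team": ["bob", "carol"]},
]
team = [
    {"github": "alice", "full_name": "Alice Example"},
    {"github": "bob", "full_name": "Bob Example"},
]

graph = join_projects_with_team(projects, team)
print(json.dumps(graph, indent=2))
```

Each project ends up listing only the team members that have a record, and each member lists the projects they appear on, so a front end can traverse the data in either direction.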

I also wasn’t personally aware of anything resembling project metadata beyond what’s typically found in .gemspec files, package.json files, and the like. These seemed relatively limited, and things got awkward when a project contained more than one language.

Modeling multi-repo projects was another topic we were trying to hash out, but we hadn’t arrived at a firm resolution by the time I left. Still, .about.yml seems better suited to an eventually successful model than any of the other formats I’m aware of so far.

On top of that, I find YAML easier to edit and structure than practically any other text-based data format (even text-based protobufs). I don’t think I’m unique in that regard. It’s also more accessible to the hardier non-technical folks on the team.

Note on code.gov

I know a couple of the folks who’ve contributed to code.gov, but even without having talked to them in months, I’m practically certain they chose JSON for the code.json metadata format because it’s published as a website endpoint, where you generally want JSON instead of YAML. I can imagine the code.json schema somehow blending with .about.yml, and then compiling the code.json file very easily from an .about.yml file, just like the team_api gem compiles the .about.yml file and other inputs to JSON.
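To give a feel for what such a compilation step might look like, here is a small Python sketch that maps an already-parsed .about.yml document (a plain dict) to a JSON entry. The input and output field names are placeholders chosen for illustration, not the real .about.yml or code.json schemas, and the function name and URLs are hypothetical.

```python
import json


def about_to_code_entry(about, repo_url):
    """Map a parsed .about.yml dict to a JSON-ready entry (placeholder fields)."""
    licenses = about.get("licenses") or {}
    contacts = about.get("contact") or []
    return {
        "name": about["name"],
        "description": about.get("description", ""),
        "repository": repo_url,
        # Collect license names, sorted for stable output.
        "licenses": sorted(lic["name"] for lic in licenses.values()),
        # Use the first listed contact, if any.
        "contact": contacts[0]["url"] if contacts else None,
    }


# Stand-in for the result of parsing a project's .about.yml file.
about = {
    "name": "example-project",
    "description": "A hypothetical project used for illustration.",
    "licenses": {"example-project": {"name": "CC0"}},
    "contact": [{"url": "mailto:team@example.gov"}],
}

entry = about_to_code_entry(about, "https://example.gov/example-project")
print(json.dumps({"projects": [entry]}, indent=2))
```

A real converter would parse the YAML first and validate both sides against their published schemas; the point here is only that the mapping is a straightforward field-by-field translation.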

Conclusion

So out of familiarity, frustration, ignorance, laziness, preference, and necessity, .about.yml was born. It’s not that I wanted to reinvent the wheel; I just wasn’t sure whether anyone else already had, and no one else on the team seemed aware of an existing wheel, either.

As for future development of the .about.yml format, whether it’s evolved or absorbed into something more general, I’m happy to participate. I’m certainly not possessive about it; now, just as then, I’d like to see a workable standard solution emerge that helps people discover information about projects, government or otherwise, regardless of whose fingerprints are on it.