This morning I received an email from Will Norris of the TODO Group, which I wasn’t aware of before. Turns out he’s been trying to launch a discussion about storing project metadata in a portable fashion, and is interested in the .about.yml format that I and some colleagues were developing at 18F.
What follows is a brain dump spurred by Will’s questions in an offline thread.
While I’m no longer at 18F or in the government, I remain willing to help
cultivate the .about.yml format however I might.
Overview
Here’s the collection of .about.yml-related repos (and a direct link to the
existing schema):
This was part of my attempt to recreate the kind of internal visibility experience that I had at Google (Googlers: think PDB+Moma, specifically), fit to the 18F environment and largely depending upon repo-based metadata rather than a centralized database (which could, of course, be automatically constructed and queried by a front end). The idea was that this would then feed into static (primarily Jekyll) sites, keeping costs and security concerns at a bare minimum, given the presence of a suitably technical employee at an agency.
I also hacked a little Node.js server to automate feeding data from .about.yml
files and other sources into the 18F Dashboard, though it was a little janky:
Still, as a fragile proof-of-concept, it did mostly work.
The idea of .about.yml was somewhat popular within 18F; neither I nor the
other folks actively developing it had to poke people too hard to try it. It was
basically an evolution of the monolithic projects.yml that used to exist as
part of the 18F Dashboard repo, so the YAML-based format was familiar and
noncontroversial. John Jediny and Phil Ashlock of the General Services
Administration’s Data.gov team were also keen on helping develop it.
There were a lot of open questions and discussion around parts of the schema, even when I left 18F in March. It wasn’t so much contentious as inconclusive, given the greenfield nature of the project and the limited resources available.
History
We didn’t so much create a new format as extract one from existing practice
within the team. As mentioned above, the 18F Dashboard was a Jekyll site that
had a monolithic _data/projects.yml. I started playing around with that as a
data source for my earliest 18F project (the Hub), and eventually the one file
became many, one file per project. What started as a site-specific data source
was becoming a more general repository of team metadata, and I wanted to use the
Hub to surface more than what appeared on the Dashboard. (The idea being, the
Dashboard would be a higher-level view for an external audience, and the Hub a
detailed view for internal folks and those with a deeper interest.)
When I politely asked project managers over Slack to please update their data one day, I got one particularly obstinate person pushing back and saying it was basically too much work to ask them to keep this data up-to-date. Somehow my frustration over this reaction (how is it not in a PM’s wheelhouse to maintain an accurate repository of project metadata?) led to the insight that if the metadata was in a project’s repository, it shouldn’t be too much of a context switch for the PMs and devs on the project to keep it up-to-date. Then I could hack something to automatically harvest it and join it with other metadata sources—hence the Team API, which exposed the metadata as a fully-connected graph represented as JSON. (I was on Instant Indexing my last two years at Google, so I love joining things; I even created hash-joiner and lambda_map_reduce as separate, reusable components.)
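To make the "harvest and join" idea concrete, here is a minimal Python sketch of a hash join between per-project metadata and a separate list of team members. It is not the Team API's actual code (that lived in Ruby and Node.js projects); the function name hash_join and the name, team, and github fields are assumptions chosen purely for illustration, and the inputs are assumed to be already-parsed YAML.

```python
def hash_join(projects, people, key="team", member_key="github"):
    """Join project records to person records by username.

    Hypothetical helper: returns the projects with their team lists
    expanded, plus an index of people back-linked to their projects.
    """
    # Build the hash index once: username -> copy of the person record.
    index = {p[member_key]: dict(p, projects=[]) for p in people}
    joined = []
    for project in projects:
        members = []
        for username in project.get(key, []):
            person = index.get(username)
            if person is None:
                # Keep unknown usernames visible instead of dropping them.
                members.append({member_key: username, "unknown": True})
                continue
            # Back-link: the person now points at the project as well.
            person["projects"].append(project["name"])
            members.append({member_key: username, "name": person["name"]})
        # Copy the project so the harvested input is left untouched.
        joined.append(dict(project, **{key: members}))
    return joined, index
```

Linking in both directions is what makes the result feel like a connected graph: from a project you can reach its people, and from a person you can reach every project that lists them.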
I also wasn’t personally aware of anything resembling project metadata beyond
what’s typically found in .gemspec files, package.json files, and the like.
These seemed relatively limited, and things got awkward when a project contained
more than one language.
Trying to model multi-repo projects was another topic we were trying to hash
out, but we hadn’t yet arrived at a firm resolution by the time I left. Still,
.about.yml seems better suited to an eventually successful model than any of
the other formats of which I’m yet aware.
On top of that, I find YAML easier to edit and structure than practically any other text-based data format (even text-based protobufs). I don’t think I’m unique in that regard. It’s also more accessible to the hardier non-technical folks on the team.
Note on code.gov
I know a couple of the folks that’ve contributed to code.gov, but even without
talking to them in months, I’m practically certain they chose JSON for the
code.json metadata format because it’s published as a website endpoint, where
you generally want JSON instead of YAML. I can imagine the code.json schema
somehow blending with .about.yml, and then compiling the code.json file very
easily from an .about.yml file, just like the team_api gem compiles the
.about.yml file and other inputs to JSON.
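As a rough illustration of that kind of compilation, the Python sketch below maps an already-parsed .about.yml document (a plain dict) onto a JSON entry. Everything specific here is an assumption for the sake of the example: the function name about_to_code_json, the input fields (name, full_name, description, stage, stack), and the output keys are stand-ins rather than the actual .about.yml or code.json schemas, which should be consulted before building anything real.

```python
import json


def about_to_code_json(about, repo_url):
    """Compile one parsed .about.yml dict into a JSON string.

    Hypothetical mapping: the field names on both sides are
    illustrative, not taken from either published schema.
    """
    entry = {
        # Prefer the human-readable name when the file provides one.
        "name": about.get("full_name") or about["name"],
        "description": about.get("description", ""),
        "repositoryURL": repo_url,
        "status": about.get("stage", "unknown"),
        "languages": list(about.get("stack", [])),
    }
    # Sorted keys keep the generated file stable across runs.
    return json.dumps(entry, indent=2, sort_keys=True)
```

A real compiler would loop over every harvested repo and validate each entry against the published schema before writing the file; this sketch shows only the per-project mapping step.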
Conclusion
So out of familiarity, frustration, ignorance, laziness, preference, and
necessity, .about.yml was born. It’s not that I wanted to reinvent the wheel,
I just wasn’t sure if anyone else already had, and no one else on the team
seemed aware of an existing wheel, either.
As for future development of the .about.yml format, whether it’s evolved or
absorbed into something more general, I’m happy to participate. I’m certainly
not possessive about it; now just as well as then, I’d like to see a workable
standard solution emerge that helps people discover information about projects,
government or otherwise, regardless of whose fingerprints are on it.