Without a doubt, YAML makes my blood run faster. No other data format stirs up so many mixed feelings in my mind. I came to both love it and hate it. The powerful side of this language pleases me, yet, as people say, with great power comes great responsibility. This is the first installment of a multi-part series!
Working in the Stoplight team as part of the 11Sigma crew, I contributed to Stoplight Studio & Stoplight Spectral, both of which work with YAML heavily. That allowed me to dive very deep into its spec.
This article will be something between a “YAML tutorial” and “what YAML has, but JSON does not”. Since version 1.2, YAML is a superset of JSON, so JSON has practically no feature that YAML lacks.
Where possible, I’ll try to make a side-by-side comparison, although I have to state right away that at times it will not be achievable, as YAML is simply far more powerful than JSON.
I intend to dig into certain areas that are rather lesser-known and explore the capabilities YAML has. I firmly believe that although YAML is undoubtedly wonderful, you should familiarize yourself with it before using it.
Although I’m remarkably torn when it comes to YAML, I’ll do my best to put my opinions towards YAML aside. Therefore there’ll be no ranting here – only facts.
The Stoplight Footprint
Before I started working at Stoplight, my attitude towards YAML was entirely neutral, as my knowledge of YAML was arguably mediocre.
However, the time I spent handling YAML-specific requests shook up that view. As I discovered more and more of its features, I realized how powerful this language is and how far superior it is when compared with JSON.
In parallel to that process, I observed that the majority of the people I had contact with, whether users or just my peers, had an equally average knowledge of YAML and treated YAML as JSON with different formatting.
This sort of reasoning tends to pose a challenge for a variety of reasons, with the most crucial ones being:
- YAML is ubiquitous in our space; it’s presumably more popular than JSON, and it’s commonly used by non-engineers
- We provide tooling that needs decent YAML support
If you’re eager to learn what makes YAML different from JSON, please read on!
Selected Differences
I decided to split the differences into smaller subsections to keep the article coherent. Apart from the “vocabulary” section, each one illustrates features that have a greater or lesser impact on the final output.
We’ll start from very general aspects and then go down the rabbit hole to uncover the more powerful capabilities YAML and JSON have to offer.
Here’s a set of initial assumptions applicable throughout the rest of the reading:
- RFC 8259 is the JSON spec we rely on in JSON examples
- YAML 1.2 is the default YAML spec we use in examples unless stated otherwise. The examples mainly use the Core Schema
- For the sake of simplicity, “newline” is used interchangeably with the line feed character (0x0A); to put it differently, a new line in this article equals LF.
General
Vocabulary
Perhaps the most trivial distinction of them all is the naming. Although this may not particularly matter in daily life, I found it essential to pinpoint how things stand from the spec’s point of view, because I’m going to rely on these terms in other parts of the post.
Plus, a variety of tools use the terminology introduced by the spec, so it’s certainly not a bad idea to grasp such basics.
JSON
- structured
  - array
  - object
- primitive
  - string
  - number
  - boolean
  - null
YAML
- sequence
- mapping
- scalar: the data type representable as a series of Unicode characters. In practice, a string, a number, a boolean, and similar are scalar values.
The short list of node types in YAML might be surprising at first glance, but don’t be misled by it: there’s a lot more beneath the surface. This section is only about vocabulary and makes no claims about what either format can express.
In reality, YAML has more data types than JSON, and they will get more exposure soon.
The differences go beyond type names, as YAML also prescribes, to some degree, the naming a data processor should adhere to.
Rather than parse and stringify, YAML uses the following terms:
- load
- dump
For the curious ones, the process of translating a character stream into data structures (the actual data you operate with) is called load.
Parsing is one step of that process, but a YAML processor has more to do than simply parse the source.
The reverse operation of serializing data back to text is called dump.
Hopefully, this clarifies why js-yaml decided to use these names instead of the familiar parse and stringify that are natively available in any ES-compliant environment.
Document(s) in a Single Character Stream
This is likely one of the least known YAML features.
YAML allows you to specify multiple documents in a single stream (e.g. file), while JSON does not offer such an option.
---
title: 'I am a document!'
...
---
title: 'I am a different document!'
...
A single stream, but two documents. More documents can be added if needed, as no upper limit is enforced.
Now, let’s consider this character stream.
# I have no documents!
Although it may make little to no sense, it’s still a valid YAML character stream. It simply has no documents in it.
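To make the two streams above more tangible, here is a minimal Python sketch that splits a character stream into raw document texts. It is a naive, line-based illustration and not a YAML processor: the function name split_documents is hypothetical, it only looks for the "---" and "..." markers at the start of a line, and it ignores any content placed on the same line as "---".

```python
import re
from typing import List, Optional

# Matches "---" or "..." at column 0, followed by whitespace or end of line.
MARKER = re.compile(r"^(---|\.\.\.)(?:\s|$)")


def split_documents(stream: str) -> List[str]:
    """Naively split a YAML character stream into raw document texts."""
    documents: List[str] = []
    current: Optional[List[str]] = None  # None: not inside a document

    for line in stream.splitlines():
        marker = MARKER.match(line)
        if marker:
            if current is not None:
                documents.append("\n".join(current))
            # "---" opens a new document, "..." only closes the current one.
            current = [] if marker.group(1) == "---" else None
        elif current is not None:
            current.append(line)
        elif line.strip() and not line.lstrip().startswith(("#", "%")):
            # Content without a preceding "---" starts a bare document.
            current = [line]

    if current is not None:
        documents.append("\n".join(current))
    return documents


two_documents = """\
---
title: 'I am a document!'
...
---
title: 'I am a different document!'
...
"""

print(len(split_documents(two_documents)))          # 2
print(split_documents("# I have no documents!\n"))  # []
```

A real processor does far more than this (directives, tags, the actual parsing), so treat the sketch only as a way to see where one document ends and the next begins.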
Since we’re introducing some potentially new syntax, it feels more than appropriate to clarify it.
Despite what its usage may suggest, "---" is not a “document start” marker. This particular notation stands for “directives end”.
In short, a directive is an instruction you can supply to the YAML processor. There are two directives to pick from:
- YAML
- TAG
The YAML directive is explained in another section, while the TAG one will be covered in full in a separate article.
The second new bit of syntax we used is "...", and this one is indeed the “document end” marker.
Most of the tools that consume both JSON & YAML text formats will probably assume that a single character stream is equal to a single document.
Bear in mind that this does not imply the underlying YAML processors are unable to process such a stream.
To back my words with a real-life case: the most popular YAML processor written in JS, namely js-yaml, is capable of recognizing multiple documents in a single character stream (via the loadAll function). However, as stated earlier, most of the tools built on top of it are unlikely to use that capability.
JSON, on the other hand, is limited to a single document, as it doesn’t expose any way to define a document explicitly.
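The JSON side of this can be checked with a few lines of Python. The sketch below uses only the standard json module and feeds it two JSON values in one stream; the stream contents are made up for illustration.

```python
import json

# Two JSON values in one character stream: JSON has no document separator,
# so the parser treats everything after the first value as an error.
stream = '{"title": "I am a document!"}\n{"title": "I am a different document!"}'

try:
    json.loads(stream)
    outcome = "parsed"
except json.JSONDecodeError as error:
    outcome = f"rejected: {error.msg}"

print(outcome)  # rejected: Extra data
```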
Character Encoding
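In JSON, according to RFC 8259, text exchanged between systems that are not part of a closed ecosystem must be encoded as UTF-8. Besides, implementations must not add a byte order mark (BOM) to the beginning of a JSON text they transmit. JSON parsers may ignore a BOM rather than treat it as an error, but producers must not emit one.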
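How a given parser treats the byte order mark is implementation-specific. As one data point, this sketch shows the behavior of Python's standard json module, which rejects a BOM in already-decoded text but skips it when given raw bytes; other JSON libraries may behave differently.

```python
import codecs
import json

payload = '{"title": "BOM test"}'

# As text: a leading U+FEFF is treated as an error by Python's json module.
try:
    json.loads("\ufeff" + payload)
    text_result = "accepted"
except json.JSONDecodeError:
    text_result = "rejected"

# As bytes: the UTF-8 BOM is detected and skipped while decoding.
bytes_result = json.loads(codecs.BOM_UTF8 + payload.encode("utf-8"))

print(text_result)   # rejected
print(bytes_result)  # {'title': 'BOM test'}
```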
YAML supports UTF-8, UTF-16, as well as UTF-32. In addition, the YAML processor must support character streams starting with the byte order mark.
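To illustrate what supporting the byte order mark involves, here is a small Python sketch that picks the encoding from a leading BOM. The function name decode_with_bom is hypothetical, and the sketch deliberately falls back to UTF-8 when no BOM is present; the YAML spec also describes how to deduce the encoding without a BOM, which is left out here.

```python
import codecs

# UTF-32 marks are checked first: the UTF-32 LE mark starts with the
# same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_with_bom(data: bytes) -> str:
    """Decode a byte stream, using a leading BOM to pick the encoding."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM: this sketch simply assumes UTF-8.
    return data.decode("utf-8")


source = "title: 'I am a document!'\n"

# The same text survives a round trip through every supported encoding.
for bom, encoding in BOMS:
    assert decode_with_bom(bom + source.encode(encoding)) == source
```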
Interestingly, UTF-32 support was added in YAML 1.2 for the sake of JSON compatibility, even though JSON, as described above, now accepts only UTF-8.
It has to be stated that prior to RFC 8259 (published in December 2017), JSON had no such requirement. The last revision of the YAML 1.2 spec took place in October 2009, so back then it did matter.
Spec Versioning
Another seemingly trivial difference is that YAML lets you specify the version of the YAML spec your document should be processed with, using a directive called YAML.
Intriguingly, a YAML processor that wants to be spec-compliant is supposed to support older versions.
%YAML 1.1
---
This is a valid document that should be processed according to YAML 1.1.
On the other hand, according to the specification, processors should raise a warning for higher minor versions (1.3, 1.4, etc.) and should reject documents with a higher major version (2.x, 3.x, etc.) with an error.
%YAML 1.3
---
This one should raise a warning
...
%YAML 2.0
---
while this one should result in an error
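The rules above can be sketched in a few lines of Python. This is a hand-rolled illustration, not how any particular YAML processor implements it: the function name check_version is hypothetical, the sketch assumes a processor that implements YAML 1.2, and it only inspects lines before the first "---".

```python
import re

# Matches a line such as "%YAML 1.2" (no trailing comment handled here).
DIRECTIVE = re.compile(r"^%YAML[ \t]+(\d+)\.(\d+)[ \t]*$")

# Assumption: the processor in this sketch implements YAML 1.2.
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 2


def check_version(stream: str) -> str:
    """Return "ok", "warning" or "error" for the stream's YAML directive."""
    for line in stream.splitlines():
        if line.startswith("---"):
            break  # directives end: nothing more to inspect
        match = DIRECTIVE.match(line)
        if match:
            major, minor = int(match.group(1)), int(match.group(2))
            if major > SUPPORTED_MAJOR:
                return "error"
            if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
                return "warning"
            return "ok"
    return "ok"  # no directive at all


print(check_version("%YAML 1.1\n---\nsome text\n"))  # ok
print(check_version("%YAML 1.3\n---\nsome text\n"))  # warning
print(check_version("%YAML 2.0\n---\nsome text\n"))  # error
```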
JSON offers none of the above, but it is worth noting that the JSON spec hasn’t changed significantly, so there has been no such need.
Originally posted on the 11Sigma Blog