YAML Is More Than JSON Without Brackets: Part 2

by Jakub Rożek on May 21, 2021

Without a doubt, YAML makes my blood run faster. No other data format stirs up so many mixed feelings in me. I have come to both love it and hate it. The powerful side of this language pleases me, yet, as people say, with great power comes great responsibility.

Where possible, I’ll make a side-by-side comparison, although I have to state right away that this won’t always be achievable, as YAML is by far more powerful than JSON.

I intend to dig into certain lesser-known areas and explore the capabilities YAML has. I firmly believe that although YAML is undoubtedly wonderful, you should familiarize yourself with it before using it.

YAML vs JSON Usability

These differences will mostly cover the usability aspects of each spec.

Comments

YAML has them, but JSON does not. Boom. As simple as that.

It’s worth mentioning that there are JSON variants that support comments as well, such as JSONC (JSON with JS-style comments).

Comments in YAML start with #. There are no multiline comments – you are expected to write several single-line comments instead, each on its own line.

Anchors and Aliases

In my opinion, this is the best feature YAML has.

austrian-cities: &austrian-cities # this is an anchor
  - Vienna
  - Graz
  - Linz
  - Salzburg

european-cities:
  austria: *austrian-cities # this is an alias

In short, aliases begin with *, while anchors with &. On top of that, anchors cannot contain [, ], ,, { and }.

An anchor is used to indicate that a given node is supposed to be reusable in the future. In other words, this is a way to tell that you might want to reference that node using an alias node.

Any node can be anchored, but there’s no requirement for them to be referenced by alias nodes, thus it’s not an error if you denote a given node with an anchor, but you don’t actually refer to it later on using an alias.

This feature has several benefits:

  • Ability to reuse certain nodes. This helps us reduce repetition in configuration files and other files that are full of recurring patterns. In JSON, you’d need to include everything n times. Not only can this be quite tedious, but the resulting document is also more bloated than it would be if anchors and aliases were available.
  • A lifeline when circular data structures need to be represented. Thanks to aliases and anchors, dumping such data back to text is a piece of cake. JSON is the opposite here, as it offers no standardized approach to this problem, and you need to resort to other solutions such as JSON Schema $ref (but such a document is simply less portable).

As a person who constantly works with JSON Schema $refs (side note: although they cannot be circular, in reality you have to support that scenario), I still see YAML anchors and aliases as a true lifesaver. Tons of users expect to receive a document that’s accepted by their tool, and a custom approach doesn’t guarantee that.

The ability to represent circular data can also be a downside when it comes to loading. After all, the data you will work with may have circular references you now need to account for.

If you’re an implementer, the easiest way is to break these circular refs after a document is loaded. This is what we did in a utility called dereferenceAnchor.

Despite not being fully compliant with the spec due to that, you often have no other choice because plenty of tooling just gives up upon circular references. How such an output looks in practice can be checked here.
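To give a feel for what breaking circular references can mean, here is a minimal sketch in plain JavaScript. It is not the utility mentioned above: the name breakCircularRefs and the choice to replace a cycle with null are assumptions made purely for illustration.

```javascript
// Hypothetical helper: returns a copy of `value` in which every reference
// back to one of its own ancestors is replaced with `placeholder`.
function breakCircularRefs(value, placeholder = null, ancestors = new Set()) {
  if (typeof value !== 'object' || value === null) return value;
  if (ancestors.has(value)) return placeholder; // we are inside a cycle

  ancestors.add(value);
  const copy = Array.isArray(value)
    ? value.map((item) => breakCircularRefs(item, placeholder, ancestors))
    : Object.fromEntries(
        Object.entries(value).map(([key, item]) => [
          key,
          breakCircularRefs(item, placeholder, ancestors),
        ]),
      );
  ancestors.delete(value); // siblings may legitimately share the same node
  return copy;
}

// Roughly what a loader could hand back for an anchored sequence
// that contains an alias to itself.
const users = [];
users.push(users);

// JSON.stringify(users) would throw a TypeError at this point.
console.log(JSON.stringify(breakCircularRefs(users))); // [null]
```

Nodes that are merely shared (aliased more than once without forming a cycle) are copied as usual, so only genuine cycles get cut.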

Flow and Block Scalars / Multiline Strings

In the vocabulary section, we already briefly explained what a scalar in YAML is, but let’s dig deeper.

To recap, a scalar is a series of zero or more Unicode characters, and it’s the most ubiquitous node type you’ll see in YAML.

Although it empowers you to define a variety of data types, in this section our focus will be primarily oriented on strings, as this is where the difference is most notable.

Don’t worry, though, as we’ll certainly explain the topic even further later on.

Flow Scalars

Overall, there are 3 types of flow scalars –

  • plain
  • single-quoted
  • double-quoted

plain: I am plain style!
single-quoted: 'I am surrounded by single quotes!'
double-quoted: "I am surrounded by double quotes!"

Personally, when in doubt, I try to refrain from using the plain style (funnily though, the keys of mapping pairs I used in the document above are plain flow scalars), as it may lead to unpredictable results at times.

I will touch on this topic in a bit, but to show an example and see what we get…

.inf

Considering only the recommended schemas, this value could resolve to either Infinity (a float) or a plain string.

Unluckily for us, not every library documents the schema it uses, therefore it’s best to be explicit and always use quotes, assuming we indeed want a string.

Bear in mind that such cases aren’t that common, so using the plain style isn’t an anti-pattern.

It’s just occasionally risky if you have a value that might happen to be resolved differently.
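To make the ambiguity concrete, here is a minimal sketch (not a real parser) of how the same plain scalar could resolve under two of the recommended schemas. The function name resolvePlainScalar and the 'failsafe' / 'core' labels are made up for this illustration, and only the infinity rule of the Core schema is modelled.

```javascript
// Hypothetical helper: a heavily reduced model of scalar resolution.
// - 'failsafe': every scalar is a string
// - 'core': only the infinity pattern is handled, the rest stays a string
function resolvePlainScalar(text, schema) {
  if (schema === 'core' && /^[-+]?(\.inf|\.Inf|\.INF)$/.test(text)) {
    return text.startsWith('-') ? -Infinity : Infinity;
  }
  return text;
}

console.log(resolvePlainScalar('.inf', 'core'));     // Infinity
console.log(resolvePlainScalar('.inf', 'failsafe')); // .inf
```

The same input yields a number in one case and a string in the other, which is exactly why quoting is the safer choice when a string is intended.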

Plain Style

Generally speaking, this is the most limited style.

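Since it’s not quoted, you cannot use anything that might lead to ambiguity.

This means a plain scalar must not start with an indicator such as - (followed by a space) or #, nor contain the sequences “: ” or “ #”, as these have roles of their own.

A plain scalar may still span multiple lines, though; the line breaks are folded into spaces. As always, indentation is the key here.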

some-key: I am a plain scalar
  span across
  multiple lines

I’ll resort to JSON once again to visualize it.

{
  "some-key": "I am a plain scalar span across multiple lines"
}

The caveat is that all leading and trailing whitespace gets trimmed.

some-key:    I am a plain scalar
  span across
  multiple lines

The JSON output remained unchanged.

{
  "some-key": "I am a plain scalar span across multiple lines"
}

Single-Quoted Style

Single-quoted style is more robust than plain style, in that it allows more control over whitespace.

some-key: 'I am a single-quoted scalar
  span across
  multiple lines


 '

Now, the JSON output finally has the newlines!

{
  "some-key": "I am a single-quoted scalar span across multiple lines\n\n"
}

There’s one caveat we need to keep in mind.

Escaping does not work, hence the following document is not valid.

'It\'s a great day'

To fix it, you’d need to double the quote. The first line below is the YAML, the second one is the resulting JSON.

'It''s a great day'
"It's a great day"

Because escaping does not work, line feeds (\n) won’t yield the expected results (assuming you expect a new line, obviously). Again, YAML first, JSON second.

'It\n''s a great day'
"It\\n's a great day"

As you can see, \n got escaped.
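The rule is small enough to sketch in a few lines. The helper below is hypothetical (decodeSingleQuoted is not part of any library) and ignores line folding; it only shows that a doubled quote is the sole escape and that backslashes are left alone.

```javascript
// Hypothetical helper: `body` is the text between the two single quotes.
// A doubled quote stands for one quote; a backslash is just a backslash.
function decodeSingleQuoted(body) {
  return body.replace(/''/g, "'");
}

// JSON.stringify is used only to print the value the way JSON would write it.
console.log(JSON.stringify(decodeSingleQuoted("It''s a great day")));    // "It's a great day"
console.log(JSON.stringify(decodeSingleQuoted("It\\n''s a great day"))); // "It\\n's a great day"
```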

Double-Quoted Style

Close to single-quoted, but with fully working escaping.

Working escaping is particularly useful for line feeds.

"It\n a great day"

which equals the following JSON:

"It\n a great day"

Note that single-quoted and double-quoted scalars are sometimes not allowed to span multiple lines.

For instance, it’s forbidden to have a multiline implicit mapping key, i.e.

'this
is
invalid':

Block Scalars

There are three factors that have an influence on the final shape of your outcome:

  • Style
  • Chomping
  • Indentation

Each of them affects the document differently, with style being the most impactful, and indentation the least.

The block header is a “combination” of chomping and indentation, and you have to place that header after the style, but right before the content itself.

Here’s what it looks like using simplified BNF (Backus-Naur form) notation.

<style> ::= "|" | ">"
<chomping> ::= ["-" | "+"]
<indentation> ::= ["1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"]

<block-header> ::= <chomping> <indentation> | <indentation> <chomping>
<SYNTAX> ::= <style> <block-header>

Style always has to be placed at the very beginning, while the order of chomping and indentation does not matter.

You can use all of them or style alone.

If you prefer to see the code, you’re in luck, as some time ago I wrote a small function that extracts all the information listed in this section. You can find it here.

Moreover, a decent number of examples is available as well.
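As a rough, independent sketch (not the function linked above), the simplified grammar can be turned into a tiny header parser. The name parseBlockHeader and the shape of the returned object are assumptions for illustration, and a trailing comment after the header is not handled.

```javascript
// Hypothetical helper following the simplified grammar above.
// Returns null when the header is not valid.
function parseBlockHeader(header) {
  // style, then either chomping + optional indentation, or indentation + optional chomping
  const match = /^([|>])(?:([-+])([1-9])?|([1-9])([-+])?)?$/.exec(header);
  if (match === null) return null;

  const chompingIndicator = match[2] ?? match[5] ?? '';
  const indentation = match[3] ?? match[4];

  return {
    style: match[1] === '|' ? 'literal' : 'folded',
    chomping: { '-': 'strip', '+': 'keep', '': 'clip' }[chompingIndicator],
    indentation: indentation === undefined ? null : Number(indentation),
  };
}

console.log(parseBlockHeader('|2+')); // { style: 'literal', chomping: 'keep', indentation: 2 }
console.log(parseBlockHeader('>-'));  // { style: 'folded', chomping: 'strip', indentation: null }
console.log(parseBlockHeader('|10')); // null
```

An indentation of null means that no indicator was given, so the indentation has to be detected from the content.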

Sadly, certain converters appear to disregard chomping and indentation, thus one needs to be aware of that when using them to convert your YAML document to JSON.

The sections below should explain why that extra verification is crucial.

Style

  • | – literal
  • > – folded

The style determines whether line folding applies to the scalar or not. To put it differently, newlines are treated differently depending on the style.

When a literal style is used, newlines are preserved, while folded styles make a scalar subject to line folding.

To illustrate, given the following literal style block scalar…

|
Winnie-the-Pooh
Винни-Пух

…you would get such a JSON document:

"Winnie-the-Pooh\nВинни-Пух\n"

Now, if the folded style was used, the output would be considerably different.

>
Winnie-the-Pooh
Винни-Пух

"Winnie-the-Pooh Винни-Пух\n"

As you can see, the line break between the two lines was replaced with a space, while the trailing one was kept. In other words, single line breaks inside the content are folded away. You can still get a new line by inserting an empty line.

>
Winnie-the-Pooh

Винни-Пух

"Winnie-the-Pooh\nВинни-Пух\n"

This is certainly one of the most misunderstood features YAML offers. People appear to be often confused about these newline differences.

Chomping

Unfortunately, the “newlines nightmare” does not end on style. There is one additional factor, namely chomping.

This indicator can be used to instruct the processor what should happen to newlines located at the end of the scalar.

  • - – strip
  • + – keep
  • no indicator – clip

Luckily, its influence is rather negligible, as the default chomping is clip (explained a little further down the page), therefore changing it is always a conscious decision.
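In a nutshell, strip drops every trailing line break, keep preserves them all, and clip leaves exactly one. A minimal sketch of that, assuming a hypothetical applyChomping helper that receives the scalar content with all of its trailing line breaks still attached:

```javascript
// Hypothetical helper: `indicator` is '-', '+' or '' (no indicator, i.e. clip).
function applyChomping(text, indicator) {
  if (indicator === '+') return text;          // keep: leave everything as is
  const stripped = text.replace(/\n+$/, '');   // content without trailing line breaks
  if (indicator === '-') return stripped;      // strip: drop all of them
  // clip: keep a single final line break, provided there was one and the content is not empty
  if (stripped === text || stripped === '') return stripped;
  return stripped + '\n';
}

const raw = 'Winnie-the-Pooh\nВинни-Пух\n\n\n';

console.log(JSON.stringify(applyChomping(raw, '-'))); // "Winnie-the-Pooh\nВинни-Пух"
console.log(JSON.stringify(applyChomping(raw, '+'))); // "Winnie-the-Pooh\nВинни-Пух\n\n\n"
console.log(JSON.stringify(applyChomping(raw, '')));  // "Winnie-the-Pooh\nВинни-Пух\n"
```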

When strip chomping is used, all trailing empty lines are stripped, as well as the final line break.

|-
Winnie-the-Pooh
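Винни-Пух

"Winnie-the-Pooh\nВинни-Пух"

Keep chomping, as the name implies, preserves the final line break together with all trailing empty lines (note that in the example below the scalar is followed by two empty lines, hence the three newlines in the output).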

|+
Winnie-the-Pooh
Винни-Пух

"Winnie-the-Pooh\nВинни-Пух\n\n\n"

Now, clip chomping causes all trailing empty lines to be removed, but the final line break does get preserved.

|
Winnie-the-Pooh
Винни-Пух

"Winnie-the-Pooh\nВинни-Пух\n"

Indentation

The indentation we refer to here is the numeric value you can specify in the block header.

If no explicit indentation is specified, it is automatically deduced from the source, from the first non-empty line of the given scalar, to be more precise.

The valid range is between 1 and 9.

Most of the time, you won’t need to provide any indentation, but there are cases where you may need to.

One of them is when the first line of your content begins with spaces.

Example:

|3
     I want some whitespaces before myself!
  Me not.

This is valid YAML.

If we converted it to JSON, we’d get…

" I want some whitespaces before myself!\nMe not.\n"

Now, if we removed the indentation…

|
     I want some whitespaces before myself!
  Me not.

we’d get an error because the inferred indentation would equal 6, and the “Me not” part of the document has only 2 leading spaces. The fixed version would need to look as below

|
     I want some whitespaces before myself!
     Me not.

However, the JSON output would then be quite different.

Moreover, it’s also invalid if the given value is too high. The syntax explicitly imposes that indentation is a digit, placed in the range of 1-9.

|10
          this is invalid

Furthermore, it’s incorrect when the indicator does not match the actual indentation used.

|3
this is invalid
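The automatic detection described earlier in this section can be sketched as follows. Both helpers are hypothetical, operate on the raw content lines of a block scalar, and deliberately leave out how an explicit indicator relates to the indentation of the parent node.

```javascript
// Counts leading spaces only, since YAML indentation never uses tabs.
const countLeadingSpaces = (line) => line.match(/^ */)[0].length;

// Hypothetical helper: the indentation is taken from the first non-empty line.
function detectIndentation(lines) {
  const first = lines.find((line) => line.trim() !== '');
  return first === undefined ? 0 : countLeadingSpaces(first);
}

// Hypothetical helper: index of the first non-empty line indented less than
// `indentation`, or -1 when every line is fine.
function findUnderIndentedLine(lines, indentation) {
  return lines.findIndex(
    (line) => line.trim() !== '' && countLeadingSpaces(line) < indentation,
  );
}

const lines = ['     more indented', '  less indented'];
const detected = detectIndentation(lines);

console.log(detected);                               // 5
console.log(findUnderIndentedLine(lines, detected)); // 1
console.log(findUnderIndentedLine(lines, 2));        // -1
```

With the detected value the second line is under-indented, which is the kind of error described above; with a content indentation of 2 both lines are acceptable and the first one simply keeps its extra spaces.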

Line Folding

Certain styles such as flow and folded block undergo the process called line folding.

In short, it’s an operation where whitespaces get mangled in a particular way.

The intended outcome of this is to improve readability by allowing lengthy lines to be broken into multiple lines. Overall, the process is relatively simple.

Per spec,

If a line break is followed by an empty line, it is trimmed; the first line break is discarded and the rest are retained as content.

'I am
a multiline
string


Oh cool.'

"I am a multiline string\n\nOh cool."

If we placed any whitespaces in between “string” and “oh cool” words, they’d get discarded.
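The folding of flow scalars can be approximated in a few lines. This is a simplified sketch rather than a spec-complete implementation: foldFlowScalar is a hypothetical name, the input is the raw text between the quotes, and escape sequences of double-quoted scalars are not considered.

```javascript
// Hypothetical helper: folds the raw, possibly multi-line text of a flow scalar.
function foldFlowScalar(text) {
  const lines = text.split('\n').map((line, index, all) => {
    // Whitespace next to a line break is not content; the outer edges are kept.
    if (index > 0) line = line.trimStart();
    if (index < all.length - 1) line = line.trimEnd();
    return line;
  });

  let result = lines[0];
  let emptyLines = 0;
  lines.slice(1).forEach((line, index, rest) => {
    const isLast = index === rest.length - 1;
    if (line === '' && !isLast) {
      emptyLines += 1; // remember the empty line, it becomes a "\n" later
      return;
    }
    // A lone line break turns into a space; otherwise the first break is
    // discarded and each empty line contributes one "\n".
    result += (emptyLines === 0 ? ' ' : '\n'.repeat(emptyLines)) + line;
    emptyLines = 0;
  });
  return result;
}

const folded = foldFlowScalar('I am\na multiline\nstring\n  \n\nOh cool.');
console.log(JSON.stringify(folded)); // "I am a multiline string\n\nOh cool."
```

Note that the whitespace placed on one of the empty lines in the input does not show up in the result.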

Although all of these styles are subject to the same process, the final behavior varies depending on the indicators you used (if it’s a folded block) as well as the style of the flow scalar.

Yes, you guessed it. Once more it all boils down to whitespaces.

As we learned moments ago, block scalars have a thing called chomping that determines the way trailing line breaks are handled.

Another difference is that in the case of block scalars, leading whitespaces (beyond the indentation) and trailing whitespaces placed on each line are preserved.

YAML:

>
  look!
    I have 2 whitespace before!

JSON:

"look!\n  I have 2 whitespace before!\n"

YAML:

'
 look!
    I have 2 whitespace before in the document as well, but since I"m a flow scalar I will end up having a single one!
    The new line will also disappear (note the space at the end in the JSON output).
'

JSON:

" look! I have 2 whitespace before in the document as well, but since I\"m a flow scalar I will end up having a single one! The new line will also disappear (note the space at the end in the JSON output). "

YAML:

    look!
    
          I have no whitespaces now, but new line is still here, because there's an empty line before me!

JSON:

"look!\nI have no whitespaces now, but new line is still here, because there's an empty line before me!"

Is there an equivalent of block scalars in JSON?

Needless to say, JSON does not offer anything similar. This is a true drawback for certain users, as reading lengthy sentences is somewhat troublesome if you do not have any line-wrapping in your viewer.

JSON is much more opinionated, as it demands that a new line be explicitly placed within a string as an escape sequence.

"Winnie-the-Pooh\nВинни-Пух"

Data / Node Types

These partially fall under usability, therefore, in the spirit of improving readability, they landed in this part of the document.

Collections

I couldn’t quite decide whether I should come up with a dedicated section for collections or not. Technically they fall under the tags and schemas section, but having them in a different spot is perhaps a bit better.

Sequences / Arrays

In other programming languages, such a data type is described as an array, list, vector, or sequence. In YAML it’s called a sequence and in JSON it’s an array.

YAML sequences and JSON arrays have something in common, namely, they’re both ordered, and they can hold any nodes. So, if you care about ordering, this is the right data structure.

This is how things stand from a syntax perspective. In JSON, you use brackets [ and ] to denote an array.

["Thailand", "Laos", "Myanmar"]

YAML offers two ways to write a sequence, with the most popular one being a block sequence:

- 'Thailand'
- 'Laos'
- 'Myanmar'

and the other commonly used one is the flow sequence, which is very similar or identical to JSON:

['Thailand', 'Laos', 'Myanmar']

In block sequence, - is used to denote a single entry. In flow sequence, , indicates the end of an entry.

A sequence may point at itself, thus the following document is valid.

&users
- *users

Mappings / Objects

Commonly named as an object, dictionary, hash table, struct, record, keyed list, or associative array.

Same as sequences/arrays, YAML mappings and JSON objects also share some similarities, and yet again the most significant one is the ordering, and to be more precise, the lack of it. They are both unordered collections.

From a syntax point of view, the situation is similar. In JSON, curly braces { and } are meant to express the object.

{
  "Golden Triangle": ["Thailand", "Laos", "Myanmar"]
}

In YAML, like with sequences, you are free to choose from 2 styles.

Block:

'Golden Triangle':
  - 'Thailand'
  - 'Laos'
  - 'Myanmar'

and flow, which is yet again usually very close or equal to JSON:

{ 'Golden Triangle': ['Thailand', 'Laos', 'Myanmar'] }

In some aspects, mappings are notably different from JSON objects.

In YAML, any node type can be used as a mapping key. This implies you can very well use another mapping or sequence, or a numeric scalar, null scalar, etc.

? wow: much complex
: wow: much fun

As you might see I made use of ? to indicate that a given mapping key will be complex.

Usually ? does not need to be used, but since it simply denotes a mapping key, you can use it with simple keys as well.

? wow
: such mapping

Which, unsurprisingly, in JSON would equal –

{
  "wow": "such mapping"
}

You can also use flow styles, if you would like to, for example:

{ wow: much complex }: { wow: much fun }

If you are a JS developer and use a library like js-yaml, you won’t be able to process such data.

JS objects accept only strings and, as of ECMAScript 2015, Symbols as property keys. The trick here would be to consume the AST js-yaml produces yourself, and use Maps instead, which are perfect for such data types.
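The difference between the two containers is easy to see without any YAML library involved. The snippet below is only an illustration in plain JavaScript; the variable names are made up, and the keys mirror the complex mapping shown earlier.

```javascript
// Plain objects coerce every non-Symbol key to a string...
const objectWithComplexKey = {};
objectWithComplexKey[{ wow: 'much complex' }] = { wow: 'much fun' };
console.log(Object.keys(objectWithComplexKey)); // [ '[object Object]' ]

// ...while a Map keeps the key as the value it is.
const complexKey = { wow: 'much complex' };
const mapWithComplexKey = new Map([[complexKey, { wow: 'much fun' }]]);
console.log(mapWithComplexKey.get(complexKey)); // { wow: 'much fun' }

// Caveat: a Map compares object keys by identity, not by structure.
console.log(mapWithComplexKey.get({ wow: 'much complex' })); // undefined
```

The identity-based lookup is worth keeping in mind, since two structurally equal keys loaded from a document would still be two different Map keys.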

Furthermore, certain tooling may just naively assume that mapping is used as the value.

This may potentially lead to invalid results.

For instance, at Stoplight, we have a function getJsonPathForPosition, and it bails out upon such input. It’s a conscious decision, and we didn’t see a need to support these particular cases.

The aforementioned utility tries to generate a JSON path leading to a value at the current position.

import YAML from '@stoplight/yaml';
import chai from 'chai';

const { expect } = chai;

const document = `hello: world
address:
  street: 123
? address: street
: 123
`;

expect(
  YAML.getJsonPathForPosition(YAML.parseWithPointers(document), {
    // we follow LSP (Language Server Protocol), hence all values are 0-based
    character: 10,
    line: 2,
  }),
).to.deep.equal(['address', 'street']);

// However, if you use a complex key...
expect(
  YAML.getJsonPathForPosition(YAML.parseWithPointers(document), {
    character: 2,
    line: 3,
  }),
).to.deep.equal(['address']); // this is not quite valid, we have no way to represent such path, as `type Segment = number | string; type JSONPath = Segment[]`

Fortunately, I haven’t observed complex mapping keys being used often, especially when the consumer can also be given a different format such as JSON.

The real troubles begin when you realize users start inserting values that get resolved to numbers. This is a ubiquitous pattern.

People avoid quoting keys as it’s both more convenient and more readable. The problem here is that such a scalar value is usually resolved to a numeric scalar, and hence the semantics are different.

# this is a portion of some OAS document
responses:
  400:
  # rest

Such a document cannot be expressed in JSON.

# this is a portion of some OAS document
responses:
  400:
  '400':
  # what now?

The responses mapping has two perfectly valid pairs with different keys.
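What happens to such a mapping once it lands in a JSON-like structure can be shown in plain JavaScript. The entries below are a hand-written stand-in for what a loader with typed keys might produce; no YAML library is involved and the values are placeholders.

```javascript
// Hypothetical loader output: [key, value] pairs with the keys' types preserved.
const responseEntries = [
  [400, 'value of the numeric key'],
  ['400', 'value of the string key'],
];

// A Map tells the two keys apart...
const responsesMap = new Map(responseEntries);
console.log(responsesMap.size); // 2

// ...while a plain object (and therefore JSON) collapses them into one.
const responsesObject = Object.fromEntries(responseEntries);
console.log(JSON.stringify(responsesObject)); // {"400":"value of the string key"}
```

One of the two pairs is silently lost, with the later entry overwriting the earlier one.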

Interestingly or not, certain specs such as OAS explicitly tell users that keys need to be strings, and thus prohibit such usage. The linter @stoplight/spectral I’ve been working on actually yells when non-JSONish keys are present.

Apart from all the above, duplicate keys are strictly prohibited in YAML, while JSON has no such restriction in place.

Instead, RFC 8259 recommends that the names within an object be unique, but does not impose it. This means that certain JSON documents may not be valid YAML documents.

{
  "foo": true,
  "foo": false
}

The above is syntactically valid (if discouraged) JSON, but not valid YAML.
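For what it’s worth, JavaScript’s built-in JSON.parse accepts this document without complaint and keeps the last value. Other JSON parsers are free to behave differently, so treat this as an observation about one implementation only.

```javascript
// The duplicate name is accepted; the later value overwrites the earlier one.
const parsed = JSON.parse('{ "foo": true, "foo": false }');

console.log(parsed.foo);                 // false
console.log(Object.keys(parsed).length); // 1
```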

Empty node

It’s perhaps worth pointing out that under certain circumstances a lack of value may actually not equal an actual lack of data.

empty:

In the situation above, the value of the pair with the key “empty” will be null.

This is what a JSON document would look like:

{
  "empty": null
}

Sequences are prone to the same.

-

[null]

Such a node can also be used as a key of a mapping pair.

: "empty"

The document above cannot be correctly expressed in JSON, because null cannot be used as a property key.

Tags and Schemas

My initial plan was to include this bit in the current article, yet for the sake of keeping its length sane, I’ll postpone it until the next one.

I am leaving a small teaser. The next article will explain why the below…

012 # is sometimes an integer, and sometimes not
! 12 # why this is always a string
# why this YAML doc might be sometimes valid, and sometimes not
~:
null:

We’ll learn how to leverage merge keys to keep things DRY and finally… how to make this code work:

import chai from 'chai';
import yaml from 'js-yaml';

import schema from './schema.mjs';

const { expect } = chai;

const document = `
---
numbers:
  - !js/eval Math.PI
  - !js/eval Math.abs(Math.cbrt(125) - Math.sqrt(25) + -5) # you should probably just provide the result rather than evaluating the whole expr, but this is for the sake of showing it works.
  - !int64 1099511627778 # signed 64bit int
  - !int64 9223372036854775808 # overflow
  - !js/bigint 20381928192182918291
`;

expect(yaml.load(document, { schema })).to.deep.equal({
  numbers: [Math.PI, 5, 1099511627778n, -9223372036854775808n, 20381928192182918291n],
});

Closing Thoughts

  • With few exceptions, JSON can be converted to YAML, while converting YAML to JSON is not always possible, so these two formats are not always interchangeable. This means you should use a JSON parser for processing JSON input, and a YAML counterpart for YAML.
  • YAML is excellent for configuration files and the like thanks to anchors and aliases that save you from repeating chunks of text, as well as comments which allow you to clearly explain the logic behind a given setup, etc.
  • Tooling that supports both JSON and YAML will most likely expect JSON-ish usage of YAML (strings as keys of mappings’ pairs, using only the data types JSON offers, etc.).
  • YAML usually offers a few ways (notations) to accomplish the same result, while JSON very rarely does so.
  • YAML is presumably closer to TOML (which, by the way, is pretty popular in the Rust world) than to JSON.

Thanks for reading the article! If you enjoyed it please subscribe to 11Sigma LinkedIn to get notified about new content 🙂

See you later, and happy YAMLing!
