55

I think indentation is important in YAML.

I tested the following in irb:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
- 1
- 2
- 3
 => nil 

I expected something like this:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
  - 1
  - 2
  - 3
 => nil 

Why isn't there indentation for the array?

I found this at http://www.yaml.org/YAML_for_ruby.html#collections.

The dash in a sequence counts as indentation, so you can add a sequence inside of a mapping without needing spaces as indentation.

Trevor Boyd Smith
Sam Kong
  • apparently it does not need indentation when mapping a scalar to a sequence. – akonsu Jun 09 '13 at 21:54
  • 3
    Both are valid. I agree with you that they should not be. Even The Official YAML Web Site has both... https://yaml.org/ – nroose Mar 20 '19 at 22:38

3 Answers

42

The short answer is that both are valid because they are unambiguous for the YAML parser. This fact was already pointed out by the other answers, but allow me to add some gasoline to this discussion.

YAML uses indentation not only for aesthetics or readability; it also has a crucial meaning when composing different data structures and nesting them:

# YAML:         # JSON equivalent:
---             # {
one:            #   "one": {
  two:          #     "two": null,
  three:        #     "three": null
                #   }
                # }
                
---             # {
one:            #   "one": {
  two:          #     "two": {
    three:      #       "three": null
                #     }
                #   }
                # }

As we can see, the simple addition of an indentation level before three changes its nesting level and removes the previous null value assignment we had for two.
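A quick way to confirm this is to load both documents and compare the results. The sketch below assumes Ruby's standard yaml library (Psych); the variable names are only for illustration.

```ruby
require 'yaml'

# "three" is a sibling of "two": both keys live under "one".
flat = YAML.load(<<~DOC)
  one:
    two:
    three:
DOC

# One extra level of indentation makes "three" a child of "two".
nested = YAML.load(<<~DOC)
  one:
    two:
      three:
DOC

puts flat == { "one" => { "two" => nil, "three" => nil } }    # true
puts nested == { "one" => { "two" => { "three" => nil } } }   # true
```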

This behavior is, however, not consistent when it comes to lists. A list tolerates the removal of the level of indentation that we would naturally expect (as the OP did) to reflect the nesting level of its items, and it still parses the same way:

# YAML:         # JSON equivalent:
---             #
one:            #
  two:          #
    - foo       # {            
    - bar       #   "one": {   
                #     "two": [ 
                #       "foo", 
                #       "bar"  
                #     ]        
---             #   }          
one:            # }            
  two:          #
  - foo         #
  - bar         #

The second form above is somewhat unexpected and breaks with the idea that the indentation level is connected to the nesting level: two (an object key) and the items of the nested list are written with the same indentation, yet they sit at different nesting levels.

What is even worse, it doesn't work all the time, but only when the list is placed immediately under an object key. A list nested inside another list can't freely drop a level of indentation because that would, obviously, move the nested elements into the parent list:

# YAML:         # JSON equivalent:
---             # {
one:            #   "one": {
  two:          #     "two": [
    -           #       null,
    -           #       [
      -         #         null,
      -         #         null
                #       ]
                #     ]
                #   }
                # }
                #
---             # {           
one:            #   "one": {  
  two:          #     "two": [
    -           #       null, 
    -           #       null, 
    -           #       null, 
    -           #       null  
                #     ]       
                #   }         
                # }         

I know, I know... Don't even start and say that the example above is a bit extreme and could be considered an edge case. Both are perfectly valid data structures and prove my point. More complicated situations also arise when mixing objects and nested lists of objects, especially if they have a single key. Not only may this lead to errors in the data structure declaration, but it also becomes extremely hard to read.
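The nested-list case above can be checked by loading both documents in Ruby. This is a minimal sketch, assuming the standard yaml library (Psych); the variable names are illustrative.

```ruby
require 'yaml'

# The inner list is indented one level deeper than the outer one.
indented = YAML.load(<<~DOC)
  one:
    two:
      -
      -
        -
        -
DOC

# With that level dropped, the inner items merge into the outer list.
flattened = YAML.load(<<~DOC)
  one:
    two:
      -
      -
      -
      -
DOC

puts indented == { "one" => { "two" => [nil, [nil, nil]] } }     # true
puts flattened == { "one" => { "two" => [nil, nil, nil, nil] } } # true
```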

The following YAML documents are equivalent (they parse to the same data):

# YAML:             # JSON equivalent
---                 # 
one:                # {
  two:              #   "one": {
  - three: foo      #     "two": [
  - bar             #       {"three": "foo"},
  - four:           #       "bar",
    - baz           #       {
    five:           #         "four": ["baz"],
    - fizz          #         "five": ["fizz", "buzz"],
    - buzz          #         "six": null
    six:            #       }
  seven:            #     ],
                    #     "seven": null
---                 #   }
one:                # }
  two:              #      
    - three: foo    # 
    - bar           #
    - four:         #
        - baz       #
      five:         #
        - fizz      #
        - buzz      #
      six:          #
  seven:            #   

I don't know about you, but I find the second one much easier to read and follow, especially in a very large document. It's very easy to get lost in the first one, especially once the beginning of a given object declaration has scrolled out of view. There is simply no clear connection between the indentation level and the nesting level.

Keeping the indentation level consistently connected to the nesting level is very important for readability. The fact that lists may, in some positions only, drop a level of indentation is something you have to be very careful about.
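To check that the two documents really describe the same data, they can be loaded and compared. A minimal sketch, assuming Ruby's standard yaml library (Psych); the variable names are illustrative.

```ruby
require 'yaml'

# First form: list items share the indentation of their parent key.
compact = YAML.load(<<~DOC)
  one:
    two:
    - three: foo
    - bar
    - four:
      - baz
      five:
      - fizz
      - buzz
      six:
    seven:
DOC

# Second form: every nesting level adds a level of indentation.
indented = YAML.load(<<~DOC)
  one:
    two:
      - three: foo
      - bar
      - four:
          - baz
        five:
          - fizz
          - buzz
        six:
    seven:
DOC

puts compact == indented # true
```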

Victor Schröder
  • 4
    yaml indentation rule is very counter intuitive – Alan Jun 15 '22 at 14:21
  • 1
    It would be really helpful if you would add the JSON equivalent for the last example in the same way that you did for all the other examples. But yes I agree YAML sucks. – Neutrino Jul 14 '22 at 08:16
29

Both ways are valid, as far as I can tell:

require 'yaml'

YAML.load(%q{--- 
1:
- 1
- 2
- 3
})
# => {1=>[1, 2, 3]}

YAML.load(%q{--- 
1:
  - 1
  - 2
  - 3
})
# => {1=>[1, 2, 3]}

It's not clear why you think there should be spaces before the hyphens. If you think this is a violation of the spec, please explain how.

Why isn't there indentation for the array?

There's no need for indentation before the hyphens, and it's simpler not to add any.

Darshan Rivka Whittle
  • 57
    while there is no need for spaces I find it to be more readable – random-forest-cat Apr 05 '15 at 23:49
  • 1
    quite the opposite, when you have objects like kubernetes specs, the more indentations - the less readable it is, due to extra whitespace\scrolling\wrapping – 4c74356b41 Dec 04 '20 at 13:06
  • 9
    I dare to disagree. The suppression of an expected level of indentation for lists makes the document _much_ harder to read, mainly for big files such as k8s specs as you mentioned. Keeping the indentation in sync with nesting level is gold. – Victor Schröder Apr 22 '22 at 12:24
13

It's so you can do:

1: 
- 2: 3
  4: 5
- 6: 7
  8: 9
- 10
=> {1 => [{2 => 3, 4 => 5}, {6 => 7, 8 => 9}, 10]}

Basically, dashes delimit objects, and indentation denotes the "value" of the key-value pair.

That's the best I can do; I haven't managed to find any of the reasons behind this or that aspect of the syntax.
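For what it's worth, the indented variant of this document loads to the same structure. A small check, assuming Ruby's standard yaml library (Psych); the variable names are illustrative.

```ruby
require 'yaml'

# Dashes at the same column as the key "1".
compact = YAML.load(<<~DOC)
  1:
  - 2: 3
    4: 5
  - 6: 7
    8: 9
  - 10
DOC

# Same document with everything below the key indented by two spaces.
indented = YAML.load(<<~DOC)
  1:
    - 2: 3
      4: 5
    - 6: 7
      8: 9
    - 10
DOC

puts compact == indented # true
```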

Narfanator
  • 12
    hm but you can do that anyway ... the equivalent (indenting all lines bar the top by 2 spaces) is the same result – Nick Aug 14 '18 at 10:57
  • Unfortunately, this is not compatible with the Perl 5.18 (the version I am bound to) built-in YAML parser. Without indentation, I get "YAML Error: Invalid element in map". I am not sure if newer versions of Perl have adapted to this apparently legal syntax. – Myles Prather Oct 31 '19 at 17:20
  • Correction: If I 'use YAML::Syck;' in Perl, I am able to read Ruby's default flavor of YAML. The best thing about standards is that there are so many to choose from :). – Myles Prather Oct 31 '19 at 17:47