5

I'm parsing a YAML file in Ruby and some of the input is causing a Psych syntax error:

require 'yaml'

example = "my_key: [string] string"
YAML.load(example)

Resulting in:

Psych::SyntaxError: (<unknown>): did not find expected key
          while parsing a block mapping at line 1 column 1
from [...]/psych.rb:456:in `parse'

I received this YAML from an external API that I do not control. Quoting the value to force it to parse as a string (my_key: '[string] string', as noted in "Do I need quotes for strings in YAML?") fixes the issue, but I can't change how the input is sent.

Is there a way to force the input to be parsed as a string for some keys such as my_key? Is there a workaround to successfully parse this YAML?

the Tin Man
Pauline
  • You may want to paste the result correctly. – Marek Lipka Jan 02 '20 at 12:33
  • 1
    Just to understand the problem: what do you expect? The string `[string] string` or the string `string`? Obviously you don't get valid YAML, so maybe you have a description from the API you use. – knut Jan 02 '20 at 20:24
  • 1
    It's weird that an API would return a result in YAML that isn't actually valid YAML :/ But couldn't you just pre-process the response before reading as YAML? – Scott Schupbach Jan 02 '20 at 21:37
  • 1
    You may not control how the string is received, but you do have control over it immediately prior to parsing it so munging it isn't out of the question. I'd do it in a small piece of code separate from the parsing code, following all the appropriate cautionary steps of backing up the original until you know your code has successfully parsed it. – the Tin Man Jan 02 '20 at 23:45
  • I ran into this scenario with a tool that had a bug choking on parsing `<`, `>` in YAML strings, even when escaped. It's a bit of a hack, but I ended up successfully using the HTML-escaped versions instead (`&lt;`, `&gt;`). – Taylor D. Edmiston Mar 29 '23 at 19:34

3 Answers

4

One approach is to preprocess the response before parsing it as YAML. Assuming it's a string, you can use a regex to wrap the problematic value in quotes. For example:

resp_str = "---\nmy_key: [string] string\n"
re = /(: )(\[[a-z]*?\] [a-z]*?)(\n)/
# Use the block form: an interpolated replacement string would be built
# before gsub! runs, when $1..$3 are not yet set.
resp_str.gsub!(re) { "#{$1}'#{$2}'#{$3}" }
#=> "---\n" + "my_key: '[string] string'\n"

Then you can do

YAML.load(resp_str)
#=> {"my_key"=>"[string] string"}
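If only certain keys are known to carry free text, the same idea can be narrowed to those keys. The following is a minimal sketch, not a general YAML fixer: `quote_values` is a hypothetical helper, and it assumes one `key: value` pair per line and values that are not already quoted.

```ruby
require 'yaml'

# Hypothetical helper: wraps the value of each listed key in single quotes
# so Psych reads it as a string. Assumes one "key: value" pair per line;
# values that already start with a quote are left alone.
def quote_values(yaml_text, keys)
  key_pattern = Regexp.union(keys)
  yaml_text.gsub(/^([ \t]*(?:#{key_pattern}):[ \t]+)(?![ \t'"])(.+?)[ \t]*$/) do
    prefix = Regexp.last_match(1)
    value = Regexp.last_match(2)
    # Inside a single-quoted YAML scalar, a literal ' is written as ''.
    "#{prefix}'#{value.gsub("'", "''")}'"
  end
end

raw = "---\nmy_key: [string] string\nother: 1\n"
data = YAML.load(quote_values(raw, ['my_key']))
```

Keys that are not listed, such as `other` here, are passed through untouched, so their values keep their normal YAML types.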
Scott Schupbach
3

It does not work because square brackets have a special meaning in YAML, denoting arrays:

YAML.load "my_key: [string]"
#⇒ {"my_key"=>["string"]}

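and [foo] bar is invalid: once the closing bracket ends the array, no other content may follow it in the same value. One option is to prefix the square brackets with backslashes so the value is read as a plain string, although the backslashes then remain in the parsed value: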

YAML.load "my_key: \\[string\\] string"
#⇒ {"my_key"=>"\\[string\\] string"}

Alternatively, one might implement a custom Psych parser.
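A lighter-weight option than a custom parser may be to try a normal parse first and repair the text only when Psych rejects it. This is a rough sketch under stated assumptions: `load_lenient` and `quote_bracket_values` are hypothetical names, the repair assumes one `key: value` pair per line, and it only targets values of the form `[...]` followed by more text. A bracketed list followed by a trailing comment would also be quoted, so the pattern would need adjusting for such input.

```ruby
require 'yaml'

# Hypothetical repair step: quote values that look like "[...] trailing text".
# Well-formed lists such as "[a, b]" have nothing after the closing bracket
# and are left unchanged.
def quote_bracket_values(text)
  text.gsub(/^([ \t]*[\w-]+:[ \t]+)(\[[^\]\n]*\][ \t]*\S.*?)[ \t]*$/) do
    prefix = Regexp.last_match(1)
    value = Regexp.last_match(2)
    # A literal ' inside a single-quoted YAML scalar is written as ''.
    "#{prefix}'#{value.gsub("'", "''")}'"
  end
end

# Hypothetical wrapper: parse strictly, fall back to the repaired text.
def load_lenient(text)
  YAML.load(text)
rescue Psych::SyntaxError => e
  warn "strict parse failed at line #{e.line}, column #{e.column}: #{e.problem}"
  YAML.load(quote_bracket_values(text))
end

result = load_lenient("my_key: [string] string\nlist: [a, b]\n")
```

Because the repair runs only after a `Psych::SyntaxError`, valid responses from the API are parsed exactly as before.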

Aleksei Matiushkin
  • Thanks, I understand that. However I don't control how this input is being sent as it's being sent from an external API. – Pauline Jan 02 '20 at 12:42
  • 2
    Then the only way to go would be to implement a custom `Psych` parser. [Here](https://rocket-science.ru/hacking/2015/04/14/yaml-parser-tuning) is my blog post describing how to accomplish that. – Aleksei Matiushkin Jan 02 '20 at 12:51
  • I'm looking forward to the day I can casually drop my blog post in answer to a question here – Mark Jan 02 '20 at 14:04
  • 1
    Sometimes it's necessary to pre-process an input file to fix known errors prior to passing them to YAML, JSON, or even an XML/HTML parser. It's the nature of the internet that if someone can implement a standard wrong someone will, usually because they had a "bright idea". – the Tin Man Jan 02 '20 at 23:37
-1

There is a simple, native solution: if you want a value to be treated as a string, you can always put quotes around it:

 YAML.load "my_key: '[string]'"
=> {"my_key"=>"[string]"}
the Tin Man
max.ivanch