Parsing a YAML file with Python, to extract a text blob between a unique/non-unique keyword pair

Question

I have a program that requires the ability to extract a blob of text from a YAML file, hash it, compare it to the last "run" and then make a decision off the result. The code in question:

    def parse_file(self):
        webservers_template = os.path.abspath('templates/webservers.yaml')
        with open(webservers_template, 'rb') as stream:
            try:
                print(stream.read())
                metadata_blob = re.findall(r'\n    Metadata:(.*?)\n    Properties:', str(stream))
                print(metadata_blob)
                return bytes(metadata_blob)
            except yaml.YAMLError as exc:
                print(exc)

The file templates/webservers.yaml is a YAML-based CloudFormation template that looks something like this.

What I'm trying to do: I have a unique keyword, Metadata, and a non-unique keyword, Properties, and I would like to return ALL text between these two keywords. The formatting has no requirements other than reliability: it must consistently come back the same way, because it will be the input to a hash function. I'm using the output of the hash function for a diff operation, so false positives will not be helpful.

The issue I'm currently having is that print(metadata_blob) doesn't show any matches; it just prints an empty list.

To give you a vague idea of the action I'm performing, I am attempting to route around some AWS functionality in order to induce UpdatePolicy actions when I change LaunchConfiguration metadata. This probably isn't important in regards to the question though.

I'm feeling a bit lost with this one, and if my current method of tackling this doesn't make sense, feel free to point me in a more appropriate direction.

Some questions I've read and "borrowed" ideas from whilst writing this.

How can I parse a YAML file in Python

How to extract information between two unique words in a large text file

score 1 · Answer 1 · answered Jun 07 '18 at 08:32

Well, considering the test data you've linked, print(metadata_blob) is perfectly right to return an empty list, since there is no Metadata: in the YAML.

More importantly, you do print(stream.read()) which reads the entire file and places the stream's position at the end of the file. Afterwards, every attempt to read from stream does not return anything. My Python is not strong enough to know what exactly happens when you do str(stream) but it definitely is not the usual way to read anything from the file.
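To make that concrete, here is a small sketch (not your code: it writes a throwaway temporary file whose contents are made up for illustration) showing that a second read() returns nothing and that str(stream) is just the repr of the file object, not the file's contents:

```python
import os
import tempfile

# Hypothetical stand-in for templates/webservers.yaml.
with tempfile.NamedTemporaryFile('wb', suffix='.yaml', delete=False) as tmp:
    tmp.write(b"Resources:\n  Metadata: example\n")
    path = tmp.name

with open(path, 'rb') as stream:
    first = stream.read()    # consumes the whole file
    second = stream.read()   # position is now at end of file, so this is empty
    as_text = str(stream)    # repr of the file object, not its contents

os.remove(path)

print(first)
print(second)   # b''
print(as_text)  # something like <_io.BufferedReader name='...'>
```

So the regex in the question was being run against a short description of the file object, which is why it could never match.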

Try this:

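contents = stream.read().decode('utf-8')  # the file was opened in 'rb' mode, so decode bytes to str
print(contents)
# re.DOTALL lets '.' match newlines, so the blob may span several lines
metadata_blob = re.findall(r'\n    Metadata:(.*?)\n    Properties:', contents, re.DOTALL)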

Also, do away with the try/except or do something useful with it. You will never get a yaml.YAMLError because you do not use the yaml module for reading the file.
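Putting the pieces together, a minimal sketch of the extract-and-hash step might look like the following. It assumes the same keywords and four-space indentation as your regex; the function name metadata_digest, the choice of SHA-256, and the sample text are my own placeholders, not anything from your template. Note that '.' does not match newlines by default, so re.DOTALL is needed for a blob that spans several lines.

```python
import hashlib
import re

# Assumed pattern: same keywords and four-space indentation as in the question.
# re.DOTALL makes '.' match newlines so the captured blob can span lines.
METADATA_RE = re.compile(r'\n    Metadata:(.*?)\n    Properties:', re.DOTALL)


def metadata_digest(text):
    """Return a SHA-256 hex digest of the text between the two keywords.

    Returns None when the keyword pair is not found.
    """
    match = METADATA_RE.search(text)
    if match is None:
        return None
    return hashlib.sha256(match.group(1).encode('utf-8')).hexdigest()


# Made-up sample for illustration, not the real template.
sample = (
    "Resources:\n"
    "  LaunchConfig:\n"
    "    Type: AWS::AutoScaling::LaunchConfiguration\n"
    "    Metadata:\n"
    "      Comment: install packages\n"
    "    Properties:\n"
    "      ImageId: ami-12345678\n"
)

digest = metadata_digest(sample)
print(digest)
```

In your method you would read the file once in text mode (for example with open(webservers_template, encoding='utf-8')) and pass stream.read() to the function. Because the non-greedy (.*?) stops at the first Properties: that follows, only the first Metadata block is hashed, which should be fine if Metadata really is unique.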

You're a legend. Yeah so the main point of the print statements was to help with debugging, but you're right with the second call to the stream returning nothing because it's a generator, so it can only be called once. Cheers, I haven't got it fully solved but I now have it working at a better level so I should be able to figure it out from here. Thanks! — John Von Neumann, Jun 07 '18 at 23:55
