We're using PyYAML version 5.3.1 under Python 3.7.
We're finding that the order of lists is not being preserved.
For example, assume that the file example.yaml contains the following:
---
data:
- start
- next
...
And suppose that our Python 3.7 program looks like this:
import yaml
with open('example.yaml', 'r') as f:
    input_data = f.read()
datadict = yaml.load(input_data, Loader=yaml.FullLoader)
data = datadict['data']
print(f'{data}')
When we run this program with the same input data on different machines and in different environments (command line, daemon, REST call, etc.), sometimes it prints out this:
['start', 'next']
... and sometimes it prints out this:
['next', 'start']
It's almost as if YAML is initially storing the list elements in a set and then converting that to a list, because element ordering of a set is not guaranteed. Or perhaps YAML sometimes tries to sort the data that goes into a list.
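To make those two guesses concrete, here is a minimal sketch using only built-in types, with no PyYAML involved. It illustrates the hypotheses and is not a claim about what PyYAML does internally: sorting the two strings yields exactly the order we sometimes see, while a round trip through a set can yield either order depending on the process.

```python
items = ['start', 'next']

# Hypothesis 1: something sorts the list. 'next' sorts before 'start'.
print(sorted(items))  # ['next', 'start']

# Hypothesis 2: something passes the list through a set. String hashes are
# randomized per process by default (see PYTHONHASHSEED), so the order
# printed here may differ from one run to the next.
via_set = list(set(items))
print(via_set)
```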
And we get the same behavior with yaml.SafeLoader instead of yaml.FullLoader.
Also, if we put a print(input_data) statement before the yaml.load statement, its output always shows the data in the correct order, even though the list ordering produced by YAML still varies as described above.
Has anyone seen this behavior? And if so, what could be causing it, and how can it be corrected so that our list ordering can be maintained?
Thank you in advance.
UPDATE: Responding to the latest comments ...
I tried assert data[0] == 'start' as suggested, and it indeed fails whenever the list comes back in the wrong order.
I also tried this:
for item in data:
    print(item)
... and it prints the items in the same incorrect order as the f-string printout.
Regarding the question of where this code is running: it's within the following Red Hat Linux environment:
% uname -a
Linux [HOSTNAME] 3.10.0-1160.81.1.el7.x86_64 #1 SMP Thu Nov 24 12:21:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
In one case, this Python code is running from the command line, and it always works properly.
In the other case, it's running from a REST server which is resident on the same machine. In this REST-server case, the order of the list is changed to what seems to be alphabetical order.
In both cases, it's Python 3.7.5 and PyYAML 5.3.1. And yes, I now agree that some subsidiary package imported by the REST server's Python module is probably altering the behavior of PyYAML.
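One way to test that theory is to ask PyYAML which function it will call to build a YAML sequence. The sketch below relies on the yaml_constructors registry that PyYAML's loader classes carry. The helper name describe_seq_constructor is my own invention, and my expectation that an unmodified install reports a function from yaml.constructor is an assumption, not something I have verified on our machines. The import is guarded only so the snippet also runs where PyYAML is missing.

```python
SEQ_TAG = "tag:yaml.org,2002:seq"  # tag PyYAML resolves plain lists to


def describe_seq_constructor(loader_cls):
    """Name the function a loader class has registered for YAML sequences."""
    fn = loader_cls.yaml_constructors.get(SEQ_TAG)
    if fn is None:
        return "<none registered>"
    module = getattr(fn, "__module__", "?")
    name = getattr(fn, "__qualname__", repr(fn))
    return f"{module}.{name}"


try:
    import yaml
except ImportError:
    yaml = None

if yaml is not None:
    # Run this in both environments, right before the yaml.load call.
    for loader_cls in (yaml.SafeLoader, yaml.FullLoader):
        print(loader_cls.__name__, describe_seq_constructor(loader_cls))
```

If the REST server prints a different function than the command-line run does, whichever module owns that function would be the first place to look.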
But does anyone know what Python package could cause the ordering of list elements to be altered? We're using Flask within the REST server, and at first I wondered whether that could be responsible. However, none of the other lists in our software have reordered elements when running within that same module within that same Flask-based REST server.
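To narrow down the candidates, one option is to list everything that has been imported at the point where yaml.load is called, once from the command line and once inside the REST server, and then diff the two lists. This is a rough sketch using only sys.modules; loaded_top_level_packages is a hypothetical helper name.

```python
import sys


def loaded_top_level_packages():
    """Return the sorted top-level names of every module imported so far."""
    # Copy the keys first so the result is unaffected by imports in progress.
    return sorted({name.split(".")[0] for name in list(sys.modules)})


# Print one name per line so the two environments can be compared with diff.
for package_name in loaded_top_level_packages():
    print(package_name)
```

Any package that shows up only in the REST-server list is a suspect.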
The large company at which I'm now working has tight controls over the available software libraries that we can use, including Python packages. We have to use PyYAML from our company's software repository. And although it's theoretically just an instance of the standard PyYAML 5.3.1 package, perhaps it has been altered in some way by our "Software Security" team. And again, it could be that some subsidiary package used by PyYAML has been altered at our installation such that it sorts lists under certain conditions, or temporarily uses sets to hold list data before converting back to lists.
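A quick way to see which copy of PyYAML each environment actually picks up is to print where the module was loaded from and what version it reports. This is only a sketch; describe_module is a hypothetical helper name, and it reports an error entry when the module cannot be imported.

```python
import importlib


def describe_module(name):
    """Report where a module was loaded from and which version it claims."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return {"name": name, "error": str(exc)}
    return {
        "name": name,
        "file": getattr(module, "__file__", None),
        "version": getattr(module, "__version__", None),
    }


print(describe_module("yaml"))
```

If both environments report the same path, the files under that directory could then be compared against an unmodified PyYAML 5.3.1 download to see whether anything was changed.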
Anyway, it seems that I'm simply out of luck with the company's PyYAML package, and so I think that my only solution will be to get the source code for ruamel.yaml or some other YAML implementation and include a copy of that source code in the module I'm working on.
Thanks to all of you for all your help and feedback!
I'll keep this question open for a while longer, in case any new information might surface.
PS: The data that is being read via PyYAML is configuration data for a program. Another solution might be to simply abandon YAML altogether here and switch to JSON or to some configuration-management tool.
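For comparison, the same configuration expressed as JSON and read with the standard-library json module would look like the sketch below. The JSON text is my own translation of example.yaml, not a file we currently have.

```python
import json

# Hypothetical JSON equivalent of example.yaml.
config_text = '{"data": ["start", "next"]}'

config = json.loads(config_text)
data = config["data"]
print(data)  # ['start', 'next']
```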