If I have a list of dictionaries that looks something like this:
data = [{'a': 1, 'b': 2, 'c': 3}, {'b': 4, 'c': 5, 'd': 6, 'e': 7}]
How can I convert this list to a Spark DataFrame without dropping any keys that may not be shared between the dictionaries? For example, if I use sc.parallelize(data).toDF(), the resulting DataFrame has columns 'a', 'b', and 'c', with column 'a' null for the second dictionary, while columns 'd' and 'e' from the second dictionary are dropped completely.
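For concreteness, here's a minimal reproduction of what I'm seeing (the session setup lines are just my assumption of a stock PySpark environment):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

data = [{'a': 1, 'b': 2, 'c': 3}, {'b': 4, 'c': 5, 'd': 6, 'e': 7}]

# Schema is inferred from the first dictionary only, so the resulting
# DataFrame has columns 'a', 'b', 'c'; keys 'd' and 'e' are silently dropped.
df = sc.parallelize(data).toDF()
df.show()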
From playing around with the order of the dictionaries, I can see that schema inference defers to the keys of whichever dictionary appears first in the list: if I swap the two dictionaries in my example above, the resulting DataFrame has columns 'b', 'c', 'd', and 'e'.
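To demonstrate (same setup as the snippet above):

# Reversing the list flips which keys survive schema inference: the first
# dictionary is now {'b': 4, 'c': 5, 'd': 6, 'e': 7}, so 'a' is dropped instead.
df_swapped = sc.parallelize(list(reversed(data))).toDF()
print(df_swapped.columns)  # ['b', 'c', 'd', 'e']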
In reality, there will be far more than two dictionaries in this list, and there will be no guarantee that the keys will be the same from dictionary to dictionary, so it's important that I find a reliable way to handle potentially different keys.
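The one workaround I've sketched is to pad every dictionary out to the union of all keys before building the DataFrame (the normalize_keys helper is just my own hypothetical naming), but I'm hoping there's a more idiomatic approach:

def normalize_keys(dicts):
    """Pad each dict to the union of all keys, filling gaps with None."""
    all_keys = sorted(set().union(*(d.keys() for d in dicts)))
    return [{k: d.get(k) for k in all_keys} for d in dicts]

# Every key from every dictionary survives; missing values become null.
df_full = spark.createDataFrame(normalize_keys(data))
print(df_full.columns)  # ['a', 'b', 'c', 'd', 'e']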