When I try to use from_records, my result looks like this:
CustomFields
0 { "CountryOfManufacture": "China", "Tags": ["U...
1 { "CountryOfManufacture": "China", "Tags": ["U...
2 { "CountryOfManufacture": "China", "Tags": [] }
3 { "CountryOfManufacture": "Japan", "Tags": ["3...
4 { "CountryOfManufacture": "Japan", "Tags": ["1...
I think this is because my data is in an unusual format. The data originally came in a CSV file, and this was one of its columns. All the other columns hold integer, float, or object values, whereas this column already looked like a dictionary when viewed in Excel.
The data you used for your example below is formatted as I would expect, but this is what mine looks like when converted into a list:
['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": [] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["Comedy"] }', ...
As you can see, each dictionary is wrapped in an extra pair of quotes, so every element of the list is a string rather than a dictionary. A single element looks like this: '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }'.
Is there a way to get around this without pyspark?
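My own guess at a pandas-only approach is sketched below. It assumes every value in the column is a valid JSON string, and it uses the standard json module to turn each string into a dictionary before calling from_records. The sample values are copied from the list above; the names raw, df, parsed and expanded are just placeholders for illustration, not anything from my real code.

```python
import json

import pandas as pd

# Sample values copied from the list above: each element is a JSON string, not a dict.
raw = [
    '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
    '{ "CountryOfManufacture": "China", "Tags": [] }',
    '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
]

# Hypothetical frame standing in for the one read from the CSV file.
df = pd.DataFrame({"CustomFields": raw})

# Assumption: every string is valid JSON, so json.loads can parse it into a dict.
parsed = df["CustomFields"].apply(json.loads)

# from_records now receives a list of dicts and builds one column per key.
expanded = pd.DataFrame.from_records(parsed.tolist())

print(expanded)
```

If some rows are empty or are not valid JSON, json.loads raises an error, so those rows would need to be handled separately first.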
Thanks!