I have a text file that has data in the following shape:
{"id": 1, {"device_data": 123}, "created_at": "2020-04-03"}{"id": 2, {"device_data": 123}, "created_at": '2020-04-03'}{"id": 2, {"device_data": 123}, "created_at": "2020-04-03"}{"id": 2, {"device_data": 123}, "created_at": '2020-04-03'}
There aren't any \n delimiters or , delimiters between the records that I can use here. I'd like to parse this into a list of dictionaries in order to load the data into a dataframe.
I've tried parsing this using .split() and a list comprehension, doing something like this:
lst = [x + '}' for x in data.split('}') if x != '']
But this obviously breaks for records that have nested objects.
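To illustrate the nesting problem, here is a minimal sketch of a splitter that tracks brace depth instead of cutting at every closing brace. The function name `split_records` and the `"d"` key in the sample are hypothetical, chosen only for illustration. It also assumes that braces never appear inside string values, which a plain character scan cannot detect.

```python
def split_records(text):
    """Split back-to-back {...} records by tracking brace depth.

    Assumption: no '{' or '}' characters occur inside string values.
    """
    records = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i  # beginning of a new top-level record
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
            if depth == 0:
                # closing brace of a top-level record
                records.append(text[start:i + 1])
                start = None
    return records


# Hypothetical sample with a nested object in each record
sample = '{"id": 1, "d": {"device_data": 123}}{"id": 2, "d": {"device_data": 456}}'
chunks = split_records(sample)
```

Each element of `chunks` is still a string, so it would need to be parsed separately before it can be used as a dictionary.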
I also tried doing this with regex but I'm struggling to figure out the appropriate way. This is what I have so far:
re.split('(\{(.*)\})', data)
Per the suggestions below, I also attempted making use of the json library:
import json
with open('path/to/file', 'r') as f:
    res = json.load(f)
However, this resulted in an error with the following message: JSONDecodeError: Extra data. I believe this is because the file contains multiple JSON objects back to back rather than a single JSON document.
I wanted to use the json.load() command with a for loop, but then I ran into trouble figuring out how to properly split the file contents.
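One possible way to loop without splitting first is `json.JSONDecoder.raw_decode`, which parses a single JSON value from a given position and returns the index where it stopped. The sketch below assumes every record is itself valid JSON (double-quoted strings, and every nested object stored under a key). The helper name `iter_json_objects` and the `"payload"` key are hypothetical, so the sample here is not identical to the data shown above.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    n = len(text)
    while pos < n:
        # raw_decode does not skip leading whitespace, so do it manually
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            break
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: valid JSON records concatenated with no delimiter
sample = (
    '{"id": 1, "payload": {"device_data": 123}, "created_at": "2020-04-03"}'
    '{"id": 2, "payload": {"device_data": 456}, "created_at": "2020-04-03"}'
)
records = list(iter_json_objects(sample))
```

`records` is then a list of dictionaries. If a record is not valid JSON (for example, single-quoted strings), `raw_decode` raises `json.JSONDecodeError` at that record.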
Does anyone have a suggestion for how to approach this kind of problem?