Background
I'm looking to convert a text file of ~1.1m lists into JSON, and then into a pandas dataframe. The file is currently set up such that each list is separated by a newline only, and structured in the following manner:
['Here is a string!', 'London, England', [[-2.68, 50.92], [-2.68, 50.96], [-2.61, 50.96], [-2.61, 50.92]], 'FakeUserName', 1234567, [('581294', 'Other_user')]]
Problem
I'd like to convert each list into JSON and subsequently write it to a new file, which I can then use in a separate call to pd.read_json. I am having difficulty owing to the variable length of the mentions element (no limit on the number of mentions tuples).
Ideally the resulting dataframe would have the following columns:
+-----+--------------------+-----------------------+----------------+------------+---------+--------------------------+
| | String | LOC | BB | User | ID | Mentions |
+-----+--------------------+-----------------------+----------------+------------+---------+--------------------------+
| 0 | "Here is a string" | ('London', 'England') | [[-2.68..],..] | 'FakeUser' | 1234567 | [(581294, 'other_user')] |
| 1 | | | | | | |
| ... | | | | | | |
+-----+--------------------+-----------------------+----------------+------------+---------+--------------------------+
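To make the target layout concrete, here is a minimal sketch of mapping one parsed line onto the column names in the table. The `COLUMNS` list and the `row_to_record` helper are hypothetical names, and the positional order of the fields is an assumption based on the single example row above.

```python
import ast

# Column names taken from the desired table; the positional order is an
# assumption based on the single example row.
COLUMNS = ["String", "LOC", "BB", "User", "ID", "Mentions"]


def row_to_record(line):
    """Parse one line of the text file into a dict keyed by column name."""
    values = ast.literal_eval(line)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    # Split 'London, England' into ('London', 'England'), as in the table.
    record["LOC"] = tuple(part.strip() for part in record["LOC"].split(","))
    return record


sample = (
    "['Here is a string!', 'London, England', "
    "[[-2.68, 50.92], [-2.68, 50.96], [-2.61, 50.96], [-2.61, 50.92]], "
    "'FakeUserName', 1234567, [('581294', 'Other_user')]]"
)
record = row_to_record(sample)
print(record["LOC"], record["Mentions"])
```

Because Mentions stays a single list-valued field, the number of tuples it holds does not change the set of columns. A list of such dicts can be passed to pd.DataFrame, which takes its column names from the dict keys.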
Work Done So Far
- Processing each line with ast.literal_eval(line) to allow indexing.
- Attempted to convert each line using json.dumps(line) and then pass it to a dataframe. This converts the list into a JSON array, so the columns are not interpreted as intended when the result is passed to pd.read_json.
- Unsuccessful use of json_normalize as described in "How to flatten a pandas dataframe with some columns as json?".
- Formatting each column manually:
df = pd.DataFrame({"String": list[0], "LOC":list[1]... })
- Creation of a custom class (similar to: https://stackoverflow.com/a/44195896/7322036)
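One direction that might avoid the JSON-array problem is to write one JSON object per line (JSON Lines) instead of one JSON array per line, so that each key becomes a column. The following is only a sketch: `write_json_lines` and `COLUMNS` are hypothetical names, the field order is assumed from the example row, and the second input row is made up purely to show an empty mentions list.

```python
import ast
import io
import json

# Assumed positional order of the fields in each list.
COLUMNS = ["String", "LOC", "BB", "User", "ID", "Mentions"]


def write_json_lines(src, dst):
    """Read Python-literal lists from src and write one JSON object per line to dst."""
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, ast.literal_eval(line)))
        # json.dumps turns the mentions tuples into JSON arrays.
        dst.write(json.dumps(record) + "\n")
        count += 1
    return count


# In-memory stand-ins for the input and output files; the second row is invented.
src = io.StringIO(
    "['Here is a string!', 'London, England', [[-2.68, 50.92], [-2.68, 50.96]], "
    "'FakeUserName', 1234567, [('581294', 'Other_user')]]\n"
    "['Another string', 'Paris, France', [[2.2, 48.8], [2.4, 48.9]], "
    "'SecondUser', 7654321, []]\n"
)
dst = io.StringIO()
written = write_json_lines(src, dst)
```

With real files, the two StringIO objects would be replaced by open file handles, and the output should then be loadable with pd.read_json(path, lines=True). Processing line by line keeps memory use low for ~1.1m rows. Note that the mentions tuples come back as lists after the round trip through JSON.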
Any suggestions for things I've missed? This is proving to be a lot more difficult than I had initially assumed.
EDIT
Added the example list into the table to demonstrate what I'm attempting to do.