
I am running the following code, which is supposed to convert a JSON file containing hydrated tweets to CSV format.

import pandas as pd
import json


#Json file name
with open('climate.jsonl') as f:
    lines = f.read().splitlines()
print('jline opened')
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']

print('df_inter.columns')

df_inter['json_element'].apply(json.loads)


print('df_inter')

Twitter_Dataset = pd.json_normalize(df_inter['json_element'].apply(json.loads))

#Output CSV file
Twitter_Dataset.to_csv('Sandy_tweets_unfiltered.csv')
print('Unfiltered tweets have successfully been saved as CSV')


        

I have run the same code on a few hydrated tweet datasets before and it executed with no problems. But now I am getting the following error:

Traceback (most recent call last):
  File "2-Json_to_CSV.py", line 15, in <module>
    df_inter['json_element'].apply(json.loads)
  File "/share/apps/miniconda3/installed_1/envs/sgupta5/lib/python3.7/site-packages/pandas/core/series.py", line 4200, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2402, in pandas._libs.lib.map_infer
  File "/share/apps/miniconda3/installed_1/envs/sgupta5/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/share/apps/miniconda3/installed_1/envs/sgupta5/lib/python3.7/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 13 (char 12)
srun: error: compute-20-9: task 0: Exited with exit code 1

                            
David Scholz
Kasperr
  • You didn't show us the JSON input text. Is `json.loads(line)` able to successfully deserialize each input record? https://stackoverflow.com/help/minimal-reproducible-example – J_H Aug 20 '22 at 20:57

1 Answer


You can get the lines with readlines alone; there is no need for splitlines. If you want to split the content within each line, you can call split() on that line.

Note: Remember to parse each line with json.loads before you work with its contents.

import json

# JSON Lines file name
with open('climate.jsonl') as f:
    # Parse every line of the file into a Python object
    line_content = [json.loads(line) for line in f.readlines()]
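
If the file contains a malformed line, the list comprehension above stops at the first one with the same JSONDecodeError. "Extra data" means the decoder read one complete JSON value and then found more non-whitespace text on the same line, which suggests that at least one line in the file is not a single JSON document. A minimal sketch for locating such lines, assuming one JSON record per line (load_jsonl is a hypothetical helper name, not part of json or pandas):

```python
import json


def load_jsonl(path):
    """Parse a JSON Lines file and return (records, bad_lines).

    records   -- list of successfully parsed objects
    bad_lines -- list of (line_number, error_message) for lines that failed
    """
    records, bad_lines = [], []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Remember where parsing failed instead of aborting
                bad_lines.append((line_number, str(err)))
    return records, bad_lines
```

The returned records list can be passed to pd.json_normalize(records) as in the question, and bad_lines shows which lines of climate.jsonl need to be inspected or repaired.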
Jamiu S.