The following code displays data from a JSON Lines file.
import time
import numpy as np
import pandas as pd
start = time.time()
with open('stela_zerrl_t01_201222_084053_test_edited.json', 'r') as fin:
    df = pd.read_json(fin, lines=True)
parsed_data = df[["SRC/Word1"]].drop_duplicates().replace('', np.nan).dropna().values.tolist()
print(parsed_data)
The output is:
[[' '], ['E1F25701'], ['E15511D7']]
Is there a way to remove the blank data and duplicates, and store the result as an array?
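One possible approach, as a sketch rather than a definitive answer: the blank entry in the output is a single space (' '), not an empty string, so replace('', ...) never matches it. Stripping whitespace first and then filtering out empty strings should handle that. Selecting the column with single brackets gives a Series instead of a one-column DataFrame, so the result is flat rather than a list of lists. The small DataFrame below is a made-up stand-in for the data read from the file; only the column name "SRC/Word1" comes from the code above.

```python
import pandas as pd

# Hypothetical sample standing in for the DataFrame read from the JSON Lines file.
df = pd.DataFrame({"SRC/Word1": [" ", "E1F25701", "E15511D7", "E1F25701", ""]})

# Single brackets select a Series, so the final result is one-dimensional.
words = df["SRC/Word1"].dropna().str.strip()

# Drop entries that are empty after stripping, then remove duplicates
# (drop_duplicates keeps the first occurrence, so the original order is kept).
words = words[words != ""].drop_duplicates()

parsed_data = words.to_numpy()   # NumPy array; use .tolist() for a plain list
print(parsed_data.tolist())      # ['E1F25701', 'E15511D7']
```

If a plain Python list is enough, words.tolist() can be used directly in place of to_numpy().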