Use pandas
:
- Given your data, as described, in a
.txt
file.
.to_json
has various parameters to customize the final look of the JSON file.
- Having the data in a dataframe has the advantage of allowing for additional analysis
- The data has a number of issues that can easily be fixed
- No column names
- Improper datatime format
- Whitespace around the URL
import pandas as pd
# read data
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: eval})
# convert column 0 to a datatime format
df[0] = pd.to_datetime(df[0])
# your data has whitespace around the url; remove it
df[1] = df[1].apply(lambda x: x.strip())
# make column 2 a separate dataframe
df2 = pd.DataFrame.from_dict(df[2].to_list())
# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')
# drop old column 2
df3.drop(columns=[2], inplace=True)
# name column 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)
# dataframe view
date_time url type user ip
2019-10-14 13:00:19 www.google.com click root 0.0.0.0
2019-10-14 13:02:19 www.google.com click root 0.0.0.0
2019-10-14 13:05:19 www.google.com click root 0.0.0.0
# same to a JSON
df3.to_json('test3.json', orient='records', date_format='iso')
JSON file
[{
"date_time": "2019-10-14T13:00:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:02:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:05:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}
]