Transform text file with part json objects into json file

Question

I'm trying to transform a text file which looks like the following:

14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}

The file has many more rows of logs like these. I need to convert it so that each row becomes a single JSON object, like the following:

{"date_time": "2019-10-14 13:00:19", "url": "www.google.com","type":"click", "user":"root", "ip":"0.0.0.0"}

But I can't work out an obvious way to do this in Python. Any help appreciated.

Welcome to StackOverflow! Why don't you add a header row with your field names to the file, load it into a pandas DataFrame, and convert it to JSON as described here: https://stackoverflow.com/questions/50384883/convert-pandas-dataframe-to-json-object-pandas – Stepan Novikov, Oct 29 '19 at 18:00
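The comment's idea of naming the fields can also be sketched with the standard library instead of pandas, assuming every line has exactly three `|`-separated fields and the JSON part itself contains no `|`. `csv.DictReader` takes the names through `fieldnames=`, so the file needs no header row. `FIELDS` and `read_named_rows` are hypothetical names, and the date is left unconverted here:

```python
import csv
import io
import json

# Hypothetical field names; the raw file has no header row
FIELDS = ["date_time", "url", "payload"]


def read_named_rows(text):
    """Yield one flat dict per non-empty line, keyed by FIELDS plus the JSON keys."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=FIELDS,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # the JSON part contains '"'; treat it as plain text
    )
    for row in reader:
        record = {"date_time": row["date_time"].strip(), "url": row["url"].strip()}
        # Merge the parsed JSON part into the same dict
        record.update(json.loads(row["payload"]))
        yield record


sample = '14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n'
print(list(read_named_rows(sample)))
```

For a real file, pass the file's contents (or adapt the function to take an open file object) instead of `sample`.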

mpawlak · Answer 1 · 2019-10-30T17:21:59.723


You could use the `datetime` and `json` modules: open the file, iterate over its lines, and parse each one. You may need to adapt some parts of the code.

See the Python documentation section "strftime() and strptime() Behavior" for the format codes.
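Because the format string spells out `%d/%m/%Y`, `strptime` reads the day first, and `strftime` can then write the date back in the year-first form the question asks for. A minimal sketch; `reformat_timestamp` is a hypothetical helper name:

```python
from datetime import datetime


def reformat_timestamp(raw):
    """Convert '14/10/2019 13:00:19' (day first) to '2019-10-14 13:00:19'."""
    # %d/%m/%Y reads day/month/year; %H:%M:%S reads the 24-hour time
    parsed = datetime.strptime(raw.strip(), "%d/%m/%Y %H:%M:%S")
    # strftime writes it back out year first
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(reformat_timestamp("14/10/2019 13:00:19 "))  # 2019-10-14 13:00:19
```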

Working example:

import datetime
import json

in_text = """14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}"""

item_list = []
for line in in_text.splitlines():
    # split into at most 3 parts so a "|" inside the JSON part is left alone
    date, url, json_part = line.split("|", 2)
    # parse "14/10/2019 13:00:19" and store it as the string "2019-10-14 13:00:19"
    date_time = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        "date_time": date_time.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    # merge the keys of the JSON part into the same dict
    item.update(json.loads(json_part))
    item_list.append(item)

print(json.dumps(item_list))
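`json.dumps` raises `TypeError` for `datetime` objects, so `date_time` has to be a string before serialising. Once it is, the records can be written either as one JSON array or, matching the one-object-per-line layout in the question, as JSON Lines. A sketch with two hypothetical records:

```python
import json

# Hypothetical records shaped like the ones built above,
# with date_time already formatted as a string
records = [
    {"date_time": "2019-10-14 13:00:19", "url": "www.google.com",
     "type": "click", "user": "root", "ip": "0.0.0.0"},
    {"date_time": "2019-10-14 13:02:19", "url": "www.google.com",
     "type": "click", "user": "root", "ip": "0.0.0.0"},
]

# One JSON array holding every record
as_array = json.dumps(records)

# One JSON object per line (the "JSON Lines" convention)
as_lines = "\n".join(json.dumps(record) for record in records)

print(as_lines)
```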

To read lines from a file:

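item_list = []
with open("your/file/path.txt") as fh:
    for line in fh:  # parse each line of the file as in the example above
        date, url, json_part = line.split("|", 2)
        date_time = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
        item = {"date_time": date_time.strftime("%Y-%m-%d %H:%M:%S"), "url": url.strip()}
        item.update(json.loads(json_part))
        item_list.append(item)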
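A complete sketch of the file-based version, assuming the log lines look exactly like the sample (three `|`-separated fields, day-first dates). `parse_line` and `convert_file` are hypothetical helper names, and the demo writes to a temporary directory rather than your real path:

```python
import datetime
import json
import os
import tempfile


def parse_line(line):
    """Turn one 'date | url | {json}' log line into a flat dict."""
    # Split on at most two "|" so a "|" inside the JSON part is left alone
    date, url, json_part = line.split("|", 2)
    stamp = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {"date_time": stamp.strftime("%Y-%m-%d %H:%M:%S"), "url": url.strip()}
    item.update(json.loads(json_part))  # json.loads ignores the trailing newline
    return item


def convert_file(in_path, out_path):
    """Parse every non-blank line of in_path and write one JSON array to out_path."""
    with open(in_path) as fh:
        items = [parse_line(line) for line in fh if line.strip()]
    with open(out_path, "w") as out:
        json.dump(items, out, indent=2)
    return items


# Demo on a throwaway file in a temporary directory; use your own paths instead
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "log.txt")
out_path = os.path.join(tmp_dir, "log.json")
with open(in_path, "w") as fh:
    fh.write('14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n')
    fh.write("\n")  # a blank line, which convert_file skips

items = convert_file(in_path, out_path)
print(items)
```

Skipping blank lines matters in practice because many log files end with an empty line, which would otherwise make the three-way unpacking fail.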

edited Oct 30 '19 at 17:21

answered Oct 29 '19 at 17:31


Thanks! Got this to work when putting the text in as a string. Haven't been able to work out how to parse the file through it yet, though – devnotdev Oct 30 '19 at 09:52
@devnotdev I updated my answer to also cover reading from a file – mpawlak Oct 30 '19 at 17:26
Thank you very much – devnotdev Oct 30 '19 at 18:08

Trenton McKinney · Answer 2 · 2019-10-29T18:17:22.573

Use `pandas`:

- This assumes your data, as described, is in a .txt file.
- `.to_json` has various parameters to customize the final look of the JSON file.
- Having the data in a dataframe also makes further analysis easy.
- The data has a few issues that are easily fixed:
  - No column names
  - Improper datetime format
  - Whitespace around the URL
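To make those issues concrete, here is what a plain-Python split of one sample line produces (an illustration outside pandas; the variable names are hypothetical): the fields are positional only, the date is day-first text, and the URL keeps its surrounding spaces.

```python
# One raw line from the log file
line = '14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}'

# A plain split on "|" gives positional fields with no names
date_part, url_part, json_part = line.split("|", 2)

print(repr(date_part))  # '14/10/2019 13:00:19 '
print(repr(url_part))   # ' www.google.com '
```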

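import json

import pandas as pd

# read data; parse the JSON part of each line with json.loads (safer than eval)
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: json.loads})

# convert column 0 to datetime; the explicit format makes the day-first dates unambiguous
df[0] = pd.to_datetime(df[0].str.strip(), format='%d/%m/%Y %H:%M:%S')

# your data has whitespace around the url; remove it
df[1] = df[1].str.strip()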

# make column 2 a separate dataframe
df2 = pd.DataFrame.from_dict(df[2].to_list())

# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')

# drop old column 2
df3.drop(columns=[2], inplace=True)

# name column 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)

# dataframe view
          date_time               url   type  user       ip
2019-10-14 13:00:19   www.google.com   click  root  0.0.0.0
2019-10-14 13:02:19   www.google.com   click  root  0.0.0.0
2019-10-14 13:05:19   www.google.com   click  root  0.0.0.0

# save to a JSON file
df3.to_json('test3.json', orient='records', date_format='iso')

JSON file

[{
        "date_time": "2019-10-14T13:00:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }, {
        "date_time": "2019-10-14T13:02:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }, {
        "date_time": "2019-10-14T13:05:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }
]
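`date_format='iso'` produces strings such as `2019-10-14T13:00:19.000Z`, while the question asks for `2019-10-14 13:00:19`. One way to get that form, sketched on a small hypothetical frame shaped like `df3`, is to format the column with `.dt.strftime` before calling `to_json`:

```python
import json

import pandas as pd

# A small frame shaped like df3 (hypothetical sample values)
df3 = pd.DataFrame({
    "date_time": pd.to_datetime(["2019-10-14 13:00:19", "2019-10-14 13:02:19"]),
    "url": ["www.google.com", "www.google.com"],
    "type": ["click", "click"],
})

# Turn the timestamps into plain strings so to_json writes them verbatim
out = df3.assign(date_time=df3["date_time"].dt.strftime("%Y-%m-%d %H:%M:%S"))
json_text = out.to_json(orient="records")

records = json.loads(json_text)
print(records[0])
```

`to_json` also accepts `lines=True` together with `orient='records'` to write one object per line instead of a single array.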

Rithin Chalumuri · Answer 3 · 2019-10-29T17:42:04.933


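import json


def transform_to_json(row):
    # row is [date, url, json_part]; strip the spaces around each field
    date, url, json_part = (part.strip() for part in row)

    # json.loads (unlike ast.literal_eval) also accepts JSON's true/false/null
    d = json.loads(json_part)
    d["date_time"] = date
    d["url"] = url

    return d


with open('example.txt', 'r') as file:
    # split on at most two "|" and skip blank lines (e.g. a trailing newline)
    json_objs = [transform_to_json(row.split('|', 2)) for row in file if row.strip()]

single_json_result = json.dumps(json_objs)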
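`ast.literal_eval` evaluates Python literals, not JSON, so it copes with this sample only because every value is a string; JSON's `true`, `false` and `null` are not Python literals. A quick check of the difference (`parses_with_literal_eval` is a hypothetical helper):

```python
import json
from ast import literal_eval


def parses_with_literal_eval(text):
    """Return True if ast.literal_eval accepts text, False otherwise."""
    try:
        literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True


sample = '{"type":"click", "user":"root", "ip":"0.0.0.0"}'
with_json_literals = '{"type":"click", "ok": true, "ref": null}'

print(parses_with_literal_eval(sample))              # True
print(parses_with_literal_eval(with_json_literals))  # False: `true` is not a Python literal
print(json.loads(with_json_literals))                # {'type': 'click', 'ok': True, 'ref': None}
```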

edited Oct 29 '19 at 17:42

answered Oct 29 '19 at 17:33
