Python remove angle brackets and parse into correct format from stdin?

Question

This is my first post on Stack Overflow. I am working on a programming assignment for school and am stuck on the following problem.

Write a program that reads in events from STDIN and outputs the events back to STDOUT with the “overlap” flag flipped for the events that overlap with others. The first line of the input will be the number of events to follow, N. N will be 1 million or more. The subsequent N lines will contain events in the following format:

{ ‘start_time’: string format, ‘end_time’: string format, ‘overlap’: boolean represented as 1 or 0 }

Sample input:

{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}
{ ‘start_time’: “2016-02-01 00:00:00”, ‘end_time’: “2016-06-01 00:00:00”, ‘overlap’: 0}
{ ‘start_time’: “2012-01-01 00:00:00”, ‘end_time’: “2012-05-01 00:00:00”, ‘overlap’: 0}

Sample output:

{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 1}
{ ‘start_time’: “2016-02-01 00:00:00”, ‘end_time’: “2016-06-01 00:00:00”, ‘overlap’: 1}
{ ‘start_time’: “2012-01-01 00:00:00”, ‘end_time’: “2012-05-01 00:00:00”, ‘overlap’: 0}

The input would be given in a file, "timestamp.txt", which I will read using Python's io library. I would then load the lines into a list and sort them by timestamp using either sorted() or .sort(). Once I have a sorted list, I would compare each successive pair of events to see whether the end_time of the first event falls after the start_time of the second, which would mean they overlap.

What I am currently stuck on is how to actually retrieve the values from each line of text.
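A minimal sketch of the sort-and-compare idea described above, assuming each line has already been parsed into a dict with the three keys from the sample. The function name flag_overlaps is hypothetical. It also assumes that timestamps in the fixed "YYYY-MM-DD HH:MM:SS" format can be compared as plain strings, and that events which merely touch (one ends exactly when the next starts) do not count as overlapping. Instead of comparing only neighbouring events, it remembers the event with the latest end time seen so far, because a long early event can overlap one that sits several positions later in the sorted order.

```python
def flag_overlaps(events):
    """Set 'overlap' to 1 on every event that overlaps another event.

    Hypothetical helper: `events` is a list of dicts with 'start_time',
    'end_time' and 'overlap' keys. The list keeps its original order.
    """
    # Sort indices rather than the events themselves, so the output
    # can be written back in the same order as the input.
    order = sorted(range(len(events)), key=lambda i: events[i]["start_time"])

    latest = None  # index of the event with the latest end time so far
    for i in order:
        if latest is not None and events[i]["start_time"] < events[latest]["end_time"]:
            # This event starts before an earlier one has ended.
            events[i]["overlap"] = 1
            events[latest]["overlap"] = 1
        if latest is None or events[i]["end_time"] > events[latest]["end_time"]:
            latest = i
    return events
```

Sorting dominates the cost, so the whole pass is O(N log N), which should be workable for a million events.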

The file comes in this format:

{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}

It is not exactly JSON, so I cannot do something like line = json.loads(line) and then get the value with line['start_time'].

Does anybody have any suggestions for this problem? Thank you.

score 2 · Answer 1 · answered Apr 20 '18 at 01:26

As each line of your input is already in the repr format of a dict, consider loading it using the ast module. ast works with Python's abstract syntax trees, and ast.literal_eval safely evaluates a string containing a Python literal (such as a dict) into the corresponding Python object.

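    import ast

    # `file` is your already-opened input; DoProcessing stands in for your own logic.
    for line in file:
        # literal_eval only accepts plain ASCII quotes, so convert any curly quotes first.
        DoProcessing(ast.literal_eval(line.strip()))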

A similar question was answered here.
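A hedged sketch of how this could look on the sample input. ast.literal_eval raises a SyntaxError on the curly quote characters shown in the question, so this version first maps them to ASCII quotes. The names TO_ASCII and parse_with_ast are made up for illustration.

```python
import ast

# Map curly single/double quotes to their plain ASCII equivalents.
TO_ASCII = str.maketrans({"‘": "'", "’": "'", "“": '"', "”": '"'})

def parse_with_ast(line):
    """Turn one event line into a dict using ast.literal_eval."""
    return ast.literal_eval(line.translate(TO_ASCII).strip())
```

After parsing, values can be read in the usual way, for example parse_with_ast(line)['start_time'].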

score 0 · Answer 2 · answered Apr 20 '18 at 00:49

Have you tried stripping the curly braces at the beginning and at the end, so that the line looks like: ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0? From there you can split the resulting string into a list of strings.
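A rough sketch of that strip-and-split approach, assuming every line has exactly the three fields shown in the question and that values never contain commas. The function name parse_by_hand is hypothetical. Each field is split on its first colon only, because the timestamps themselves contain colons.

```python
QUOTES = "‘’“”'\""  # curly and plain quote characters to trim

def parse_by_hand(line):
    """Parse one event line without json or ast."""
    # Drop surrounding whitespace and the outer curly braces.
    body = line.strip().strip("{}")
    event = {}
    for field in body.split(","):
        # partition splits on the first colon, leaving the timestamp intact.
        key, _, value = field.partition(":")
        event[key.strip().strip(QUOTES)] = value.strip().strip(QUOTES)
    # The flag arrives as text, so convert it to an integer.
    event["overlap"] = int(event["overlap"])
    return event
```

This is more brittle than a real parser, but it avoids depending on the quote characters being valid in any particular format.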

score 0 · Answer 3 · answered Apr 20 '18 at 01:27

That looks like JSON with bizarre quote characters. It would be useful to look at the source of the data to see whether valid JSON is intended and where those quotes are being converted to extended Unicode quotes. Perhaps someone is running these through a word processor, or there is some Windows code page to Unicode translation issue.

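Anyway, you can fix the quotes, and (at least for your small example) the parsing works:

    import json
    import sys

    # Map each curly quote character to a plain double quote so the line becomes valid JSON.
    fix_quote_transform = str.maketrans({q: '"' for q in '‘”’“'})
    for line in sys.stdin:
        obj = json.loads(line.translate(fix_quote_transform))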
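Since the assignment also asks for the events to be written back to STDOUT, here is a small sketch of a round trip: parse with the quote fix, then print in the same layout as the sample lines. The helper names parse_line and format_event are invented for this example, and the output layout is only inferred from the sample, not from any stated specification.

```python
import json

# Same idea as above: turn every curly quote into a plain double quote.
FIX_QUOTES = str.maketrans({q: '"' for q in "‘’“”"})

def parse_line(line):
    """Parse one event line into a dict via json.loads."""
    return json.loads(line.translate(FIX_QUOTES))

def format_event(event):
    """Render an event in the curly-quote layout used by the sample input."""
    return "{ ‘start_time’: “%s”, ‘end_time’: “%s”, ‘overlap’: %d}" % (
        event["start_time"],
        event["end_time"],
        event["overlap"],
    )
```

With these two helpers, flipping a flag is just a matter of setting event["overlap"] = 1 before calling format_event.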
