
I'm iterating through a text file and getting the output shown below. To make my script more effective, I would like to drop the duplicate lines (e.g. those containing item 181) and keep just one of each; see the example below.

Log file to be parsed.

{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }
{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }
{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }
{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }
{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }
{"id": "242", "status": 61313851, "time": "2015-02-26T08:46:14.070298", "item": 180, }

Python code.

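#!/usr/bin/env python

with open("tras.json") as infile:
    for line in infile:
        # Whitespace-split tokens 4-5 hold the "time" key and value.
        if "time" in line:
            time = line.split()[4:6]
        # Tokens 6-7 hold the "item" key and value.
        if "item" in line:
            item = line.split()[6:8]
            print(time + item)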

Current output.

['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '180,']

Desired output.

['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']
['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '180,']

Cheers,

Phillip

  • You forgot to include the Python code you use to solve this problem. You also forgot to describe what problem you have with this code. –  Feb 27 '15 at 15:51
  • Sorry for that, I've just added the code used to iterate. Phillip – Phillip Bailey Feb 27 '15 at 16:00
  • 1) This Python code will not produce this output. 2) What does not work with your code? –  Feb 27 '15 at 16:00
  • My code produces exactly what I've posted above. – Phillip Bailey Feb 27 '15 at 16:06
  • `print time + item` will not produce output like `['"time":', '"2015-02-26T08:46:14.070298",', '"item":', '181,']`. –  Feb 27 '15 at 16:15
  • The if statements probably need to be indented. I'm guessing this is a typo in your question. This might be instructive: http://stackoverflow.com/questions/19483351/converting-json-string-to-dictionary-not-list-python – joel goldstick Feb 27 '15 at 17:08

1 Answer


A complete answer would require more knowledge of your domain, but I hope this example code is helpful:

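# dataList is assumed to be a list of parsed rows, e.g. the time + item lists above.
foundNumbers = set()   # item values seen so far
clearedData = []       # first row for each distinct item value
for dataItem in dataList:
    # The last element of each row is the item value, e.g. '181,'.
    if dataItem[-1] not in foundNumbers:
        foundNumbers.add(dataItem[-1])
        clearedData.append(dataItem)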
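
As a rough sketch of how the set-based idea above might be combined with the loop from the question, assuming every log line has the same whitespace layout as the sample: the function name `unique_by_item` and the `sample` list are illustrative only, and the slices are the ones used in the question.

```python
def unique_by_item(lines):
    """Yield the time/item tokens of each line, skipping item values already seen."""
    seen_items = set()
    for line in lines:
        fields = line.split()
        # Assumption: tokens 4-5 are the "time" pair and 6-7 the "item" pair,
        # as in the question's slices. Lines that do not fit are skipped.
        if len(fields) < 8 or fields[6] != '"item":':
            continue
        item = fields[7]
        if item not in seen_items:
            seen_items.add(item)
            yield fields[4:8]


# Sample lines copied from the log in the question.
sample = [
    '{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }',
    '{"id": "242", "status": 61313850, "time": "2015-02-26T08:46:14.070298", "item": 181, }',
    '{"id": "242", "status": 61313851, "time": "2015-02-26T08:46:14.070298", "item": 180, }',
]

rows = list(unique_by_item(sample))
for row in rows:
    print(row)
```

To run this against the real file, the open file object can be passed in place of `sample`, since iterating over it yields one line at a time. Note that this keeps the first line seen for each item value; if the time also matters, the key added to the set would need to include it.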
Others