
I have the following sample data (it's just a portion) from my text file. I am trying to extract three keys (timestamp, dataFrame, and rssi) into a CSV file.

packet"{\"test\":{\"id\":1479238177559,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:37.559Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":81,\"port\":5,\"rssi\":6,\"snr\":9.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}"
Received message in at 2016-11-15 14:29:43.611000
packet"{\"test\":{\"id\":1479238184069,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:44.069Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":82,\"port\":5,\"rssi\":6,\"snr\":8.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}"
Received message in at 2016-11-15 14:29:49.225000
packet"{\"test\":{\"id\":1479238189685,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:49.685Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":83,\"port\":5,\"rssi\":7,\"snr\":9.5,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}"
Received message in at 2016-11-15 14:29:56.410000
packet"{\"testl\":{\"id\":1479238196868,\"deveui\":\"0000000033035032\",\"timestamp\":\"2016-11-15T19:29:56.868Z\",\"dataFrame\":\"ABzuPdVNxrSEAV8=\",\"fcnt\":84,\"port\":5,\"rssi\":3,\"snr\":9.8,\"sf_used\":10,\"cr_used\":\"4/5\",\"device_redundancy\":0,\"time_on_air_ms\":288.76800000000003,\"decrypted\":true}}"
– James
  • Can you update your question with the exact content of your text file? (It seems you are parsing JSON, but the syntax is incorrect.) What exactly are you trying to achieve? Taking data from a JSON file and dumping it into a CSV? – mabe02 Nov 18 '16 at 20:03
  • The exact content of my text file is too big. It's literally the same data as I showed above. It repeats like packet"{.......}" packet"{...}" – James Nov 18 '16 at 20:08
  • Deleting your code is not a good way to get people to help; they will want to see that you have tried to solve the problem yourself and aren't asking us to do it for you. – Alex Hall Nov 18 '16 at 20:09
  • I am updating it in a bit – James Nov 18 '16 at 20:10
  • Does the file always follow the same pattern in terms of being split up into lines? The way the data is formatted is very odd. – Alex Hall Nov 18 '16 at 20:10
  • Each line is 2 JSON objects preceded by the words "packet" and then "Received message ... packet". So you need to split each line in two, then parse the halves as JSON. – kdopen Nov 18 '16 at 20:18
  • After the edit: discard lines starting with "Received", strip the leading 'packet"' and the trailing '"' from the other lines, then parse with json.loads(). – kdopen Nov 18 '16 at 20:21
  • I would suggest you find an effective way to parse your file. Some suggestions: read the file, split by \n, use a regex to isolate each string corresponding to a JSON object, parse each string as JSON, create a new line from the values of the corresponding keys, and dump it to a file. – mabe02 Nov 18 '16 at 20:22
  • You must convert each JSON string to a dict, then create a list of dicts. Finally, this might help you: http://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file-python – Eray Erdin Nov 18 '16 at 20:29
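
Putting mabe02's regex suggestion and the list-of-dicts recipe from the linked question together, a minimal sketch (the log.txt and out.csv file names are assumptions, and the regex is based only on the sample lines above):

import csv
import json
import re

with open('log.txt') as f:
    text = f.read()

rows = []
# Isolate each quoted JSON string that follows a 'packet' marker
for raw in re.findall(r'packet("\{.*?\}\}")', text):
    # The payload is JSON-encoded twice: decode the outer string,
    # then the object inside it
    packet = json.loads(json.loads(raw))
    # Take the single top-level value ('test' in the samples above)
    inner = next(iter(packet.values()))
    rows.append({k: inner[k] for k in ('timestamp', 'dataFrame', 'rssi')})

# Dump the list of dicts to CSV, as in the linked question
with open('out.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=['timestamp', 'dataFrame', 'rssi'])
    writer.writeheader()
    writer.writerows(rows)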

2 Answers


This is apparently data that has accidentally been JSON-encoded twice, so it can be decoded twice to get a nice dictionary:

import json

with open('log.txt') as infile:
    for line in infile:
        if line.startswith('packet"{'):
            # Remove the 'packet' prefix; the rest is a JSON-encoded string
            line = line[len('packet'):]
            # Decode twice: once for the outer string, once for the object in it
            packet = json.loads(json.loads(line))
            print('Packet:')
            print(packet)
            # Take the single top-level value; next(iter(...)) works in
            # Python 3, where dict.values() is not indexable
            packet = next(iter(packet.values()))
            print('Values:')
            print(packet['timestamp'], packet['dataFrame'], packet['rssi'])

Output:

Packet:
{'test': {'id': 1479238177559, 'deveui': '0000000033035032', 'timestamp': '2016-11-15T19:29:37.559Z', 'dataFrame': 'ABzuPdVNxrSEAV8=', 'fcnt': 81, 'port': 5, 'rssi': 6, 'snr': 9.5, 'sf_used': 10, 'cr_used': '4/5', 'device_redundancy': 0, 'time_on_air_ms': 288.76800000000003, 'decrypted': True}}
Values:
2016-11-15T19:29:37.559Z ABzuPdVNxrSEAV8= 6
Packet:
{'test': {'id': 1479238184069, 'deveui': '0000000033035032', 'timestamp': '2016-11-15T19:29:44.069Z', 'dataFrame': 'ABzuPdVNxrSEAV8=', 'fcnt': 82, 'port': 5, 'rssi': 6, 'snr': 8.5, 'sf_used': 10, 'cr_used': '4/5', 'device_redundancy': 0, 'time_on_air_ms': 288.76800000000003, 'decrypted': True}}
Values:
2016-11-15T19:29:44.069Z ABzuPdVNxrSEAV8= 6
Packet:
{'test': {'id': 1479238189685, 'deveui': '0000000033035032', 'timestamp': '2016-11-15T19:29:49.685Z', 'dataFrame': 'ABzuPdVNxrSEAV8=', 'fcnt': 83, 'port': 5, 'rssi': 7, 'snr': 9.5, 'sf_used': 10, 'cr_used': '4/5', 'device_redundancy': 0, 'time_on_air_ms': 288.76800000000003, 'decrypted': True}}
Values:
2016-11-15T19:29:49.685Z ABzuPdVNxrSEAV8= 7
Packet:
{'testl': {'id': 1479238196868, 'deveui': '0000000033035032', 'timestamp': '2016-11-15T19:29:56.868Z', 'dataFrame': 'ABzuPdVNxrSEAV8=', 'fcnt': 84, 'port': 5, 'rssi': 3, 'snr': 9.8, 'sf_used': 10, 'cr_used': '4/5', 'device_redundancy': 0, 'time_on_air_ms': 288.76800000000003, 'decrypted': True}}
Values:
2016-11-15T19:29:56.868Z ABzuPdVNxrSEAV8= 3
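
To go from printing the values to the CSV the question asks for, the same loop can feed Python's csv module; a minimal extension of the above (the out.csv file name is an assumption):

import csv
import json

with open('log.txt') as infile, open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['timestamp', 'dataFrame', 'rssi'])  # header row
    for line in infile:
        if line.startswith('packet"{'):
            # Same double decode as above, then write the three fields
            packet = json.loads(json.loads(line[len('packet'):]))
            inner = next(iter(packet.values()))
            writer.writerow([inner['timestamp'], inner['dataFrame'], inner['rssi']])

Note that taking the first value with next(iter(...)) rather than hard-coding packet['test'] is the safer choice here: the last sample packet uses the key 'testl' instead of 'test'.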
– Alex Hall

You can try this approach:

import io

import pandas as pd

content = []
with open('log.txt') as f:
    for c in f:
        if c.startswith('packet"{'):
            # Strip the leading 'packet"' and the trailing '"' plus newline,
            # then undo the backslash escaping
            content.append(c[7:-2].encode().decode('unicode_escape'))

# Each cleaned line is a JSON object like {"test": {...}}; orient='index'
# turns the single top-level key into one row
df = pd.concat(pd.read_json(io.StringIO(line), orient='index') for line in content)

df[['dataFrame', 'rssi', 'timestamp']].to_csv('out.csv', index=False, header=False)

The out.csv file then contains:

ABzuPdVNxrSEAV8=,6,2016-11-15 19:29:37.559
ABzuPdVNxrSEAV8=,6,2016-11-15 19:29:44.069
ABzuPdVNxrSEAV8=,7,2016-11-15 19:29:49.685
ABzuPdVNxrSEAV8=,3,2016-11-15 19:29:56.868
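
Note that the timestamps in the CSV have been turned into pandas datetimes, because read_json converts date-like columns (including ones named timestamp) by default. If the original ISO-8601 strings are wanted instead, convert_dates=False should keep them as-is; a small variation on the pd.concat line above:

df = pd.concat(
    pd.read_json(io.StringIO(line), orient='index', convert_dates=False)
    for line in content
)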
– amin