
I have a text file (>= 60 GB) whose records look like this:

{"index": {"_type": "_doc", "_id": "bLcy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2135,\"id\":816704468,\"access_hash\":\"788468819702098896\",\"first_name\":\"a\",\"last_name\":\"b\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusOffline\",\"was_online\":132}}","phone":"12","@version":"1","typ":"telegram_contacts","access_hash":"123","id":816704468,"@timestamp":"2020-01-26T13:53:29.467Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","type":"redis","flags":2135,"host":"ubuntu","imported_from":"telegram_contacts"}

{"index": {"_type": "_doc", "_id": "Z7cy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2143,\"id\":323586643,\"access_hash\":\"8315858910992970114\",\"first_name\":\"bv\",\"last_name\":\"nj\",\"username\":\"kj\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"123","@version":"1","typ":"telegram_contacts","access_hash":"8315858910992970114","id":323586643,"@timestamp":"2020-01-26T13:53:29.469Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","username":"mbnab","type":"redis","flags":2143,"host":"ubuntu","imported_from":"telegram_contacts"}

I have a few questions regarding this:

  1. Is this a valid JSON file?
  2. Can Python process a file of this size, or should I first convert it to an Access or Excel file?

These are some SO posts I found useful:

But I still need help.

Mdeveloper

1 Answer


You can work through the file line by line and extract the information you need.

with open('largefile.txt','r') as f:
    for line in f:
        # Extract what you need from that line of text here
        print(line)

For example, to parse each line as JSON and read it in as a dictionary:

import json

with open('largefile.txt','r') as f:
    for line in f:
        if line.strip():  # skip blank lines
            data = json.loads(line)  # parse the line into a dictionary
            # "message" is itself a JSON-encoded string, so decode it too
            if 'message' in data:
                data['message'] = json.loads(data['message'])
            # extract the information you need here
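
On the first question: judging by the sample, the file as a whole is not a single JSON document, but each non-blank line is a JSON object on its own, alternating between an "index" line (carrying the _id) and the record itself. A minimal sketch of pairing the two while streaming might look like the following. The helper name iter_records and the sample data are made up for illustration, and it assumes every non-blank line is a complete JSON object, as in the question.

```python
import json


def iter_records(lines):
    """Yield (doc_id, document) pairs from alternating index/document lines."""
    pending_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank separator line between records
        obj = json.loads(line)
        # Assumption: a metadata line contains only the "index" key.
        if set(obj) == {"index"}:
            pending_id = obj["index"].get("_id")
            continue
        # "message" is JSON stored as a string; decode it when possible.
        if isinstance(obj.get("message"), str):
            try:
                obj["message"] = json.loads(obj["message"])
            except json.JSONDecodeError:
                pass  # leave the raw string in place
        yield pending_id, obj
        pending_id = None


# Tiny made-up sample in the same shape as the records in the question.
sample = [
    '{"index": {"_type": "_doc", "_id": "a1"}}',
    '{"message": "{\\"first_name\\": \\"a\\"}", "phone": "12"}',
    '',
    '{"index": {"_type": "_doc", "_id": "b2"}}',
    '{"message": "{\\"first_name\\": \\"bv\\"}", "phone": "123"}',
]

for doc_id, doc in iter_records(sample):
    print(doc_id, doc["message"]["first_name"], doc["phone"])
```

For the real file, pass the open file object instead of the sample list (iterate iter_records(f) inside with open('largefile.txt') as f:). A file object yields one line at a time, so memory use should stay small regardless of the file size.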

I expect there's a lot more work to extract the information you need, but I hope this gets you started. Good luck!

Michael C