I have a very large text file to read and process, but as I'm new to Python, I don't know what format the file is in or how to read it. Here is a sample:

[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]

So I need to loop over each item inside the list '[]' (which I think I managed), and then over each field of an item, such as "content", "n", "a" and "t".

I tried to read the file and loop over it like this:

for item in thecontent:
    data = json.load(item)

pprint(data)

I think each 'item' in the loop above is a string, not parsed JSON.

Edit 2: I think I need to use the ujson library, because the example in its documentation has the same format as my sample above. See the ujson documentation page for more details.

>>> import ujson
>>> ujson.dumps([{"key": "value"}, 81, True])
'[{"key":"value"},81,true]'
>>> ujson.loads("""[{"key": "value"}, 81, true]""")
[{u'key': u'value'}, 81, True]

Thanks everyone!

Edit 3: I kept looking for an answer and found that the problem wasn't about how to read a list or tuples, because I had already managed that with the file.

The main problem was how to convert bytes to a string when getting the content from the web, and I solved it in this topic, more specifically in this reply.

The code I wrote to fetch the web content and convert it to JSON is:

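import json
import requests

def get_json_by_url(url):
    r = requests.get(url)
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    return json.loads(r.content.decode('utf-8'))  # bytes -> str -> Python objects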
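The bytes-to-JSON step can be tried without any network call. This is only a minimal sketch: `raw` is a made-up stand-in for `r.content`, and it assumes the payload is UTF-8 encoded JSON.

```python
import json

# Hypothetical payload standing in for r.content (assumed UTF-8 encoded JSON).
raw = b'[{"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1}]'

text = raw.decode("utf-8")   # bytes -> str
items = json.loads(text)     # str -> list of dicts

print(type(raw).__name__, type(text).__name__, type(items).__name__)
print(items[0]["n"])
```

From Python 3.6 onward, `json.loads` also accepts UTF-8 encoded bytes directly, and a `requests` response offers `r.json()`, which may make the explicit decode unnecessary.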

Since this may help anyone looking for the same thing, I've changed the title from 'How to read a list of tuples (or json) in python' to 'How to get content from web and convert from bytes to str/json', which was the actual problem I had.

I'm sorry for not explaining the problem very well. As I'm new to Python, it sometimes takes me a long time to work out what the problem actually is.

Thanks all!

Community
  • In Python, you can get literally any functionality from the right import. For example, you can even `import antigravity`. In your case, it would be more appropriate to `import json`. – Iluvatar Dec 09 '16 at 01:31
  • Hello Julien, thank you very much for asking more about the question. I tried to describe better what I have done so far and what I meant by 'process', which is to get the data into a loop. I'd appreciate your help if possible! – m4ss4cr4t10n Dec 09 '16 at 01:54
  • What do you mean that you don't know the format of the file? Is it in the format in your example above or not? – Son of a Beach Dec 09 '16 at 04:39
  • I just made another change to the post. Many thanks to everyone who took the time to try to help me, and sorry that the explanation of the problem wasn't very good. – m4ss4cr4t10n Dec 10 '16 at 23:52

1 Answer

These two solutions both worked for me, and assume that the file is in the format in your example above. It depends on what you want to do with this data after you've loaded it from the file, though (you didn't specify this).

Firstly, the simple/fast version which ends up with all data in one list (a list of dictionaries):

import json

with open("myFile.txt", "r") as f:
    data = json.load(f)  # load the entire file into one list of dictionaries

# process data here as desired, possibly in a loop
print(data)

Output (Python 3.7 or later):

[{'content': '111111', 'n': 'ITEM 1', 'a': 'ANOTHER', 't': 1}, {'content': '222222', 'n': 'ITEM 2', 'a': 'ANOTHER', 't': 1}, {'content': '333333', 'n': 'ITEM 3', 'a': 'ANOTHER', 't': 1}]
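Once the data is loaded, looping over the items and reading each field is ordinary list and dictionary access. Here is a small sketch that uses the sample from the question as an inline string (so it runs without a file); note that `json.loads` parses a string, while `json.load` reads from an open file object.

```python
import json

# The sample from the question, inlined so the example is self-contained.
sample = """[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]"""

data = json.loads(sample)  # a list of dictionaries

for item in data:
    # each item is a dict, so fields are read by key
    print(item["n"], item["content"], item["a"], item["t"])
```

Calling `json.load` or `json.loads` on each element inside the loop is not needed, because the single parse of the whole document already produced the dictionaries.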

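For a very large file, or if you don't want all the data in a single list, you can parse it line by line. This relies on each object sitting on its own line, as in your sample: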

import json

with open("myFile.txt", "r") as f:
    for line in f:                       # for each line in the file
        line = line.strip(", ][\n")      # strip leading/trailing commas, spaces, square brackets and newlines
        if line:                         # anything left should look like '{"key": value, ...}'
            try:
                data = json.loads(line)  # parse the line into a single dictionary
            except ValueError:           # json.JSONDecodeError is a subclass of ValueError
                print("invalid json:  " + line)
            else:
                # process a single item (dictionary) here in whatever way you like
                print(data)

Output (Python 3.7 or later):

{'content': '111111', 'n': 'ITEM 1', 'a': 'ANOTHER', 't': 1}
{'content': '222222', 'n': 'ITEM 2', 'a': 'ANOTHER', 't': 1}
{'content': '333333', 'n': 'ITEM 3', 'a': 'ANOTHER', 't': 1}
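If the objects are not guaranteed to sit one per line, a chunked reader built on `json.JSONDecoder.raw_decode` is one possible alternative. The following is only a sketch under stated assumptions: the file holds a single top-level array whose elements are all JSON objects, and the names `iter_json_objects` and `chunk_size` are made up for this illustration. It is also lenient about stray commas and a missing closing bracket.

```python
import io
import json

SAMPLE = """[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]"""


def iter_json_objects(fp, chunk_size=65536):
    """Yield the objects of a top-level JSON array one at a time.

    Assumption: every array element is a JSON object ({...}).
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    eof = False
    while not eof:
        chunk = fp.read(chunk_size)
        eof = not chunk
        buf += chunk
        while True:
            # drop whitespace and separators between elements
            buf = buf.lstrip(" \t\r\n,")
            if not buf:
                break  # need more input
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                buf = buf[1:]
                continue
            if buf[0] == "]":
                return  # end of the array
            if buf[0] != "{":
                raise ValueError("this sketch only handles arrays of objects")
            try:
                item, end = decoder.raw_decode(buf)
            except ValueError:
                if eof:
                    raise  # truncated or malformed object
                break  # object is incomplete, read another chunk
            yield item
            buf = buf[end:]


# A tiny chunk size forces objects to span several reads.
for item in iter_json_objects(io.StringIO(SAMPLE), chunk_size=16):
    print(item["n"])
```

With a real file, the same generator could be used as `with open("myFile.txt") as f: for item in iter_json_objects(f): ...`, so only one object plus the current buffer is held in memory at a time.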

The first option should be fine for most cases, even for reasonably large files.

Son of a Beach