0

I have a file named "input.txt" in which a lot of dictionaries are stored, and I have to iterate through them. How do I read the file? Whenever I read it using open() and file.read(), the whole text is converted into a single string. How can I read this file as a collection of dictionaries?

Contents of input.txt:

{"label":18,"words":["realclearpolitics","-","election","2016","-","2016","republican","presidential","nomination","polls","year","state"]}
alexis
arihant34

5 Answers

4

The closing list bracket was missing in the string. Once that is fixed, you can parse the string with Python's built-in json module:

import json

x = '{"label":18,"words":["realclearpolitics","-","election","2016","-","2016","republican","presidential","nomination","polls","year","state"]}'
j = json.loads(x)
print(j)
DomTomCat
2

If the contents of a line are a well-formed dict, you can use eval to evaluate the string in Python. Note that eval runs whatever code the string contains, so only use it on input you trust.

line = '{"label":18,"words":["realclearpolitics","-","election","2016","-","2016","republican","presidential","nomination","polls","year","state"]}'
dictionary = eval(line)
print(dictionary)

So if you have only that one line in input, you can use

dictionary = eval(open("input.txt").read())

or if you have one dictionary per line

with open('input.txt', 'r') as f:
    for line in f:
        dictionary = eval(line)
Isa
1

Use the json module's loads function to convert each line of the file from a str to a dict:

import json

with open('input.txt','r') as f:
    for line in f:
        line_as_dict = json.loads(line)
        # process the dict here
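
A slightly more defensive variant of the loop above, as a sketch only: it assumes one complete JSON object per line and skips blank lines, which json.loads would otherwise reject. The helper name iter_dicts and the in-memory sample data are made up for illustration.

```python
import io
import json

def iter_dicts(lines):
    """Yield one dict per non-blank line; each line must be a complete JSON object."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. an empty line at the end of the file
            continue
        yield json.loads(line)

# In-memory stand-in for open('input.txt'); any iterable of lines works.
sample = io.StringIO(
    '{"label": 18, "words": ["polls", "year"]}\n'
    '\n'
    '{"label": 3, "words": ["state"]}\n'
)
dicts = list(iter_dicts(sample))
print(len(dicts), dicts[0]["label"])
```

Because iter_dicts is a generator, a large file can be processed one line at a time by passing the open file object instead of the sample.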
Gábor Fekete
  • Suppose my dictionary is {"words" : ['q', 'a', 'asa'], "values":[1,45,3]}. How do I print (word,value)? – arihant34 Jun 17 '16 at 11:41
  • @arihant34, first you read it in as a dictionary, then you read up on how to use dictionaries (which is an entirely different question from what you've asked here.) – alexis Jun 17 '16 at 11:53
  • This solution will only work if each line is a complete json object (or if the file has no newlines at all). Why hobble your code this way? – alexis Jun 17 '16 at 11:54
  • Actually I am dealing with big data; the file here is obtained by performing tf-idf on text. The file consists of many lines, where each line is a dictionary representing a file. This dictionary consists of words and values. I wanted to map each word to its value. – arihant34 Jun 17 '16 at 12:00
  • In that case, @Gabor's solution is exactly right, and in fact necessary. (A json file can only contain one top-level object.) But to deal with big data, you should be using a saner / more compact format to dump your data. – alexis Jun 17 '16 at 12:06
  • Spark supports json and parquet. – arihant34 Jun 17 '16 at 12:17
  • If you have a lot of the same data and they can have the same values you could consider using the `__slots__` class member as this will decrease the memory footprint of your big data. However there could be a better solution using another module that is dealing with big data, like the [pandas](http://pandas.pydata.org/) module. Don't reinvent the wheel. – Gábor Fekete Jun 17 '16 at 13:20
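
For the (word, value) question raised in the comments, one possible sketch pairs the two lists with zip. It assumes the "words" and "values" lists have the same length and line up by position, which the comments suggest but do not state outright. The sample line uses double quotes throughout so that it is valid JSON.

```python
import json

line = '{"words": ["q", "a", "asa"], "values": [1, 45, 3]}'
record = json.loads(line)

# zip pairs the i-th word with the i-th value
pairs = list(zip(record["words"], record["values"]))
for word, value in pairs:
    print(word, value)

# A dict gives direct lookup by word (a repeated word keeps its last value)
word_to_value = dict(pairs)
```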
1

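You can try the following:

import ast

def readfile(path_to_file):
    # Close the file automatically once it has been read
    with open(path_to_file, 'r') as f:
        content = f.read()
    # literal_eval already returns a dict, so no json.loads call is needed
    data = ast.literal_eval(content)
    print(data)
    return data

ast.literal_eval only evaluates Python literals (strings, numbers, lists, dicts and so on) and raises an exception for anything else, so no arbitrary code gets executed. The content read from the file is therefore validated as well. Note that JSON-only tokens such as true, false and null are not Python literals, so this approach works only when the file avoids them.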

Output:

{'label': 18,
 'words': ['realclearpolitics',
  '-',
  'election',
  '2016',
  '-',
  '2016',
  'republican',
  'presidential',
  'nomination',
  'polls',
  'year',
  'state']}
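
To illustrate the validation point, here is a small sketch of how ast.literal_eval accepts a literal dict and rejects anything else. The helper name parse_literal_dict is hypothetical, and returning None on failure is just one way to handle the error.

```python
import ast

def parse_literal_dict(text):
    """Return the dict described by text, or None if text is not a plain literal dict."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None

good = parse_literal_dict('{"label": 18, "words": ["polls", "year"]}')
bad = parse_literal_dict('__import__("os").getcwd()')  # a function call, not a literal
json_only = parse_literal_dict('{"ok": true}')  # true is JSON, not a Python literal
print(good, bad, json_only)
```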
Rajesh Yogeshwar
0

Your JSON is incorrect: the array is never closed. The corrected JSON is below:

{
    "label": 18,
    "words": ["realclearpolitics", "-", "election", "2016", "-", "2016", "republican", "presidential", "nomination", "polls", "year", "state"]
}

You can use the load function of the built-in json module to read a JSON file:

import json
with open(r'path\of\your\file') as data_file:
    jsonData = json.load(data_file)
    print(jsonData)  # prints the whole JSON data
    print(jsonData['words'])  # prints the value of the key `words`
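
One caveat, shown as a sketch with in-memory stand-ins for the file: json.load expects the whole input to be a single JSON value. If the file holds one dictionary per line, as described in the comments on another answer, json.load raises json.JSONDecodeError and the file has to be parsed line by line instead.

```python
import io
import json

# A file with exactly one top-level object: json.load works.
one_object = io.StringIO('{"label": 18, "words": ["polls"]}')
single = json.load(one_object)
print(single["words"])

# A file with one object per line: json.load rejects the second object.
two_objects = io.StringIO('{"label": 18}\n{"label": 3}\n')
try:
    json.load(two_objects)
    failed = False
except json.JSONDecodeError:
    failed = True  # extra data follows the first top-level object
print(failed)
```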