0

I have a huge text file that I need to parse.

Each line of the file contains some text and a dict. I only care about the dict data.

The file contains logs in the following format:

my data : {"a":1, "b":2, "c": 3}
my data : {"a":23, "b": 44, "c": 565}
my_data : {"a":1233, "b": 21, "c":544}

So, from the data above, I only want the dicts.

I tried:

f = open(‘text.file’,'r’)
my_dict = eval(f.read())

but it gives me an error, because the initial part of each line is a plain string. So my question is: what is the best way to extract the dicts from the file?

kkard
  • Your `f = open(‘text.file’,'r’)` line is using non-ASCII "smart quotes". See how the left quote mark curves differently from the right quote mark? Python doesn't support these. Just use regular apostrophes. – Kevin Aug 21 '15 at 18:34
  • if `my_data` is written as a string (that is, it's written as `"my_data"`), then it seems your file is actually in a JSON format, which can make your life a whole lot easier: simply use the json module to decode the file, it will parse the dictionaries for you. – SivanBH Aug 21 '15 at 18:41
  • Is your OS Windows? – dsgdfg Aug 21 '15 at 18:42

3 Answers

1

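It looks like you've got a delimiter (the colon) between the label and the dict on each line, so str.split() is your friend there.

Afterwards, consider using ast.literal_eval from the ast module instead of eval. It only evaluates Python literals (dicts, lists, strings, numbers and so on), so it presents far less of a security risk than blindly eval'ing arbitrary text.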

>>> import ast
>>> a = ast.literal_eval("{'a':1}")
>>> type(a)
<class 'dict'>
>>> a
{'a': 1}
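
Putting the two steps together, a minimal sketch could look like the following. The helper name parse_line and its sep parameter are made up for illustration, and it assumes every line has the "label : {dict}" shape shown in the question.

```python
import ast

def parse_line(line, sep=":"):
    """Return the dict literal that follows the first separator in a log line."""
    # maxsplit=1 splits on the first separator only, so any later
    # colons inside the dict stay part of the literal
    _, literal = line.split(sep, 1)
    return ast.literal_eval(literal.strip())

sample = 'my data : {"a":1, "b":2, "c": 3}'
print(parse_line(sample))
```

A line with no separator at all would raise ValueError on the unpacking, so real log files may need a guard around the split.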
ardent
1

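eval is bad.

Here's what I would do:

import json

dicts = []
with open('text.file', 'r') as f:
    for line in f.readlines():
        # readlines() keeps the trailing newline, so strip before testing for blanks
        if not line.strip():
            continue
        # split on the first colon only; later colons belong to the dict
        _, dict_str = line.split(':', 1)
        # json.loads parses a string (json.load expects a file object)
        record = json.loads(dict_str.strip())
        dicts.append(record)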
scytale
  • Thank you for the help, but line = f.readlines() will give me a list, so how do I use split with it? – kkard Aug 21 '15 at 21:29
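
On the question in the comment: readlines() does return a list, but the for loop hands out one element at a time, so inside the loop each line is an ordinary str and split works on it. A small sketch of that, using io.StringIO as a stand-in for the real file (an assumption so the snippet runs without text.file on disk):

```python
import io
import json

# io.StringIO stands in for open('text.file') so the sketch runs anywhere
fake_file = io.StringIO(
    'my data : {"a":1, "b":2, "c": 3}\n'
    '\n'
    'my data : {"a":23, "b": 44, "c": 565}\n'
)

lines = fake_file.readlines()      # a list of str, one entry per line
records = []
for line in lines:                 # `line` is a single str here, not a list
    if not line.strip():           # skip blank lines
        continue
    _, dict_str = line.split(":", 1)
    records.append(json.loads(dict_str))

print(records)
```

For a very large file it may be preferable to loop over the file object directly (for line in f), which reads lazily instead of loading every line into memory first.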
1

You can use the re module:

import ast
import re

text = """my data : {"a":1, "b":2, "c": 3}
          my data : {"a":23, "b": 44, "c": 565}
          my_data : {"a":1233, "b": 21, "c":544}"""
# match a flat {...} span; nested braces would need a real parser
pattern = re.compile(r"\{[^}]*\}")
for match in pattern.finditer(text):
    # literal_eval only accepts Python literals, unlike eval
    my_dict = ast.literal_eval(match.group())
    print(my_dict)

which gives you (on Python 3.7+, where dicts keep insertion order):

{'a': 1, 'b': 2, 'c': 3}
{'a': 23, 'b': 44, 'c': 565}
{'a': 1233, 'b': 21, 'c': 544}
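
Since the question mentions a huge file, the same regex idea can be applied one line at a time rather than to the whole text at once. The following is only a sketch under some assumptions: the names DICT_RE and iter_dicts are invented here, the dicts are assumed to be flat (no nested braces) and valid JSON, and io.StringIO again stands in for the real file.

```python
import io
import json
import re

DICT_RE = re.compile(r"\{[^{}]*\}")  # flat (non-nested) braces only

def iter_dicts(lines):
    """Yield one dict per matching line, skipping lines that do not parse."""
    for line in lines:
        match = DICT_RE.search(line)
        if match is None:
            continue  # no {...} span on this line
        try:
            yield json.loads(match.group())
        except json.JSONDecodeError:
            continue  # brace-delimited text that is not valid JSON

sample = io.StringIO(
    'my data : {"a":1, "b":2, "c": 3}\n'
    'no dict on this line\n'
    'my_data : {"a":1233, "b": 21, "c":544}\n'
)
result = list(iter_dicts(sample))
print(result)
```

Because iter_dicts is a generator, passing it an open file object processes the log lazily, so memory use does not grow with the file size.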
Joseph Stover