2

I have a long text file with each line of pseudocode that looks like this:

big house --> ['living room', 'kitchen', 'bathroom']

There are about 700 lines like this that need to be transformed to a python dictionary in the format:

{'big house' : ['living room', 'kitchen', 'bathroom']}

As you can see, for each line, I need to make brackets at the beginning and end, replace the "-->" with ":" and place quotes around the dictionary key. Any help would be greatly appreciated.

Would the best strategy be a find and replace for "-->" and then add brackets to beginning and end in a separate regex, then tackle the key in another regex?

4 Answers4

4

You can get your required output without regex:

Ex:

import ast
s = """big house --> ['living room', 'kitchen', 'bathroom']
big house2 --> ['living room', 'kitchen', 'bathroom']"""
d = {}
for i in s.split("\n"):
    val = i.split("-->")
    d[val[0].strip()] = ast.literal_eval(val[1].strip())
print(d)

Output:

{'big house2': ['living room', 'kitchen', 'bathroom'], 'big house': ['living room', 'kitchen', 'bathroom']}
  • Split your text at "-->" and use index as key & index 1 as value.
  • Use ast.literal_eval to convert the string list to list object.

Using Regex:

import re
import ast
s = """big house --> ['living room', 'kitchen', 'bathroom']
big house2 -->  ['living room', 'kitchen', 'bathroom']"""
d = {}
for i in re.findall("(.*)\s+\-->\s+(.*)", s):
    d[i[0].strip()] = ast.literal_eval(i[1].strip())
print(d)
#{'big house2': ['living room', 'kitchen', 'bathroom'], 'big house': ['living room', 'kitchen', 'bathroom']}
Rakesh
  • 81,458
  • 17
  • 76
  • 113
  • This is very helpful. I had my head stuck in a text editor thinking that the only solution would be to use regex. I will go ahead and run in Python instead. Much appreciated! – Nicholas Beaudoin Apr 16 '18 at 12:42
  • 1
    I thik the dictionary values should be arrays instead of string. – raul.vila Apr 16 '18 at 12:44
3

This is one way of achieving what you need:

import ast

with open('myfile.txt') as f:
    result = {}
    for line in f:
        line = line.split('-->')
        cleanLine = [l.strip() for l in line]
        result[cleanLine[0]] = ast.literal_eval(cleanLine[1])

ast.literal_eval will turn list string into actual list.

zipa
  • 27,316
  • 6
  • 40
  • 58
3

The regex "text editor" solution you asked for that would work in most text editors with a regex find mode would be:

Find:    (.*) --> (.*)
Replace: {'$1': $2}
SmileyChris
  • 10,578
  • 4
  • 40
  • 33
1

You can try dict comprehension :

import re
import ast
print({re.search(r"(\w.+)?-->\s(\['\w.+?\])", line).group(1).strip():ast.literal_eval(re.search(r"(\w.+)?-->\s(\['\w.+?\])",line).group(2)) for line in open('new_filea','r')})

output:

{'big house': ['living room', 'kitchen', 'bathroom']}

P.S: you can read this too if you have doubt what happend if you don't close the file.

Aaditya Ura
  • 12,007
  • 7
  • 50
  • 88