Regular expression strategy for newbies

Question

I have a long text file with each line of pseudocode that looks like this:

big house --> ['living room', 'kitchen', 'bathroom']

There are about 700 lines like this that need to be transformed to a python dictionary in the format:

{'big house' : ['living room', 'kitchen', 'bathroom']}

As you can see, for each line I need to add braces at the beginning and end, replace the "-->" with ":", and put quotes around the dictionary key. Any help would be greatly appreciated.

Would the best strategy be a find and replace for "-->", then a separate regex to add braces to the beginning and end, then another regex to tackle the key?

I am new to regex and am looking for a strategy: should I use one expression or break it up? – Nicholas Beaudoin, Apr 16 '18 at 12:40

Rakesh · Answer 1 · 2018-04-16T12:59:31.130

You can get your required output without regex:

Ex:

import ast
s = """big house --> ['living room', 'kitchen', 'bathroom']
big house2 --> ['living room', 'kitchen', 'bathroom']"""
d = {}
for i in s.split("\n"):
    val = i.split("-->")
    d[val[0].strip()] = ast.literal_eval(val[1].strip())
print(d)

Output:

{'big house2': ['living room', 'kitchen', 'bathroom'], 'big house': ['living room', 'kitchen', 'bathroom']}

Split each line at "-->" and use index 0 as the key and index 1 as the value.
Use ast.literal_eval to convert the string representation of the list into a list object.
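
The two steps above can be wrapped in a small helper. This is only a sketch: `parse_line` is a hypothetical name, the sample text is made up, and it assumes each line holds one `-->` separator followed by a valid Python list literal.

```python
import ast

def parse_line(line):
    """Split one 'key --> [list]' line into a (key, value) pair."""
    # str.partition splits on the first '-->' only and always returns three parts.
    key, sep, value = line.partition("-->")
    if not sep:
        raise ValueError(f"no '-->' separator in line: {line!r}")
    return key.strip(), ast.literal_eval(value.strip())

text = """big house --> ['living room', 'kitchen', 'bathroom']
small house --> ['bedroom']
"""

# Skip blank lines so a trailing newline in the input does not raise.
rooms = dict(parse_line(line) for line in text.splitlines() if line.strip())
print(rooms)
```

Raising on a missing separator, rather than skipping the line silently, makes it easier to spot malformed lines in a 700-line file.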

Using Regex:

import re
import ast
s = """big house --> ['living room', 'kitchen', 'bathroom']
big house2 -->  ['living room', 'kitchen', 'bathroom']"""
d = {}
for i in re.findall(r"(.*)\s+-->\s+(.*)", s):
    d[i[0].strip()] = ast.literal_eval(i[1].strip())
print(d)
#{'big house2': ['living room', 'kitchen', 'bathroom'], 'big house': ['living room', 'kitchen', 'bathroom']}

This is very helpful. I had my head stuck in a text editor thinking that the only solution would be to use regex. I will go ahead and run it in Python instead. Much appreciated! – Nicholas Beaudoin, Apr 16 '18 at 12:42
I think the dictionary values should be lists instead of strings. – raul.vila, Apr 16 '18 at 12:44

score 3 · Answer 2 · answered Apr 16 '18 at 12:45

This is one way of achieving what you need:

import ast

with open('myfile.txt') as f:
    result = {}
    for line in f:
        line = line.split('-->')
        cleanLine = [l.strip() for l in line]
        result[cleanLine[0]] = ast.literal_eval(cleanLine[1])

ast.literal_eval turns the string representation of a list into an actual list.
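
As a quick illustration of what `ast.literal_eval` does and does not accept, here is a minimal sketch. The input strings are made up for the example; the point is that only plain literals are evaluated, so a string containing a function call is rejected rather than executed.

```python
import ast

# A string holding a Python list literal becomes a real list.
rooms = ast.literal_eval("['living room', 'kitchen', 'bathroom']")
print(type(rooms).__name__, len(rooms))

# Anything that is not a plain literal is rejected instead of being run.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

This is the main reason to prefer it over `eval` when the text file comes from somewhere you do not fully control.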

score 3 · Answer 3 · answered Apr 16 '18 at 12:46

The regex "text editor" solution you asked for, which should work in most text editors with a regex find mode, is:

Find:    (.*) --> (.*)
Replace: {'$1': $2}
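
For comparison, the same find-and-replace can be sketched in Python with `re.sub`, where the replacement string refers to the groups as `\1` and `\2` rather than `$1` and `$2`. The sample text is made up for illustration, and the result is still text (one dictionary literal per line), not a Python object.

```python
import re

text = """big house --> ['living room', 'kitchen', 'bathroom']
small house --> ['bedroom']"""

# Same pattern as the editor search; '.' does not match newlines,
# so each line is rewritten independently.
converted = re.sub(r"(.*) --> (.*)", r"{'\1': \2}", text)
print(converted)
```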

SmileyChris

This is exactly what I was looking for. Fixed the problem. I was way overthinking it. Thanks Chris! – Nicholas Beaudoin Apr 17 '18 at 23:36

score 1 · Answer 4 · answered Apr 16 '18 at 13:54

You can try a dict comprehension:

import re
import ast
print({re.search(r"(\w.+)?-->\s(\['\w.+?\])", line).group(1).strip():ast.literal_eval(re.search(r"(\w.+)?-->\s(\['\w.+?\])",line).group(2)) for line in open('new_filea','r')})

output:

{'big house': ['living room', 'kitchen', 'bathroom']}

P.S.: you can read this too if you are unsure what happens when you don't close the file.
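
A variant that closes the file explicitly and runs the regex only once per line could look like the sketch below. The file name `rooms.txt` and its contents are placeholders created in a temporary directory just for this example, and the pattern is a simplified assumption rather than the exact one used above.

```python
import ast
import os
import re
import tempfile

# Compile once: group 1 is the key, group 2 is the list literal.
pattern = re.compile(r"(.+?)\s*-->\s*(\[.*\])")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rooms.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("big house --> ['living room', 'kitchen', 'bathroom']\n")

    # The with block closes the file once the comprehension has finished.
    with open(path) as f:
        matches = (pattern.search(line) for line in f)
        result = {
            m.group(1).strip(): ast.literal_eval(m.group(2))
            for m in matches
            if m  # skip lines that do not match the pattern
        }

print(result)
```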
