python regex: match pattern in text file and assign name to each match instance

Question

I have a text file that contains 4 sets of numbers demarcated by square brackets: [-114.63332, -114.63349, -114.63423, …,-114.63305][-103.55583, -104.00265, -104.64165, -105.14679, …, -106.63325, -106.61103][-109.04984, -109.06017, -109.06015, …, -109.0498][-114.04392, -114.04391, -114.04375, -114.04195, …, -114.04558]

I need to extract the sets and assign a name to each set: a_lon, b_lon, c_lon, d_lon. I have read in the text file and created a regex pattern to match:

with open('x_lons.txt', 'r') as f:
  x_lons = f.read()
print(type(x_lons))

which prints <class 'str'>

import re

match = re.compile(r'(\[.*?\])')
for m in re.finditer(match, x_lons):
  print(m.groups())

which returns match objects that print:

('[-114.63332, -114.63349, -114.63423, …,-114.63305]',)
('[-103.55583, -104.00265, -104.64165, -105.14679, …, -106.63325, -106.61103]',)
('[-109.04984, -109.06017, -109.06015, …, -109.0498]',)
('[-114.04392, -114.04391, -114.04375, -114.04195, …, -114.04558]',)

I have also run a re.split to get similar output without the "()" brackets.

At this point I am unable to determine how to assign names to each number set matched by the pattern. I can see the sets in the print() output, but I cannot work out how to get the sets assigned to names.

Your indentation is off. Also, what is wrong with your result? — Jongware, Jan 22 '20 at 18:08
Where does that data come from? I believe this is a duplicate of https://stackoverflow.com/q/1894269/11301900, by the way. — AMC, Jan 22 '20 at 23:17
Does this answer your question? [Convert string representation of list to list](https://stackoverflow.com/questions/1894269/convert-string-representation-of-list-to-list) — AMC, Jan 23 '20 at 00:57

score 2 · Accepted Answer · answered Jan 22 '20 at 18:20

First of all, your capturing group was needlessly big. Now it captures only the list of numbers (without the brackets), which are then parsed as floats and appended to a list. Also, if you work with data like this, you probably don't want to create the variables themselves (if you had names from 'a_lon' to something like 'zzzzz_lon', you would have a bad day).

import re

with open('tmp.txt', 'r') as f:
    x_lons = f.read()

match = re.compile(r'\[(.*?)\]')

lons = {
    'a_lon': list(),
    'b_lon': list(),
    'c_lon': list(),
    'd_lon': list()
}

current_set_letter = 97  # 97 is the character 'a'

for m in re.finditer(match, x_lons):
    one_set = m.groups()[0]  # as we know that there is only one group here
    for num in one_set.split(r','):
        lons[f'{chr(current_set_letter)}_lon'].append(float(num))
    current_set_letter += 1

print(lons)

Another thing: if your data always contains exactly 4 square-bracketed lists of numbers, you can use a single regex that matches all 4 lists at once. You could also make the regex more specific, so that the program does not fail if your data is corrupt.
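A minimal sketch of what such a stricter regex could look like. The names NUMBER, NUMBER_LIST, FOUR_LISTS and parse_four_lists are made up for illustration, the sample string is a shortened stand-in built from values in the question, and the pattern assumes plain decimal numbers (no exponents):

```python
import re

# Hypothetical building blocks, not part of the original answer.
NUMBER = r'-?\d+(?:\.\d+)?'  # plain decimal such as -114.63332 (assumption: no exponents)
NUMBER_LIST = rf'\[\s*({NUMBER}(?:\s*,\s*{NUMBER})*)\s*\]'  # one bracketed, comma-separated list
FOUR_LISTS = re.compile(r'\s*' + r'\s*'.join([NUMBER_LIST] * 4) + r'\s*')


def parse_four_lists(text):
    """Return four lists of floats, or None if text is not exactly four bracketed lists."""
    m = FOUR_LISTS.fullmatch(text)
    if m is None:
        return None  # corrupt data is reported instead of raising inside float()
    return [[float(n) for n in group.split(',')] for group in m.groups()]


# Shortened stand-in for the file contents.
sample = '[-114.63332, -114.63349][-103.55583, -104.00265][-109.04984][-114.04392, -114.04391]'
parsed = parse_four_lists(sample)
if parsed is not None:
    a_lon, b_lon, c_lon, d_lon = parsed
```

Because fullmatch has to consume the whole string, a stray character or a missing bracket makes the function return None rather than yield a partial result, so the caller decides how to handle bad input.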

This worked, thank you. As the items in x_lons are strings, I changed to .append(str) and accessed the dict by your keys to get individually named lists for processing. As the first and last elements in each list had either '[' or ']' respectively, I replaced these elements with the correct strings and then converted the elements in each list to floats. Thank you for your help. — GLarose, Jan 22 '20 at 22:04

score 0 · Answer 2 · answered Jan 22 '20 at 18:06

a_lon, b_lon, c_lon, d_lon = eval(f.read().replace("]", "],"))

This works?

answered by nagataaaas

Using eval for this simple task may look straightforward, but it is actually a bad idea because it introduces a major vulnerability. If the OP uses the code in real production, where the data comes either from users or from the network, this backdoor can be used to mess up the script. However, if you just want to use it yourself and the data is guaranteed to be clean, the simple yet dirty way may work too. – Janekx Jan 23 '20 at 12:40
OP said "I have a text file that contains 4 sets of numbers demarcated by square brackets", so that isn't a problem unless OP says "It's user input". – nagataaaas Jan 24 '20 at 18:34
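For what it's worth, the same one-liner idea can be kept without eval by using ast.literal_eval from the standard library, which only accepts Python literals. This is a sketch, not the answerer's code: the function name parse_bracketed_lists is made up, and the sample string is a shortened stand-in for the file contents.

```python
import ast


def parse_bracketed_lists(text):
    """Parse '[...][...]' text into a tuple of lists, accepting literals only."""
    # Turn "[...][...]" into "([...],[...],)" so the whole text is one tuple literal.
    # The surrounding parentheses let the lists span several lines.
    return ast.literal_eval('(' + text.replace(']', '],') + ')')


# Shortened stand-in for the file contents.
sample = '[-114.63332, -114.63349][-103.55583][-109.04984, -109.06017][-114.04392]'
a_lon, b_lon, c_lon, d_lon = parse_bracketed_lists(sample)
```

Anything that is not a plain literal (a function call, for example) makes literal_eval raise an exception instead of executing it, which addresses the concern about untrusted input while keeping the code short.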
