python: how to use a dictionary with list values to search a file

Question

Hope this is a straightforward issue:

Here is my dictionary: temp = {'0.1995': ['in1', 'in2'], '0.399': ['in0', 'y']}

Code to search the file:

for line in SPFFile:
    temp_dict = temp
    for val in temp_dict.itervalues():
        if re.search(val.upper(), line) and (re.search("^R", line) or re.search("^C", line)):
            print "value found!"

My problem is that val is a list such as ['in1','in2'] while I need val to be 'in1' then 'in2' and so forth.

Also if I shouldn't be using a dictionary to do this please let me know. The dictionary was formed from two lists.

score 2 · Accepted Answer · answered Jan 15 '13 at 04:55

Change your inner for loop to:

for key, vals in temp_dict.items():
    if re.search('|'.join(vals), line):
        #do stuff here

As for using a dictionary in the first place, it all depends on whether you need your values organized via keys as you have above. If you're just trying to check if any of the values is present in a given line, it might be better to '|'.join() all the values together and use the resulting string as your search expression.
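
A minimal sketch of the keyed variant, for the case where it matters which key a match belongs to. It is written in Python 3 syntax (`print()` as a function, `.items()`), unlike the Python 2 code in the question. The names `patterns`, `sample_lines` and `hits` are made up for illustration, and the sample lines merely stand in for the contents of `SPFFile`. Values are upper-cased as in the question and passed through `re.escape` in case they contain regex metacharacters.

```python
import re

temp = {'0.1995': ['in1', 'in2'], '0.399': ['in0', 'y']}

# One compiled alternation per key, e.g. 'IN1|IN2' for key '0.1995'.
patterns = {
    key: re.compile('|'.join(re.escape(v.upper()) for v in vals))
    for key, vals in temp.items()
}

# Hypothetical lines standing in for the contents of SPFFile.
sample_lines = ["R1 IN1 net5 100", "C2 net7 IN0 1e-15", "X3 IN2 out"]

hits = []
for line in sample_lines:
    if not re.match('[RC]', line):
        continue  # only lines that start with R or C are of interest
    for key, pattern in patterns.items():
        if pattern.search(line):
            hits.append((key, line))

for key, line in hits:
    print("value found for key %s: %s" % (key, line))
```

Note that a one-letter value such as 'y' will match any line containing an upper-case Y, so very short values may need word boundaries (`\b`) around them.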

Joel Cornett


Thanks, I did need to use some of @Nicholas re advice to get the rest working. My if statement looks like this now: `if re.search('(?i)' '|'.join(pin_values),line) and re.match('[RC]',line):` I couldn't figure out how to make it one re.match statement. – Jon A. Jan 15 '13 at 16:57
Are you aware that `re.match` searches from the beginning of the string, while `re.search` can start its search anywhere in the string? If you have a string `"sR blah blah"`, and you do `re.match(r'[RC]', "sR blah blah")`, you will *not* get a match as `re` first sees that the character `s` isn't in the set `{R, C}`, and returns `None`. – Joel Cornett Jan 15 '13 at 20:13
Yes, I am aware of how re.match works, and I use it for [RC] because they only occur at the beginning of the string. – Jon A. Jan 15 '13 at 21:36
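
For readers following the comments above, the difference can be checked with a small sketch (Python 3 syntax). The string "R1 blah blah" is a made-up counterpart to the example in the comment; the variable names are only for illustration.

```python
import re

# re.match only tries the pattern at the very start of the string.
no_match = re.match(r'[RC]', "sR blah blah")
at_start = re.match(r'[RC]', "R1 blah blah")

# re.search scans forward, so it finds the 'R' that follows the 's'.
found = re.search(r'[RC]', "sR blah blah")

# Anchoring re.search with ^ makes it behave like re.match here.
anchored = re.search(r'^[RC]', "sR blah blah")

print(no_match)          # None
print(at_start.group())  # R
print(found.start())     # 1
print(anchored)          # None
```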

score 2 · Answer 2 · edited May 23 '17 at 12:12

While you can certainly trade off readability for performance, try using one regular expression rather than three. So, for example:
```
if re.match('[RC].*(?:%s)' % '|'.join(map(re.escape, val)), line):
    print "value found!"
```
would do what you describe above in one step, assuming that the leading 'R' or 'C' is not itself part of the item of val you're trying to match (the pattern consumes that first character before it looks for the item). If it is, you can use lookahead instead:
```
if re.match('(?=.*(?:%s))[RC]' % '|'.join(map(re.escape, val)), line):
    print "value found!"
```
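A small sketch of when the two patterns differ, in Python 3 syntax. The values in `vals` and the sample `line` are hypothetical, chosen so that one value begins with the leading 'C' of the line. The alternation is wrapped in a non-capturing group so that `.*` applies to every alternative rather than only the first.
```python
import re

vals = ['C1', 'in2']  # hypothetical values; 'C1' includes the leading 'C'
alternation = '|'.join(map(re.escape, vals))

# Consumes the first character, then looks for a value further along.
consuming = re.compile('[RC].*(?:%s)' % alternation)
# Checks for a value anywhere in the line without consuming anything.
lookahead = re.compile('(?=.*(?:%s))[RC]' % alternation)

line = "C1 net3 net4 1e-15"  # hypothetical line; 'C1' sits at the very start

consuming_hit = consuming.match(line) is not None
lookahead_hit = lookahead.match(line) is not None
print(consuming_hit)  # False: 'C' is already consumed, no 'C1' remains
print(lookahead_hit)  # True: the lookahead may overlap the leading 'C'
```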
temp_dict = temp doesn't do anything unless you plan on reassigning to temp somewhere later; it just gives the contents of temp a new name. You might also consider giving your variables more meaningful names than temp and val.
While there is a regular expression cache built into the re module, you should get in the habit of compiling the regular expressions you're going to use repeatedly, as it will give you substantial performance benefits. This is my style, which may be way too verbose for you:
```
RE_BEGINS_WITH_R_OR_C = re.compile('^[RC]')
```
Of course, if you're using a new regular expression every time through the loop, there's no point in doing that. But if, as in your code above, you only care that an item matches, not which item matches, then you could flatten the list of lists using this answer (nested list comprehension syntax is confusing, I'm not going to argue :-), compile a single regular expression, and use that for every line of the file.
Be aware of the difference between re.match and re.search. There's no need to anchor re.search when you can use re.match.
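The points above (flatten, compile once, use `re.match`) could be combined roughly as follows. This is a sketch in Python 3 syntax, not a drop-in replacement: `pin_names`, `RE_RC_LINE_WITH_PIN`, `matching_lines` and the sample lines are names and data invented for illustration, and the values are upper-cased on the assumption that the file uses upper case, as the `.upper()` call in the question suggests.
```python
import re

temp = {'0.1995': ['in1', 'in2'], '0.399': ['in0', 'y']}

# Flatten the list of lists with a nested comprehension.
pin_names = [name.upper() for names in temp.values() for name in names]

# Compile once, outside the loop, and reuse the object for every line.
RE_RC_LINE_WITH_PIN = re.compile(
    '(?=.*(?:%s))[RC]' % '|'.join(map(re.escape, pin_names))
)

def matching_lines(lines):
    """Yield lines that start with R or C and contain any pin name."""
    for line in lines:
        if RE_RC_LINE_WITH_PIN.match(line):
            yield line

# Hypothetical lines standing in for the contents of the file.
sample = ["R1 IN1 net5 100", "C2 net7 net8 1e-15", "X3 IN0 out"]
result = list(matching_lines(sample))
print(result)
```
Because `matching_lines` accepts any iterable of strings, an open file object could be passed in place of `sample`.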

In general, read the documentation! It's not bad! You might start by looking at the bits of Python you're using (strings, lists, dictionaries and regular expressions).
