I'm having a bit of trouble searching a file using Python regular expressions.

I would like to pass in a list of regexes and get back the lines of the file that match any of them, in a jagged list that is indexed the same way as the regex list. That is, if a line matches the first regex it is added with `results[0].append(line)`, if it matches the second regex it is added with `results[1].append(line)`, and so on.

import re

def search(path, regex_list):
    # Compile each pattern once, up front.
    reg_list = [re.compile(regex) for regex in regex_list]
    results = reg_list.__len__()*[[]]
    with open(path, 'r') as fp:
        for line in fp:
            for i, reg in enumerate(reg_list):
                if reg.search(line):
                    results[i].append(line)
    return results

print(search("./log", ['1234', '1233']))

I would like my output to be:

[['log entry 1234\n'], ['log entry 1233\n']]

but what I really get is:

[['log entry 1234\n', 'log entry 1233\n'], ['log entry 1234\n', 'log entry 1233\n']]

I'm pretty new to Python, so I could be doing something really stupid. Any ideas what it is?

jayjay
  • Use: [`results = [[] for _ in xrange(len(reg_list))]`](http://stackoverflow.com/questions/13058458/python-list-index) – Ashwini Chaudhary Nov 04 '13 at 10:33
  • I suggest calling the strings that describe a regexp _patterns_, the compiled versions _regexps_, and the results of applying `match` _matches_, to avoid confusion like that between your _regex_ and _reg_. But that's not the topic here. – Alfe Nov 04 '13 at 10:49

1 Answer

Multiplying a list that contains one empty list (`results = reg_list.__len__()*[[]]`) does not create several empty lists. It creates several references that all point to the same empty list.

When some code later appends to that list through any one index, every index shows the extended list, because they all refer to the same object.
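
The aliasing can be checked directly by comparing object identities. This is only a small sketch; the name `shared` is made up for illustration and does not appear in the question's code:

```python
# Multiplying a list copies references to its elements, not the elements.
shared = 3 * [[]]

# All three slots hold the very same list object.
print(shared[0] is shared[1] is shared[2])  # True

# Appending through one index is therefore visible through every index.
shared[0].append('log entry 1234\n')
print(shared)
```

The second `print` shows the appended line in all three slots, which is the same effect seen in the question's output.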

Instead, create a list of distinct (non-identical) empty lists at initialization:

[[] for reg in reg_list]
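
Put together, the function from the question might look like the sketch below. The names `patterns` and `regexes` and the temporary log file are assumptions made so the example runs on its own; the real `./log` file from the question is not reproduced here.

```python
import os
import re
import tempfile


def search(path, patterns):
    """Return one list of matching lines per pattern, in pattern order."""
    regexes = [re.compile(pattern) for pattern in patterns]
    # One distinct empty list per regex, so appends do not leak across slots.
    results = [[] for _ in regexes]
    with open(path, 'r') as fp:
        for line in fp:
            for i, regex in enumerate(regexes):
                if regex.search(line):
                    results[i].append(line)
    return results


# Hypothetical two-line log file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write('log entry 1234\nlog entry 1233\n')

output = search(tmp.name, ['1234', '1233'])
os.remove(tmp.name)  # clean up the temporary file
print(output)  # [['log entry 1234\n'], ['log entry 1233\n']]
```

A line that matches more than one pattern is still added to each matching slot, which follows the behavior described in the question.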
Alfe