repeating a section of a regular expression?

Question

I'm having to parse a text dump of a spreadsheet. I have a regular expression that correctly parses each line of the data, but it's rather long. It's basically just matching a certain pattern 12 or 13 times.

The pattern I want to repeat is

\s+(\w*\.*\w*);

This is the regular expression (shortened)

^\s+(\w*\.*\w*);\s+(\w*\.*\w*);\s+(\w*\.*\w*);\s+(\w*\.*\w*);\s+(\w*\.*\w*);\s+(\w*\.*\w*);

Is there a way to match a pattern a set number of times without copy-pasting like this? Each of those sections corresponds to a data column, and I need all of them. I'm using Python, by the way. Thanks!

Time to change the accepted answer. – Noumenon Feb 12 '18 at 03:17

score 67 · Answer 1 · answered Jan 12 '12 at 22:41

(\s+(\w*\.*\w*);){12}

{n} is a quantifier: it repeats the preceding group exactly n times.

If you want 12 to 13 times:

(\s+(\w*\.*\w*);){12,13}

If you want 12 or more times:

(\s+(\w*\.*\w*);){12,}
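
One caveat worth checking when using this from Python, since the question needs every column: when a capture group sits inside a quantified group, the re module keeps only the text from the last repetition. The quantified form therefore validates the shape of the line but does not return each column separately. A minimal sketch follows, assuming a made-up sample line with 6 columns instead of 12 (the names line, unit and n are illustrative, not from the question). It compares the quantified form with a pattern built by repeating the pattern string, which avoids the copy-pasting while keeping one capture group per column.

```python
import re

# Hypothetical sample line with 6 columns (the real data has 12 or 13).
line = "  a1;  b.2;  c3;  d;  e.5;  f6;"

unit = r"\s+(\w*\.*\w*);"  # the section from the question
n = 6                      # assumed column count for this sample

# Quantified group: the line matches, but the inner capture group
# (group 2) only holds the text from its last repetition.
quantified = re.match("^(" + unit + "){" + str(n) + "}", line)
last_only = quantified.group(2)

# Repeating the pattern string n times yields n separate capture groups.
expanded = re.match("^" + unit * n, line)
columns = expanded.groups()

print(last_only)  # f6
print(columns)    # ('a1', 'b.2', 'c3', 'd', 'e.5', 'f6')
```

For the real data, n would be 12 or 13.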

answered Jan 12 '12 at 22:41

joe_coolish

... so `(...){12}` repeats the enclosed pattern `...` 12 times; just added this to save you the comparison of the (rather long) pattern with the original question. – Marius Hofert Jun 10 '19 at 18:21

how about infinity? – Ardhi Feb 19 '21 at 06:41

score 6 · Accepted Answer · edited Dec 02 '20 at 08:45

How about using:

[x.group(1) for x in re.finditer(r'\s+(\w*\.*\w*);', text)]

Have you looked at re.findall yet? Or consider splitting at the semicolons:

[x.strip() for x in s.split(";")]

is probably what you really want.
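
Both suggestions can be sketched side by side. This is only an illustration on a made-up sample line (line, fields and found are hypothetical names), and it assumes each line ends with a semicolon, as the pattern in the question implies. In that case split leaves an empty final field, which is dropped here.

```python
import re

# Hypothetical sample line; real lines have 12 or 13 columns.
line = "  a1;  b.2;  c3;"

# Split at semicolons and trim the whitespace around each field.
fields = [part.strip() for part in line.split(";")]
# The trailing ';' produces an empty last element; drop it.
if fields and fields[-1] == "":
    fields.pop()

# With a single capture group, findall returns that group's text
# for every non-overlapping match.
found = re.findall(r"\s+(\w*\.*\w*);", line)

print(fields)  # ['a1', 'b.2', 'c3']
print(found)   # ['a1', 'b.2', 'c3']
```

The split version does no validation, so it accepts fields the regular expression would reject. Whether that matters depends on how clean the dump is.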

edited Dec 02 '20 at 08:45

Wiktor Stribiżew

answered Jan 12 '12 at 22:41

Has QUIT--Anony-Mousse


Ah, that's a great idea. Splitting at semicolon is much simpler. All I would need to do is remove the whitespace. Thanks! – Joe Lyga Jan 12 '12 at 23:24
