How to write a regular expression to find multiple strings in Python

Question

For example, I have a string like

"look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "

"look[+3]" means that the sentence talks about a certain aspect of an item, and [+3] means it is a positive review with a score of 3. (This is actually from the Amazon review dataset.)

I would like to split it into:

X: "it 's very sleek looking with a very good front panel button layout , and it has a great feature set ."

Y: [("look", 3), ("panel button layout", 3), ("feature", 2)]

Um, wouldn't it be possible to scrape the reviews more properly so you won't need a regex to extract information? — Kijewski, Mar 14 '16 at 17:47

alecxe · Accepted Answer · score 3 · 2016-03-14T18:10:13.500

One option would be to capture everything from the beginning of the string or a comma up to the [, and then extract the number inside the brackets:

>>> import re
>>> s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('look', '3'), ('panel button layout', '3'), ('feature', '2')]
>>>
>>> s = "darn diopter adjustment dial[-1]"
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('darn diopter adjustment dial', '-1')]

where:

(?:^|,) is a non-capturing group that would either match the beginning of a string or a comma
(.*?) is a non-greedy match for any character any number of times
\[\+?(\-?\d+)\] would match an opening [ followed by an optional + followed by a capturing group that would capture one or more digits (with an optional - at the beginning), followed by a closing ]
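findall returns the scores as strings ('3', '-1'), while the desired Y in the question uses integers. A minimal sketch, assuming the same pattern as above, that converts them; the helper name extract_labels is hypothetical:

```python
import re

# Same pattern as above: start of string or a comma, a lazily matched aspect
# name, then "[", an optional "+", and a possibly negative integer, then "]".
PATTERN = re.compile(r"(?:^|,)(.*?)\[\+?(-?\d+)\]")


def extract_labels(s):
    """Return [(aspect, score), ...] with integer scores (hypothetical helper)."""
    return [(aspect, int(score)) for aspect, score in PATTERN.findall(s)]


s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great feature set . ")
print(extract_labels(s))
# [('look', 3), ('panel button layout', 3), ('feature', 2)]
print(extract_labels("darn diopter adjustment dial[-1]"))
# [('darn diopter adjustment dial', -1)]
```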

edited Mar 14 '16 at 18:10

answered Mar 14 '16 at 17:48

alecxe


I think `re.finditer()` would work better to answer the other part of the question (`X: "It's …"`), but `re.sub()` would probably work just as well. – Kijewski Mar 14 '16 at 17:52
For `darn diopter adjustment dial[-1]##the reason i rated it a four is because of that darn diopter adjustment dial. its very small and hard to turn so you can 't get an accurate adjustment ( for those of you who do n't know what a diopter adjustment is , it is to adjust the focus of the viewfinder to your eyesight . ) ` I get `[]`, so it doesn't work in this case. – Vicky Mar 14 '16 at 17:59
Where did that `##` come from? – Chuck Mar 14 '16 at 18:01
You can ignore that. – Vicky Mar 14 '16 at 18:07
@Vicky please check the update now. It handles both + and - before the digits. – alecxe Mar 14 '16 at 18:13
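Both approaches Kijewski mentions for getting X can be sketched with the accepted answer's pattern. This is a minimal sketch; the helper names are hypothetical, and the re.sub variant assumes the review text itself never contains a comma followed later by a bracketed score such as [+1]:

```python
import re

# The accepted answer's pattern for one "aspect[+n]" or "aspect[-n]" label.
LABEL = re.compile(r"(?:^|,)(.*?)\[\+?(-?\d+)\]")


def review_text_finditer(s):
    # The review text is whatever follows the end of the last label match.
    matches = list(LABEL.finditer(s))
    rest = s[matches[-1].end():] if matches else s
    return rest.strip()


def review_text_sub(s):
    # Alternatively, delete every label match and trim surrounding whitespace.
    return LABEL.sub("", s).strip()


s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great feature set . ")
print(review_text_finditer(s))
print(review_text_sub(s) == review_text_finditer(s))  # True
```

finditer keeps the match positions, so it also works when you need both X and Y from a single pass; re.sub is shorter when you only want X.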

score 0 · Answer 2 · answered Mar 14 '16 at 17:52


You can use re.findall(r'(.*\[\+\d+\],?)', s) to pull out the labelled prefix from which your desired Y output can be built.
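Because .* is greedy, this pattern runs to the last "[+n]" and returns the whole label prefix as a single string rather than separate pairs. A minimal sketch of one way to finish the job; the second pattern is an assumption, and like the original it only handles "+" scores, not negative ones such as [-1]:

```python
import re

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great feature set . ")

# The greedy ".*" backtracks only as far as the last "[+n]", so there is one match.
prefix = re.findall(r'(.*\[\+\d+\],?)', s)
print(prefix)  # ['look[+3],panel button layout[+3],feature[+2]']

# Split the prefix into name/score pairs: a run of characters other than
# "," or "[", followed by "[+n]".
pairs = re.findall(r'([^,\[]+)\[\+(\d+)\]', prefix[0])
y = [(name, int(score)) for name, score in pairs]
print(y)  # [('look', 3), ('panel button layout', 3), ('feature', 2)]
```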

answered Mar 14 '16 at 17:52

Chuck


score -1 · Answer 3 · edited May 23 '17 at 10:28


Try this regular expression:

([^\]]+[[^\]])+(.*)

Your key/value pairs are in group 1 and the summary in group 2.

EDIT: while re doesn't support multiple captures per group (only the last capture is available), the third-party regex module (pip install regex) does:

>>> import regex
>>> m = regex.search(r"([^\]]+[[^\]])+(.*)", "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
>>> m.group(1)
',feature[+2]'
>>> m.captures(1)
['look[+3]', ',panel button layout[+3]', ',feature[+2]']
>>> m.group(2)
"it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
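To get from m.captures(1) to the [("look", 3), ...] form asked for in the question, each captured piece still has to be split into a name and a score. A minimal sketch, assuming the captures list printed above; it is copied verbatim so the example runs with the standard library alone, and the helper name captures_to_pairs is hypothetical:

```python
import re

# The list printed by m.captures(1) above, copied so no third-party module is needed.
captures = ['look[+3]', ',panel button layout[+3]', ',feature[+2]']


def captures_to_pairs(items):
    pairs = []
    for item in items:
        # Each capture is an optional leading comma, a name, and "[+n]" or "[-n]".
        m = re.fullmatch(r",?(.*)\[([+-]?\d+)\]", item)
        # int() accepts a leading "+" or "-", so "+3" becomes 3.
        pairs.append((m.group(1), int(m.group(2))))
    return pairs


print(captures_to_pairs(captures))
# [('look', 3), ('panel button layout', 3), ('feature', 2)]
```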

edited May 23 '17 at 10:28

Community


answered Mar 14 '16 at 17:51

Scott Weaver


Did you look at the link you posted? The expression only finds the last key. – Kijewski Mar 14 '16 at 17:54
I get `(',feature[+2]', "it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")`. Am I not doing it right? I've tried several `re` functions, but I can't seem to get the right output with that regex. – zondo Mar 14 '16 at 17:54
Python supports those match groups, but the online tool doesn't; it works in Python. – Scott Weaver Mar 14 '16 at 17:56
Then please post the proper invocation to extract `[("look", 3), ("panel button layout", 3), ("feature", 2)]`. – Kijewski Mar 14 '16 at 18:26
