
For example, I have a string like

"look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "

"look[+3]" means that the sentence discusses a particular aspect of an item ("look"), and [+3] means it is a positive review with a score of 3. (This is from the Amazon review dataset.)

I would like to split it like

X: "it 's very sleek looking with a very good front panel button layout , and it has a great feature set ."

Y: [("look", 3), ("panel button layout", 3), ("feature", 2)]
Vicky

3 Answers


One option is to capture everything from the beginning of the string or a comma up to the next [, and then extract the optionally signed number inside the brackets:

>>> import re
>>> s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('look', '3'), ('panel button layout', '3'), ('feature', '2')]
>>>
>>> s = "darn diopter adjustment dial[-1]"
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('darn diopter adjustment dial', '-1')]

where:

  • (?:^|,) is a non-capturing group that would either match the beginning of a string or a comma
  • (.*?) is a non-greedy match for any character any number of times
  • \[\+?(\-?\d+)\] would match an opening [ followed by an optional + followed by a capturing group that would capture one or more digits (with an optional - at the beginning), followed by a closing ]
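Note that findall() returns the scores as strings, while the desired Y holds integers. A minimal sketch of the conversion, reusing the pattern above; the names PAIR and extract_pairs are made up for illustration and are not part of the original answer:

```python
import re

# Same pattern as in the answer: an aspect name followed by a bracketed,
# optionally signed score. PAIR and extract_pairs are illustrative names.
PAIR = re.compile(r"(?:^|,)(.*?)\[\+?(-?\d+)\]")

def extract_pairs(line):
    """Return [(aspect, score), ...] with each score converted to int."""
    return [(name, int(score)) for name, score in PAIR.findall(line)]

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great "
     "feature set . ")

print(extract_pairs(s))
# [('look', 3), ('panel button layout', 3), ('feature', 2)]
print(extract_pairs("darn diopter adjustment dial[-1]"))
# [('darn diopter adjustment dial', -1)]
```

This sketch assumes the review text itself contains no square brackets; a bracket later in the sentence could produce a spurious extra match.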
alecxe
  • I think `re.finditer()` would work better to answer the other part of the question (`X: "It's …"`), but `re.sub()` would probably work just as well. – Kijewski Mar 14 '16 at 17:52
  • `darn diopter adjustment dial[-1]##the reason i rated it a four is because of that darn diopter adjustment dial. its very small and hard to turn so you can 't get an accurate adjustment ( for those of you who do n't know what a diopter adjustment is , it is to adjust the focus of the viewfinder to your eyesight . ) ` `[]` I find it not working in this case. – Vicky Mar 14 '16 at 17:59
  • Where did that `##` come from? – Chuck Mar 14 '16 at 18:01
  • You can ignore that. – Vicky Mar 14 '16 at 18:07
  • @Vicky please check the update now. It handles both + and - before the digits. – alecxe Mar 14 '16 at 18:13
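Following the comment about re.sub(), here is one possible way to recover X by stripping the tag prefix from the start of the line. The pattern TAG_PREFIX and the helper review_text are hypothetical names written for this sketch, and dropping a `##` that follows the tags is an assumption based on the sample line in the comments:

```python
import re

# Hypothetical pattern (not from the thread): one or more "name[score]" tags
# anchored at the start of the line, separated by commas, optionally followed
# by "##". Scores may carry a leading + or -.
TAG_PREFIX = re.compile(r"^(?:,?[^\[\],]+\[[+-]?\d+\])+(?:##)?")

def review_text(line):
    """Return the sentence with the leading tag prefix removed."""
    return TAG_PREFIX.sub("", line).strip()

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great "
     "feature set . ")

print(review_text(s))
print(review_text("darn diopter adjustment dial[-1]##the reason i rated it a four"))
```

Because the pattern is anchored with ^, commas inside the review text do not affect the result; a line without any tags is returned unchanged apart from the whitespace strip.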

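You can use re.findall(r'(.*\[\+\d+\],?)', s) to pull out the tag prefix that holds the Y data. Note that the prefix comes back as a single string that still has to be split, and that the pattern only covers positive scores.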
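A quick check of what this pattern returns on the sample string may help: because `.*` is greedy, the whole tag prefix is captured as one string, so a further split is needed to reach the Y format. The splitting step below is an addition for illustration, not part of the answer, and it assumes that aspect names never contain commas:

```python
import re

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great "
     "feature set . ")

# The greedy .* runs up to the last "[+N]", so findall yields a single string.
prefix = re.findall(r"(.*\[\+\d+\],?)", s)
print(prefix)
# ['look[+3],panel button layout[+3],feature[+2]']

# Illustrative follow-up step: split on commas and parse each tag.
pairs = []
for tag in prefix[0].split(","):
    m = re.fullmatch(r"(.*)\[\+(\d+)\]", tag)
    if m:
        pairs.append((m.group(1), int(m.group(2))))

print(pairs)
# [('look', 3), ('panel button layout', 3), ('feature', 2)]
```

Since the pattern requires a literal +, a negative tag such as `[-1]` is not matched at all.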

Chuck

Try this regular expression:

([^\]]+[[^\]])+(.*)

Your key/value pairs are in $1, the summary in $2.

EDIT: While re doesn't support multiple captures per group (only the last capture is available), the third-party regex module does:

>>> import regex
>>> m = regex.search(r"([^\]]+[[^\]])+(.*)", "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
>>> m.group(1)
',feature[+2]'
>>> m.captures(1)
['look[+3]', ',panel button layout[+3]', ',feature[+2]']
>>> m.group(2)
"it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
Scott Weaver
  • Did you look at the link you posted? The expression only finds the last key. – Kijewski Mar 14 '16 at 17:54
  • I get `(',feature[+2]', "it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")`. Am I not doing it right? I've tried several `re` functions, but I can't seem to get the right output with that regex. – zondo Mar 14 '16 at 17:54
  • Python supports those match groups; the online tool doesn't. It works. – Scott Weaver Mar 14 '16 at 17:56
  • Then please post the proper invocation to extract `[("look", 3), ("panel button layout", 3), ("feature", 2)]`. – Kijewski Mar 14 '16 at 18:26