Extract a substring from a string including a comma

Question

I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like the following:

Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer

From each line I have to extract the string before \n and print it. I used the following Python regex to do that: term = r'.*-\s([\w\s]+)\\n'. This regex works fine for the 1st, 2nd and last lines, but it doesn't work for the 3rd and 4th lines because there is a , in the string. How should I modify my regex to handle that?

Expected results -

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Results obtained currently -

Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted

Possible duplicate of [How can I split and parse a string in Python?](https://stackoverflow.com/questions/5749195/how-can-i-split-and-parse-a-string-in-python) — max, Nov 29 '17 at 19:28
Do the strings contain newline characters, or do they contain a literal "\" followed by "n"? Your regex seems to suggest the latter, but a lot of the answers you've got are assuming the former. — ekhumoro, Nov 29 '17 at 19:40
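One way to settle this question is to look at the repr() of a line read from the file. The following is only a minimal sketch: io.StringIO stands in for the real file, and the sample content is an assumption based on the lines shown in the question.

```python
import io

# Stand-in for the real file (an assumption). The raw string keeps "\n" as
# two characters, backslash + "n", which is what the question's regex implies.
sample = io.StringIO(r"Box1 is lifted\nInform the manufacturer" + "\n")

line = next(sample)

has_literal = r'\n' in line               # backslash followed by "n" inside the line
ends_with_newline = line.endswith('\n')   # the real line terminator

print(repr(line))
print(has_literal, ends_with_newline)
```

If has_literal is true, the separator to split or match on is the two-character sequence r'\n', not the newline character '\n'.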

Aaron Lael · Answer 1 · 2017-11-29T19:23:40.490

2

If this is a consistent format, you could just split on the newline:

''.join(YOURSTRING.split('\n')[0].split(','))

Edited because I missed the part about removing the comma.

edited Nov 29 '17 at 19:23

answered Nov 29 '17 at 19:16

Aaron Lael


But I also want to remove the `comma` while printing the output – Gargi Nov 29 '17 at 19:17
Then split it again on a `,`. But avoid doing that, time to practice and learn regex. – theBrainyGeek Nov 29 '17 at 19:19
Sorry, I honestly missed that so I updated, but as that's more complicated I would suggest going back to regex. – Aaron Lael Nov 29 '17 at 19:24

Pulsar · Answer 2 · 2017-11-29T19:44:58.143

2

Regex is overkill for basic string operations like this. Use the built-in string methods, such as partition and replace:

for line in lines:
    # Keep only the text before the separator, then drop the commas.
    first, sep, last = line.partition('\n')
    cleaned = first.replace(',', '')
    print(cleaned)

Edit. In case \n is a literal sequence in a line read from a file, use r'\n' instead of '\n'.
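As a rough sketch of that distinction, the separator can be made a parameter so the same code covers both cases. The helper name text_before_marker and the sample strings are hypothetical, chosen only for illustration.

```python
def text_before_marker(line, marker=r'\n'):
    """Return the text before `marker`, with commas removed.

    The default marker is the literal two-character sequence backslash + "n".
    If the marker is absent, str.partition returns the whole line as the
    first element, so the full line (minus commas) comes back.
    """
    first, _sep, _rest = line.partition(marker)
    return first.replace(',', '')


# Literal backslash + "n", as it would appear in a line read from a file.
literal = r"Box3, Box4 is lifted\nInform the manufacturer"
# A real newline character embedded in the string.
real = "Box3, Box4 is lifted\nInform the manufacturer"

print(text_before_marker(literal))
print(text_before_marker(real, marker='\n'))
```

Both calls should give the same text, since only the marker differs.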

edited Nov 29 '17 at 19:44

answered Nov 29 '17 at 19:22

Pulsar


The OP is reading the strings from a file. By definition, a line cannot contain a newline, so your code cannot possibly work. – ekhumoro Nov 29 '17 at 19:34

score 2 · Answer 3 · answered Nov 29 '17 at 19:23

2

The comma isn't part of either the \w or the \s character class. term = r'.*-\s([\w\s,]+)\\n' should do what you want.
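A minimal sketch of the widened character class follows. It rests on two assumptions: the lines contain a literal backslash followed by "n", and, because the sample lines shown in the question have no "- " prefix, the leading .*-\s part of the pattern is dropped here. The helper name extract is hypothetical.

```python
import re

# Sample lines as raw strings, so "\n" stays as backslash + "n" (an assumption).
lines = [
    r"Box1 is lifted\nInform the manufacturer",
    r"Box3, Box4 is lifted\nInform the manufacturer",
]

# Adding "," to the class lets the capture group span "Box3, Box4".
term = re.compile(r'([\w\s,]+)\\n')


def extract(line):
    """Return the captured text without commas, or None if nothing matches."""
    match = term.search(line)
    if match is None:
        return None
    return match.group(1).replace(',', '')


results = [extract(line) for line in lines]
print(results)
```

The regex only widens what is captured; the commas still have to be removed afterwards to get the expected output from the question.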


Mateo


theBrainyGeek · Answer 4 · 2017-11-29T19:23:50.847

1

Why not something as simple as term = r"[*]*(is lifted)"? Or don't use regex at all if it isn't required. EDIT: I think this might be better: term = r"(Box[0-9])?(, Box[0-9])*(is lifted)"

edited Nov 29 '17 at 19:23

answered Nov 29 '17 at 19:17

theBrainyGeek


score 1 · Answer 5 · answered Nov 29 '17 at 19:19

What about something like this?

ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# Split on the trailing sentence, collapse whitespace, then drop the literal \n and the commas.
# Empty pieces (such as the one after the last sentence) are skipped.
parts = ok.split('Inform the manufacturer')
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '') for x in parts if x.strip()]
for x in strings:
    print(x)

Output:

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

score 0 · Answer 6 · answered Nov 29 '17 at 19:33

Let me know if the below works for you.

text = "Box3, Box4 is lifted\nInform the manufacturer"
text = text.replace(",", "", 1)  # remove the first comma only
print(text)
print(text[0:text.index("\n")])  # text before the newline character
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])

score 0 · Answer 7 · answered Nov 29 '17 at 20:11

You can try a regex with a lookahead and take the matched text:

One line solution:

import re
pattern=r'\w.+(?=\\n)'

print([re.search(pattern,line).group() for line in open('file','r')])

output:

['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']

Detailed solution:

import re
pattern=r'\w.+(?=\\n)'
with open('newt','r') as f:
    for line in f:
        print(re.search(pattern,line).group())

output:

Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted
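The output above still contains the commas, while the question's expected results do not. A possible adjustment, sketched under the assumption that the file holds a literal backslash + "n", strips the commas from each match and skips lines that do not match (calling .group() on a failed search would raise AttributeError). io.StringIO stands in for the real file here, and its content is made up for illustration.

```python
import io
import re

# Same lookahead pattern as above: match up to, but not including, the literal \n.
pattern = re.compile(r'\w.+(?=\\n)')

# Stand-in for the real file (an assumption); the middle line is blank on purpose.
fake_file = io.StringIO(
    r"Box1 is lifted\nInform the manufacturer" + "\n"
    + "\n"
    + r"Box5, Box6 is lifted\nInform the manufacturer" + "\n"
)

cleaned = []
for line in fake_file:
    match = pattern.search(line)
    if match is None:
        # No literal \n on this line, so there is nothing to extract.
        continue
    cleaned.append(match.group().replace(',', ''))

print(cleaned)
```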
