
I'm trying to extract the number before the character "M" in a series of strings. The strings may look like:

"107S33M15H"
"33M100S"
"12M100H33M"

So basically there are sets of numbers separated by different characters, and "M" may appear more than once. For the examples here, I would like my code to return:

33
33
12,33 # it doesn't matter which delimiter is used here

One way I could think of is to split the string on "M" and look for the items that are pure numbers, but I suspect there is a better way to do it. Thanks a lot for the help.
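A minimal sketch of the split-based idea, assuming "the number before M" means the run of digits immediately preceding each "M". Note that after splitting, the pieces are not pure numbers (e.g. "107S33"), so this version keeps only the trailing digits of each piece. The function name numbers_before_m_split is hypothetical.

```python
def numbers_before_m_split(s: str) -> list[str]:
    """Return the digit runs that directly precede each 'M' in s."""
    # Every piece except the last one was followed by an 'M' in the original string
    pieces = s.split("M")[:-1]
    result = []
    for piece in pieces:
        # Walk backwards from the end of the piece while characters are digits
        i = len(piece)
        while i > 0 and piece[i - 1].isdigit():
            i -= 1
        if i < len(piece):  # skip pieces that do not end in a digit
            result.append(piece[i:])
    return result


for s in ["107S33M15H", "33M100S", "12M100H33M"]:
    print(",".join(numbers_before_m_split(s)))
```

This works without regular expressions, but the regex approach in the answers is shorter.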

Helene

2 Answers


You can use a simple (\d+)M regex (one or more digits followed by M, with the digits captured in a capture group) together with re.findall.

See the IDEONE demo:

import re
s = "107S33M15H\n33M100S\n12M100H33M"
print(re.findall(r"(\d+)M", s))

And here is a regex demo
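The demo above joins all the strings into one, so the matches come back as a single flat list. A minimal sketch that applies the same (\d+)M pattern to each string separately and prints the comma-separated output the question asks for; the helper name numbers_before_m is hypothetical, and converting to int is an optional choice.

```python
import re

# One or more digits immediately followed by "M"; only the digits are captured
PATTERN = re.compile(r"(\d+)M")


def numbers_before_m(s: str) -> list[int]:
    """Return every number that directly precedes an 'M' in s."""
    return [int(n) for n in PATTERN.findall(s)]


for s in ["107S33M15H", "33M100S", "12M100H33M"]:
    # Comma-join the numbers found in this one string
    print(",".join(str(n) for n in numbers_before_m(s)))
```

Running it prints 33, 33 and 12,33 on separate lines.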

Wiktor Stribiżew

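You can use str.rpartition, which splits the string at the last "M". Note that the part before it still contains everything up to that "M" (for '107S33M15H' it is '107S33'), so the trailing digits still have to be extracted, and only the last "M" is handled.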

s = '107S33M15H'
# rpartition returns (before, 'M', after) for the last 'M'
prefix = s.rpartition('M')[0]  # '107S33'
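A minimal sketch that finishes the rpartition approach by taking the trailing digits of the prefix with a small regex. It only looks at the last "M", so for "12M100H33M" it yields 33, not 12 and 33. The function name number_before_last_m is hypothetical.

```python
import re


def number_before_last_m(s: str):
    """Return the digits directly before the last 'M' in s, or None."""
    before, sep, _ = s.rpartition("M")
    if not sep:  # no 'M' in the string at all
        return None
    # Keep only the run of digits at the very end of the prefix
    match = re.search(r"(\d+)$", before)
    return match.group(1) if match else None


for s in ["107S33M15H", "33M100S", "12M100H33M"]:
    print(number_before_last_m(s))
```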
Camilo
I used this to add a new column to my data frame. This is the code: df['new_col'] = df.old_col.str.rpartition('b')[2], where b is the letter to be removed and 2 is the position in the rpartition result of the characters you want in the new column. Thanks for the code, it was very useful. – Jorge Jan 25 '18 at 15:03
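For the original question applied to a DataFrame column, a vectorized sketch could use pandas' Series.str.findall with the (\d+)M regex instead of rpartition. The frame and its values are made up for illustration; "old_col" and "new_col" just mirror the names in the comment.

```python
import pandas as pd

# Hypothetical frame holding the example strings from the question
df = pd.DataFrame({"old_col": ["107S33M15H", "33M100S", "12M100H33M"]})

# str.findall returns, per row, the list of captured digit runs before each "M";
# str.join then collapses each list into a comma-separated string
df["new_col"] = df["old_col"].str.findall(r"(\d+)M").str.join(",")
print(df)
```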