
I want to extract the numbers for each parameter below:

import re

parameters = '''
             NO2: 42602
             SO2: 42401
             CO: 42101
             '''

The desired output should be: ['42602', '42401', '42101']

I first tried re.findall(r'\d+',parameters), but it also returns the "2" from "NO2" and "SO2".

Then I tried re.findall(':.*',parameters), but it returns [': 42602', ': 42401', ': 42101']

If I cannot rename "NO2" to "Nitrogen dioxide", is there a way to collect only the numbers on the right (after the ":")? Many thanks.

Jeremy

5 Answers

re.findall(r'(?<=:\s)\d+', parameters)

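This should work. The look-behind (?<=:\s) requires a ":" and one whitespace character immediately before the digits, without including them in the match. You can learn more about look-behind from here.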

bayramkazik

You can use the following regex to capture the numbers

^\s*\w+:\s(\d+)$

Here, ^ asserts the position at the start of the line. \s* allows zero or more whitespace characters before the content. \w+:\s matches one or more word characters followed by ":" and a single whitespace character, that is "NO2: ". Finally, (\d+) captures the digits you want as a group, and $ matches the end of the line.
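To make the breakdown above easier to follow, here is a minimal sketch of the same pattern written with re.VERBOSE so that each piece carries its own comment. The variable names annotated and codes are only illustrative, and re.MULTILINE is included so that ^ and $ apply to every line rather than to the whole string.

```python
import re

parameters = '''
             NO2: 42602
             SO2: 42401
             CO: 42101
             '''

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
annotated = re.compile(r'''
    ^        # start of a line (relies on re.MULTILINE)
    \s*      # optional leading whitespace
    \w+ :    # the label and its colon, e.g. NO2:
    \s       # exactly one whitespace character
    (\d+)    # the digits to capture
    $        # end of the line (relies on re.MULTILINE)
''', re.VERBOSE | re.MULTILINE)

codes = annotated.findall(parameters)
print(codes)
```

In verbose mode, whitespace inside the pattern is ignored, which is why the pieces can be laid out one per line.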

To get all the matches as a list you can use

matches = re.findall(r'^\s*\w+:\s(\d+)$', parameters, re.MULTILINE)

As re.MULTILINE is specified, "the pattern character '^' matches at the beginning of the string and at the beginning of each line", as stated in the docs. With the same flag, '$' likewise matches at the end of each line.

The result is as follows

>>> print(matches)
['42602', '42401', '42101']
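To see why the flag matters, the following sketch runs the same pattern with and without re.MULTILINE on the string from the question. The names without_flag and with_flag are only illustrative.

```python
import re

parameters = '''
             NO2: 42602
             SO2: 42401
             CO: 42101
             '''

pattern = r'^\s*\w+:\s(\d+)$'

# Without the flag, ^ and $ only anchor to the whole string,
# so no single line can satisfy both anchors.
without_flag = re.findall(pattern, parameters)

# With the flag, ^ and $ anchor to each individual line.
with_flag = re.findall(pattern, parameters, re.MULTILINE)

print(without_flag)
print(with_flag)
```

Without the flag the list comes back empty, because the first line is followed by more text and $ cannot match there.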
Henry Harutyunyan

If you do not want to use capturing groups, you could use a look-behind.

(?<=:\s)\d+

Details:

  • (?<=:\s): asserts that the match is immediately preceded by ":" and one whitespace character
  • \d+: matches one or more digits

I also tested this in Python.

import re
parameters = '''
             NO2: 42602
             SO2: 42401
             CO: 42101
             '''

result = re.findall(r'(?<=:\s)\d+',parameters)
print(result)

Result

['42602', '42401', '42101']
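One caveat worth keeping in mind: the look-behind checks for exactly one whitespace character after the colon, and Python's re module only accepts fixed-width look-behinds. The sketch below uses a made-up input (uneven, with two spaces after the first colon) to show the difference from a capturing group; the variable names are illustrative only.

```python
import re

# Hypothetical input: two spaces after the first colon, one after the second.
uneven = 'NO2:  42602\nSO2: 42401'

# The look-behind needs ":" plus exactly one whitespace right before the digits.
lookbehind_hits = re.findall(r'(?<=:\s)\d+', uneven)

# A capturing group can allow any amount of whitespace.
group_hits = re.findall(r':\s*(\d+)', uneven)

# A variable-width look-behind is rejected by the re module at compile time.
try:
    re.compile(r'(?<=:\s*)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(lookbehind_hits)
print(group_hits)
print(variable_width_ok)
```

If the spacing in your data is always a single space, as in the question, the look-behind version is fine as it stands.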
Thân LƯƠNG Đình

To put my two cents in, you could simply use

re.findall(r'(\b\d+\b)', parameters)

See a demo on regex101.com.


If you happen to have other digits floating around somewhere in your string, be more precise with

\w+:\s*(\d+)

See another demo on regex101.com.
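As a rough illustration of the difference between the two patterns, the sketch below uses a made-up string (sample, with a stray "7" that is not part of any label) and compares the results. The names sample, loose and precise are illustrative only.

```python
import re

# Hypothetical input with an unrelated number on the first line.
sample = 'station 7\nNO2: 42602\nSO2: 42401'

# Word boundaries skip the "2" inside "NO2" and "SO2",
# but still pick up the standalone "7".
loose = re.findall(r'\b\d+\b', sample)

# Requiring a "label:" prefix keeps only the numbers after a colon.
precise = re.findall(r'\w+:\s*(\d+)', sample)

print(loose)
print(precise)
```

The word-boundary version works on the question's input because the "2" in "NO2" is directly preceded by a letter, so there is no boundary in front of it.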

Jan

You just need to specify where in the string you want to search for digits. You can use:

re.findall(r': (\d+)', parameters)

This tells Python to match ":" followed by a single space, and to capture only the digits that come right after them.

LEE