1

I would like to extract only the numbers contained in a string. Can isdigit() and split() be combined for this purpose, or is there a simpler/faster way?

Example:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

Output:

numbers = [122, 35, 1052]
text = ['How to extract only number', 'The number must be extracted', 'must be extracted']

My code:

text = []
numbers = []
temp_numbers = []
for i in range(len(m)):
    text.append([word for word in m[i].split() if not word.isdigit()])
    temp_numbers.append([int(word) for word in m[i].split() if word.isdigit()])
for i in range(len(m)):
    text[i] = ' '.join(text[i])
for elem in temp_numbers:
    numbers.extend(elem)

print(text)
print(numbers)
Lee
  • You could omit `==True` and `==False` and factor out the common `for word in m[i].split() if word.isdigit()` but other than that this looks as simple as it can get. – mkrieger1 Aug 29 '22 at 16:03
  • This has been addressed here: https://stackoverflow.com/questions/19715303/regex-that-accepts-only-numbers-0-9-and-no-characters – JDR Aug 29 '22 at 16:07

3 Answers

2

Import the regex library:

import re

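If you want to extract all numbers:

numbers = []
texts = []
for string in m:
    # findall returns every run of digits as a string; convert each one to int
    numbers.extend(int(n) for n in re.findall(r"\d+", string))
    # remove the digits, then collapse the whitespace they leave behind
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))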

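If you want to extract only the first number:

numbers = []
texts = []
for string in m:
    # assumes every string contains a number; [0] raises IndexError otherwise
    numbers.append(int(re.findall(r"\d+", string)[0]))
    # remove the digits, then collapse the whitespace they leave behind
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))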
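For reference, here is a minimal sketch that wraps the regex approach in one function and returns both lists in the shape the question asks for. The names `split_numbers` and `NUMBER` are made up for this example, and it assumes "number" means a run of ASCII-style digits (no signs or decimals).

```python
import re

# Hypothetical helper names; the pattern matches one or more consecutive digits.
NUMBER = re.compile(r"\d+")


def split_numbers(strings):
    """Return (numbers, texts): all integers found, and each string without them."""
    numbers, texts = [], []
    for s in strings:
        # Every run of digits becomes one int.
        numbers.extend(int(tok) for tok in NUMBER.findall(s))
        # Replace digits with a space, then collapse repeated whitespace.
        texts.append(" ".join(NUMBER.sub(" ", s).split()))
    return numbers, texts


m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']
numbers, texts = split_numbers(m)
print(numbers)  # [122, 35, 1052]
print(texts)    # ['How to extract only number', 'The number must be extracted', 'must be extracted']
```

Compiling the pattern once is mostly a readability choice here; for a list this small the speed difference is unlikely to matter.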
funnydman
BloomShell
1

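If m is a list of strings, you can loop over each string, check whether each character is a digit, and collect the digits into a number. Note that this joins every digit in a string into one number, so it assumes at most one number per string.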

For loop solution:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = []
temp_num = ""

for string in m:
    # Presuming m only contains strings

    for char in string:
        if char.isdigit():
            temp_num += char

    # Skip strings without digits: int("") would raise ValueError
    if temp_num:
        numbers.append(int(temp_num))
    temp_num = ""

List comprehension solution - appends each digit as a separate element, so 122 becomes 1, 2, 2:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = [int(char) for string in m for char in string if char.isdigit()]
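To make the difference concrete, here is a small sketch that prints what the per-character comprehension produces and then shows one possible way to keep consecutive digits together, using `itertools.groupby` from the standard library. The `groupby` variant is my own illustration, not part of the answer above.

```python
from itertools import groupby

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

# Per-character comprehension: one list entry per digit.
digits = [int(ch) for s in m for ch in s if ch.isdigit()]
print(digits)  # [1, 2, 2, 3, 5, 1, 0, 5, 2]

# Group consecutive characters by "is this a digit?" so each run becomes one number.
numbers = [
    int("".join(run))
    for s in m
    for is_digit, run in groupby(s, key=str.isdigit)
    if is_digit
]
print(numbers)  # [122, 35, 1052]
```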

Hope this helped. Also, if you only need the values of an iterable (e.g. a list), use `for varname in iterable`; it's faster and cleaner than indexing with `range(len(iterable))`.

If you need both the index and the value, use `for index, varname in enumerate(iterable)`.
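As a quick illustration of the `enumerate` form, the sketch below records which list index each number came from. The dictionary name `numbers_by_index` is just something I picked for the example.

```python
m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

# Map each position in m to the whole-word numbers found in that string.
numbers_by_index = {}
for index, string in enumerate(m):
    numbers_by_index[index] = [int(word) for word in string.split() if word.isdigit()]

print(numbers_by_index)  # {0: [122], 1: [35], 2: [1052]}
```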

xihtyM
0
nums_list = []
m = ["How to extract only number 122", "The number 35 must be extracted", "1052 must be extracted"]
for i in m:
    new_l = i.split(" ")
    for j in new_l:
        if j.isdigit():
            nums_list.append(int(j))
print(nums_list)

Output:

[122, 35, 1052]
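If you also need the leftover text, as in the question, the same split()/isdigit() idea can fill both lists in one pass. This is only a sketch: `partition_words` is a hypothetical name, and it assumes numbers appear as separate whitespace-delimited words (a token like "122," with punctuation attached would be treated as text).

```python
def partition_words(strings):
    """Split each string into words; collect digit-only words as ints, keep the rest as text."""
    numbers, texts = [], []
    for s in strings:
        kept = []
        for word in s.split():
            if word.isdigit():
                numbers.append(int(word))
            else:
                kept.append(word)
        # Rejoin the non-numeric words with single spaces.
        texts.append(" ".join(kept))
    return numbers, texts


m = ["How to extract only number 122", "The number 35 must be extracted", "1052 must be extracted"]
nums_list, text_list = partition_words(m)
print(nums_list)  # [122, 35, 1052]
print(text_list)  # ['How to extract only number', 'The number must be extracted', 'must be extracted']
```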
Aman Raheja