Check if a word contains a number

Question

I am tokenizing a string into words and then want to remove any word which contains a number.

tokens = ['hello', 'world', '12', '1-3', "23'"]

As you can see, the numbers come in various forms. The above three are just examples. I can loop through the string items and see if there is a digit and remove that string. However, that doesn't seem right.

The isdigit() method doesn't work on such number-strings, since it only returns True when every character is a digit. How can I achieve this?

Goal: any token that contains a digit should be removed. My current code looks something like this, but it doesn't handle the forms above:

relevant_tokens = [token for token in tokens if not token.isdigit()]

[`relevant_tokens = [token for token in tokens if not any(c.isdigit() for c in token)]`](https://ideone.com/WYIxED)? — Wiktor Stribiżew, Oct 16 '17 at 10:37
This can help you : https://stackoverflow.com/q/30141233/5596800 — xssChauhan, Oct 16 '17 at 10:37
import re; result = [token for token in tokens if len(re.findall("\d+", token))==0] — Kinght 金, Oct 16 '17 at 10:43
@WiktorStribiżew that works and I mentioned that approach in the question when I said: "I can loop through the string item". However, it makes my filter statement too complex. I was more looking for a single function. — utengr, Oct 16 '17 at 10:50
Ok, the first thread linked actually contains the right regex solution, `re.search(r'\d', inputString)`. Do not use the `re.match('.*\d+', token)` solution below, it will cause unnecessary backtracking and slow down. — Wiktor Stribiżew, Oct 16 '17 at 10:51
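
To put the suggestions from these comments side by side, here is a minimal sketch. The helper name `has_digit` and the precompiled `_DIGIT` pattern are made up for illustration; they do not appear in the thread. It assumes the last token in the question was meant to be `23'`.

```python
import re

# Compiled once; \d matches a single decimal digit anywhere in the string.
_DIGIT = re.compile(r"\d")


def has_digit(token):
    """Return True if the token contains at least one digit (hypothetical helper)."""
    return _DIGIT.search(token) is not None


tokens = ["hello", "world", "12", "1-3", "23'"]

# Regex-based filter: the comprehension stays short because the check lives in one function.
relevant_tokens = [token for token in tokens if not has_digit(token)]

# Regex-free variant from the first comment; it gives the same result for these tokens.
relevant_tokens_no_re = [t for t in tokens if not any(c.isdigit() for c in t)]

print(relevant_tokens)  # ['hello', 'world']
```

Wrapping the check in a small function is one way to get the "single function" feel asked for above without making the filter expression itself more complex.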

MohitC · Answer 1 · 2017-10-16T10:46:51.790

0

import re
tokens = [token for token in tokens if not re.match(r'.*\d+', token)]

edited Oct 16 '17 at 10:46

answered Oct 16 '17 at 10:42

MohitC


`re.match('\d+', token)` won't detect `abc5`. – Wiktor Stribiżew Oct 16 '17 at 10:43
Fixed, @WiktorStribiżew – MohitC Oct 16 '17 at 10:47
@MohitC please update your answer based on the suggestions in the comments above so I can accept it, especially from Wiktor. – utengr Oct 20 '17 at 11:48
@engr_s simply `\d` will also detect `abc5` as a valid token which you don't want. – MohitC Oct 23 '17 at 07:59
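
For anyone following this exchange, the difference comes down to where `re.match` and `re.search` look for the pattern. A small sketch using the `abc5` token from the comments (the variable names are only for illustration):

```python
import re

token = "abc5"

# re.match only tries the pattern at the start of the string, so a bare \d+ misses "abc5".
match_bare = re.match(r"\d+", token)

# Prefixing .* lets re.match reach the digit, at the cost of extra backtracking.
match_prefixed = re.match(r".*\d+", token)

# re.search scans the string and stops at the first digit it finds.
search_digit = re.search(r"\d", token)

print(match_bare is None)          # True
print(match_prefixed is not None)  # True
print(search_digit is not None)    # True
print(re.search(r"\d", "hello"))   # None
```

Both `re.match(r'.*\d+', token)` and `re.search(r'\d', token)` flag `abc5` as containing a digit, which matches the stated goal of removing any token with a digit in it.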
