I have a list with a lot of words, so I don't want to write a nested loop, because it would take the program a long time to run. Is there a way to check whether a word consists of punctuation, with something like the any(map(str.isdigit, s1)) we can use when checking for digits?

- `any` still has to loop through the string. Some looping is unavoidable. – khelwood Apr 23 '21 at 12:09
- Could you clarify more? You have to check if there is punctuation in the list of words? – Aditya Singh Rathore Apr 23 '21 at 12:10
- @AdityaSinghRathore to check every word in a list for punctuation – jamesss Apr 23 '21 at 12:12
- There should be a loop. Check this: https://stackoverflow.com/a/4843172/4688639 – Soroosh Noorzad Apr 23 '21 at 12:13
- Explain "consists of punctuation" with an example, it's not obvious what you mean – 576i Apr 23 '21 at 12:27
- @576i e.g. word 'hi!' or 'yes,' – jamesss Apr 23 '21 at 12:30
2 Answers
Unless the list is very large, or your CPU is low-performance, it is not going to take much time to process a list of words. Consider the example below, which has 1 million 20-character strings.
import random
import string
In [16]: s = [''.join(random.choices(string.ascii_letters + string.punctuation, k=20)) for _ in range(1000000)]
In [17]: %%timeit -n 3 -r 3
...: [any(map(str.isdigit, s1)) for s1 in s]
...:
...:
1.23 s ± 2.53 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
In [18]: %%timeit -n 3 -r 3
...: [any([s2 in string.punctuation for s2 in s1]) for s1 in s]
...:
...:
1.72 s ± 18.1 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
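The second comprehension above does a linear scan of string.punctuation for every character. A minimal sketch, assuming the goal is to flag words that contain at least one punctuation character, builds a frozenset once and uses isdisjoint, which can stop at the first shared character. The names PUNCT and has_punct are hypothetical, and no timing is claimed for this variant.

```python
import string

# Build the lookup set once; set membership tests are O(1) on average.
PUNCT = frozenset(string.punctuation)

def has_punct(word: str) -> bool:
    """Return True if word contains at least one ASCII punctuation character."""
    # isdisjoint accepts any iterable and returns False as soon as a shared element is found
    return not PUNCT.isdisjoint(word)

words = ["hi!", "yes,", "hello", "don't"]
flags = [has_punct(w) for w in words]
print(flags)  # [True, True, False, True]
```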
You could speed it up with a precompiled regular expression:
import random
import re
import string
In [16]: s = [''.join(random.choices(string.ascii_letters + string.punctuation, k=20)) for _ in range(1000000)]
In [17]: patt = re.compile('[%s]' % re.escape(string.punctuation))
In [18]: %%timeit -n 3 -r 3
[bool(re.match(patt, s1)) for s1 in s]
1.03 s ± 3.23 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
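One caveat with the regex version above: re.match only tries to match at the beginning of the string, so bool(re.match(patt, 'hi!')) is False even though the word ends in '!'. To find punctuation anywhere in a word, search is the method to use. A sketch reusing the same pattern construction (the helper name contains_punct is hypothetical):

```python
import re
import string

# Character class containing every ASCII punctuation character, escaped for regex use.
patt = re.compile('[%s]' % re.escape(string.punctuation))

def contains_punct(word: str) -> bool:
    # search scans the whole string; match only looks at position 0
    return patt.search(word) is not None

words = ["hi!", "yes,", "hello", "!wow"]
print([bool(patt.match(w)) for w in words])  # [False, False, False, True]
print([contains_punct(w) for w in words])    # [True, True, False, True]
```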

It may depend on what you define as "punctuation". The module string defines string.punctuation as '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'. You may also define it as "what isn't alphanumeric" (a-zA-Z0-9), or "what isn't alpha" (a-zA-Z).
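To make the three definitions concrete, here is a small sketch (all function names are hypothetical) that flags a word under each one. Note that str.isalnum and str.isalpha are Unicode-aware, so they also accept letters such as 'é', which is broader than a-zA-Z0-9.

```python
import string

PUNCT = set(string.punctuation)

def has_string_punct(word: str) -> bool:
    # Definition 1: any character from string.punctuation
    return any(ch in PUNCT for ch in word)

def has_non_alnum(word: str) -> bool:
    # Definition 2: anything that isn't a letter or digit (spaces count too;
    # note that ''.isalnum() is False, so the empty string is flagged)
    return not word.isalnum()

def has_non_alpha(word: str) -> bool:
    # Definition 3: anything that isn't a letter (digits count too)
    return not word.isalpha()

for w in ["hi!", "abc", "abc123", "a b"]:
    print(w, has_string_punct(w), has_non_alnum(w), has_non_alpha(w))
```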
Here I define a very long string of alphanumeric characters, and the same string with an added dot (.), shuffled.
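import numpy as np
import string
# int(1e8) random letters and digits; the size argument must be an int, not the float 1e8
mystr_no_punct = np.random.choice(list(string.ascii_letters) +
                                  list(string.digits), int(1e8))
# the same characters plus one dot, shuffled so the dot lands at a random position
mystr_withpunct = np.append(mystr_no_punct, '.')
np.random.shuffle(mystr_withpunct)
mystr_withpunct = "".join(mystr_withpunct)
mystr_no_punct = "".join(mystr_no_punct)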
Below is a naive implementation with a for loop, followed by some possible alternatives depending on what you are looking for, with timing comparisons:
def naive(mystr):
    # returns False as soon as a punctuation character is found, True otherwise
    for x in mystr:
        if x in string.punctuation:
            return False
    return True
# naive solution
%timeit naive(mystr_withpunct)
%timeit naive(mystr_no_punct)
# check if string is only alnum
%timeit str.isalnum(mystr_withpunct)
%timeit str.isalnum(mystr_no_punct)
# reduce to a set of the present characters, compare with the set of punctuation characters
%timeit len(set(mystr_withpunct).intersection(set(string.punctuation))) > 0
%timeit len(set(mystr_no_punct).intersection(set(string.punctuation))) > 0
# use regex
import re
%timeit len(re.findall(rf"[{re.escape(string.punctuation)}]+", mystr_withpunct)) > 0
%timeit len(re.findall(rf"[{re.escape(string.punctuation)}]+", mystr_no_punct)) > 0
With the following results
# naive
53.9 ms ± 928 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
53.1 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# str.isalnum
4.17 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.47 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# sets intersection
8.26 ms ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.2 ms ± 48.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# regex
8.43 ms ± 84 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.51 ms ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the built-in isalnum is clearly the fastest. But if you need a specific definition of punctuation, regex or set intersection seem a good fit.
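The benchmarks above work on one long string, while the question is about a list of words. A hedged sketch of carrying the same ideas over to a word list (the words list is a hypothetical sample; remember that isalnum flags any non-alphanumeric character, hyphens and spaces included, not only string.punctuation):

```python
import re
import string

words = ["hi", "hi!", "yes,", "no", "42", "e-mail"]

# isalnum-based: only valid if "punctuation" means "not alphanumeric"
non_alnum = [w for w in words if not w.isalnum()]

# Set intersection over all words at once: which punctuation characters occur at all
found = set("".join(words)) & set(string.punctuation)

# Precompiled regex with search: which words contain string.punctuation characters
punct_re = re.compile('[%s]' % re.escape(string.punctuation))
with_punct = [w for w in words if punct_re.search(w)]

print(non_alnum)      # ['hi!', 'yes,', 'e-mail']
print(sorted(found))  # ['!', ',', '-']
print(with_punct)     # ['hi!', 'yes,', 'e-mail']
```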
