I'm trying to split
string using regular expression
with python
and get all the matched literals.
RE: \w+(\.?\w+)*
this need to capture [a-zA-Z0-9_]
like stuff only.
but when I try to match and get all the contents from string, it doesn't return proper results.
Code snippet:
>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| \(`.`)/
...
... I guess that's it."""
>>> pprint(re.findall(r"\w+(.?\w+)*", string))
[' etc', ' well', ' same', ' wait', ' like', ' it']
it's only returning some of words, but actually it should return all the words, numbers and underscore(s)[as in linked example].
python version: Python 3.6.2 (default, Jul 17 2017, 16:44:45)
Thanks.