I am trying to write a program that reads a paragraph and counts the special characters and words
Let's focus on the goal then, rather than your approach. Your approach is probably possible, but it may take a bunch of splits, so let's just ignore it for now. Using re.findall with a suitably filtered regex should work much better.
lst = re.findall(r"\w+|[^\w\s]", some_sentence)
would make sense. Broken down, the pattern reads:
import re

pat = re.compile(r"""
    \w+      # one or more word characters
    |        # OR
    [^\w\s]  # exactly one character that's neither a word character nor whitespace
    """, re.X)
results = pat.findall('"Why, hello there, Martha!"')
# ['"', 'Why', ',', 'hello', 'there', ',', 'Martha', '!', '"']
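One thing to keep in mind about this pattern: \w matches letters, digits and the underscore, so an apostrophe inside a word splits it into separate tokens while snake_case stays whole. A small sketch of that behavior, where tokenize is just a hypothetical helper name for illustration:

```python
import re

# Same pattern as above, in its compact form.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return words and single special characters in order of appearance."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't use snake_case, OK?"))
# ['Don', "'", 't', 'use', 'snake_case', ',', 'OK', '?']
```

If contractions should count as one word, the pattern would need adjusting; that depends on what you consider a "word".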
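However, you then have to iterate over the list a second time to count the special characters! Let's separate them, then. Luckily this is easy: just wrap each alternative in a capturing group (parentheses). With more than one group in the pattern, findall returns a tuple per match, with an empty string for the group that did not match.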
new_pat = re.compile(r"""
( # begin capture group
\w+ # one or more word characters
) # end capturing group
| # OR
( # begin capture group
[^\w\s] # exactly one character that's neither a word character nor whitespace
) # end capturing group
""", re.X)
results = new_pat.findall('"Why, hello there, Martha!"')
# [('', '"'), ('Why', ''), ('', ','), ('hello', ''), ('there', ''), ('', ','), ('Martha', ''), ('', '!'), ('', '"')]
grouped_results = {"words": [], "punctuations": []}
for word, punctuation in results:
    if word:
        grouped_results['words'].append(word)
    if punctuation:
        grouped_results['punctuations'].append(punctuation)
# grouped_results = {'punctuations': ['"', ',', ',', '!', '"'],
# 'words': ['Why', 'hello', 'there', 'Martha']}
Then just count the items stored under each key of the dict.
>>> for key in grouped_results:
...     print("There are {} items in {}".format(
...         len(grouped_results[key]),
...         key))
...
There are 5 items in punctuations
There are 4 items in words
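Putting the pieces together, here is a minimal sketch of the whole thing as one function. The name count_words_and_specials is my own, not anything from the standard library, and it assumes the same definition of "word" and "special character" as the pattern above:

```python
import re

# Group 1 captures a word, group 2 captures a single special character.
PATTERN = re.compile(r"(\w+)|([^\w\s])")

def count_words_and_specials(text):
    """Count words and special characters in text in a single pass."""
    counts = {"words": 0, "punctuations": 0}
    for word, punctuation in PATTERN.findall(text):
        # Exactly one of the two groups is non-empty for every match.
        if word:
            counts["words"] += 1
        else:
            counts["punctuations"] += 1
    return counts

print(count_words_and_specials('"Why, hello there, Martha!"'))
# {'words': 4, 'punctuations': 5}
```

This only keeps the counts; if you also need the matched strings, keep the lists as in the loop above.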