I wrote two functions that count punctuation marks in a text file: comma, apostrophe, hyphen and semicolon. Counting commas and semicolons is straightforward, but counting apostrophes and hyphens is a little more complicated, because my assignment imposes certain rules, e.g. I can only count an apostrophe if it sits between two letters, as in shouldn't or won't. So I split the task into two functions: countpunc1() and countpunc2().
Each of them returns a dictionary holding the counts of its punctuation marks.
Then, in the function main(), I want to return a dictionary that has the results from countpunc1 and countpunc2 combined under one key: punctuations.
An example is shown at the bottom.
Here is my code:
def countpunc1(text):
    for ch in '0123456789abcdefghijklmnopqrstuvwxyz!"#$%&()*+./:<=>?@[\\]^_`{|}~-':
        text = text.replace(ch, '')
    words = text.replace('--', '').replace("'", '').split()
    wordlist = list(words)
    dictToReturn = {}
    punctuations = [',', ';']
    punclist = list(i for i in wordlist if i in punctuations)
    for x in range(len(punctuations)):
        dictToReturn[punctuations[x]] = dictToReturn.get(x,0)
    for p in punclist:
        dictToReturn[p] = dictToReturn.get(p,0) + 1
    return dictToReturn
def countpunc2(text):
    for ch in '!"#$%&()*+./:<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.replace('--', ' ').split('\n')
    wordlist = str(words)
    punctuations = "'-"
    dictToReturn = {}
    letters = "abcdefghijklmnopqrstuvwxyz"
    for i, char in enumerate(wordlist):
        if i < 1:
            continue
        if i > len(wordlist) - 2:
            continue
        if char in punctuations:
            if char not in dictToReturn:
                dictToReturn[char] = 0
            if wordlist[i-1] in letters and wordlist[i+1] in letters:
                dictToReturn[char] += 1
    return dictToReturn
def main(text):
    text = open(text, 'r').read().lower()
    profileDict = {}
    # profileDict[punctuations] = ??
    return profileDict
In the commented-out second-to-last line above, I tried things like:
profileDict[punctuations] = countpunc1(text) + countpunc2(text)
and
profileDict[punctuations] = countpunc1(text).items() + countpunc2(text).items()
Clearly both of these are wrong, and I get a TypeError: unsupported operand type(s).
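The error can be reproduced in isolation. The following is a minimal sketch, assuming Python 3, where dictionaries define no + operator and .items() returns a view object rather than a list. The two small dictionaries and the helper try_add are hypothetical stand-ins, not part of the code above.

```python
# Hypothetical stand-ins for the dictionaries returned by
# countpunc1() and countpunc2(); the numbers are made up.
first = {',': 9, ';': 4}
second = {"'": 0, '-': 11}


def try_add(a, b):
    """Return the TypeError message raised by a + b, or None if it succeeds."""
    try:
        a + b
    except TypeError as exc:
        return str(exc)
    return None


# Both attempts from the question fail for the same reason:
# neither dict nor the items view supports the + operator.
print(try_add(first, second))
print(try_add(first.items(), second.items()))
```

Both calls report an unsupported operand type for +, which matches the message quoted above.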
The expected result is something like this:
dict[punctuations] = {",": 9, "'": 0, ";": 4, "-": 11}
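For illustration, a result of that shape could be built along the following lines. This is only a sketch under stated assumptions: merge_counts is a hypothetical helper, the two input dictionaries are made-up stand-ins for the return values of countpunc1 and countpunc2, and the two functions are assumed never to report the same punctuation mark.

```python
def merge_counts(*count_dicts):
    """Combine several {punctuation: count} dictionaries into one new dict."""
    merged = {}
    for counts in count_dicts:
        # Assumes the key sets do not overlap; on a clash the later
        # dictionary would overwrite the earlier count.
        merged.update(counts)
    return merged


# Hypothetical stand-ins for countpunc1(text) and countpunc2(text).
first = {',': 9, ';': 4}
second = {"'": 0, '-': 11}

# The key is the string 'punctuations', not a bare variable name.
profile = {'punctuations': merge_counts(first, second)}
print(profile)
```

Note that the key is written as the string 'punctuations'; the bare name punctuations used in the attempts above is not defined inside main().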
PS. The functions themselves work fine; I tested them on multiple text files.