Issues
Some statements seem to have issues:
if c == punctations: # 1
    c.append(s) # 2
- A single character is never equal to a string of many characters like your punctations (e.g. '.' == '.?' is never true). So we have to use a different boolean operator: in, because a character can be an element of a collection of characters such as a string, list or set.
- You spotted it already: since c is a character and s is a str, not lists, we cannot use the method append. So we have to use s = s + c, or the shortcut s += c (your solution was almost right).
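Both points can be checked in a few lines. This is only a minimal sketch: the name punctuation_chars and the sample sentence are assumptions chosen for illustration, not taken from your code.

```python
punctuation_chars = '.,;:!?'  # assumed collection of punctuation characters

c = '.'
# Equality compares the whole string, so a single character never matches.
print(c == punctuation_chars)  # False
# Membership asks whether the character occurs in the collection.
print(c in punctuation_chars)  # True

# A str has no append(); build the result by concatenation instead.
s = ''
for c in 'Hello, how are you today?':
    if c in punctuation_chars:
        s += c
print(s)  # ,?
```

Note that in on a string is a substring test, which behaves like an element test as long as the left side is a single character.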
Extract a testable & reusable function
Why not extract and test the part that fails:
def extract_punctuation(line):
    # name typo fixed; a set, because the characters are unique
    punctuation_chars = set('.,;:!?')
    symbols = []
    for char in line:
        if char in punctuation_chars:
            symbols.append(char)
    return symbols

# test
symbol_list = extract_punctuation('Hello, how are you today?')
print(symbol_list)  # [',', '?']
print(''.join(symbol_list))  # ,?
Solution: use a function on file-read
Then you could reuse that function on any text, or a file like:
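def punctuation(filename):
    symbols = []
    with open(filename, mode='r') as f:
        # += extends the list in place; a bare + would discard its result
        symbols += extract_punctuation(f.read())
    # join is a method of the separator string, not of the list
    return ''.join(symbols)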
Explained:
- The default result is defined first as an empty list [] (so the result stays empty if the file is empty).
- The list of extracted symbols is added to symbols using += for each file-read inside the with block (here the whole file is read at once).
- Returns either an empty string (''.join([]) gives '') or the joined symbols, e.g. ,?
See:
How do I concatenate two lists in Python?
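The difference between + and += on lists, and the way join is called, can be tried out in isolation. A small sketch, where the list extracted is an assumed sample value:

```python
symbols = []
extracted = [',', '?']  # assumed sample result of an extraction

# + builds a new list; without an assignment the result is discarded.
symbols + extracted
print(symbols)  # []

# += (like symbols.extend(extracted)) extends the list in place.
symbols += extracted
print(symbols)  # [',', '?']

# join is called on the separator string and takes the list as argument.
joined = ''.join(symbols)
print(joined)  # ,?
print(repr(''.join([])))  # ''
```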
Extend: return a list to play with
For a file with multiple sentences like dialogue.txt:
Hi, how are you?
Well, I am fine!
What about you .. ready to start, huh?
You could get a list (ordered by appearance) like:
[',', '?', ',', '!', '.', '.', ',', '?']
which will result in a string with ordered duplicates:
,?,!..,?
To extend, a list might be a better return type:
- Filter unique as set: set(list_punctuation(filename))
- Count frequency using pandas: pd.Series(list_punctuation(filename)).value_counts()
def list_punctuation(filename):
    with open(filename, mode='r') as f:
        return extract_punctuation(f.read())

lp = list_punctuation('dialogue.txt')
print(lp)
print(''.join(lp))
unique = set(lp)
print(unique)
# pass the list to pandas to easily do statistics
import pandas as pd
frequency = pd.Series(lp).value_counts()
print(frequency)
This prints the list and the string shown above, plus the following set (the order of a set's elements may vary):
{',', '?', '!', '.'}
as well as the ranked frequency for each punctuation symbol:
, 3
? 2
. 2
! 1
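If pandas is not available, a similar ranking might be obtained with collections.Counter from the standard library. This is an alternative sketch, not part of the solution above; the list lp is hard-coded here to the symbols extracted from dialogue.txt so that the snippet runs on its own.

```python
from collections import Counter

# Symbols in order of appearance, as extracted from dialogue.txt above.
lp = [',', '?', ',', '!', '.', '.', ',', '?']

frequency = Counter(lp)
# most_common() sorts by count, highest first;
# symbols with equal counts keep the order in which they first appeared.
for symbol, count in frequency.most_common():
    print(symbol, count)
```

For this list the loop prints the same ranking as above: , 3 then ? 2 then . 2 then ! 1.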
Today I learned, by playing with punctuation & Python's data structures.