Read a text file and return punctuation as a string

Question

I want to create a program in Python which reads the text from a text file and returns a string where everything except punctuation (period, comma, colon, semicolon, exclamation point, question mark) has been removed. This is my code:

def punctuation(filename):
    
    with open(filename, mode='r') as f:
        s = ''
        punctations = '''.,;:!?'''
        for line in f:
            for c in line: 
                 if c == punctations:
                      c.append(s)
    return s

But it only returns ''. I have also tried s = + c instead of s.append(c), since append might not work on strings, but the problem remains. Can anyone help me find out why?

How it should work:

If we have a text file named hello.txt with the text "Hello, how are you today?" then punctuation('hello.txt') should give us the output string ',?'

`if c == punctations:` compares one character to the whole `punctations` variable. You should replace it with `if c in punctations:`. Also, I don't get the `c.append(s)` line: it raises an error (`c` is a string), and what you want is to add a character to `s`. — Metapod, Sep 27 '21 at 08:33
I guess `c == punctations` will never be `True`, since `c` is a single character and `punctations` is a longer string. Would you try `if c in punctations:` instead? — druskacik, Sep 27 '21 at 08:34

score 3 · Answer 1 · answered Sep 27 '21 at 08:38

You were comparing each character to the whole string, when you should have been checking whether it is contained in punctations. Also, append is not the appropriate method here, because you are building and returning a string, not a list; concatenate the characters onto s instead.

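def punctuation(filename):

    with open(filename, mode='r') as f:
        s = ''
        punctations = set('.,;:!?')  # the punctuation marks to keep
        text = f.read()  # whole file as one string
        for c in text:  # iterating over a string yields single characters
            if c in punctations:
                s += c
    return s

Another approach is to test each character with the isalnum() method. It returns True only for letters and digits, so its negation also catches symbols you might have left out of the list. Note that this keeps every non-alphanumeric character (quotes, brackets and so on), not just the six marks from the question, and that spaces and newlines have to be excluded explicitly:

if c != " " and c != "\n" and not c.isalnum():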
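To make the difference between the two checks concrete, here is a minimal sketch that runs both on the same text. The helper names `by_membership` and `by_isalnum` and the sample sentence are made up for illustration, and `isspace()` is used here as a broader stand-in for the explicit space and newline comparison.

```python
TARGET = set('.,;:!?')  # the six marks named in the question


def by_membership(text):
    """Keep only characters that are in the explicit TARGET set."""
    return ''.join(c for c in text if c in TARGET)


def by_isalnum(text):
    """Keep every character that is neither alphanumeric nor whitespace."""
    return ''.join(c for c in text if not c.isalnum() and not c.isspace())


sample = 'Hello, "world" (how are you?)'
print(by_membership(sample))  # ,?
print(by_isalnum(sample))     # ,""(?)
```

The membership test returns exactly what the question asks for, while the isalnum() variant also returns the quotes and parentheses. Which one fits depends on whether "punctuation" means the six listed marks or any symbol.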

To express distinct symbols the `set` is a great idea. The explanation of issues is helpful, it solves at conceptual level, e.g. "belonged _in_ punctuations" — hc_dev, Sep 27 '21 at 09:01
Thanks. Although I think the `isalnum()` method could be more suitable for this problem. — vnk, Sep 27 '21 at 09:03

score 1 · Accepted Answer · answered Sep 27 '21 at 08:38

The problem is that c == punctations will never be True, since c is a single character and punctations is a longer string. Another problem is that strings have no append method; use + (or +=) to concatenate strings instead.

def punctuation(filename):

    with open(filename, mode='r') as f:
        s = ''
        punctations = '''.,;:!?'''
        for line in f:
            for c in line:
                if c in punctations:
                    s += c
    return s
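The same loop can be written more compactly with a generator expression passed to `str.join`. The following is only a sketch of that variant, not part of the accepted answer; the function name `punctuation_compact`, the `marks` parameter and the temporary file are assumptions made so the example runs on its own.

```python
import os
import tempfile


def punctuation_compact(filename, marks='.,;:!?'):
    """Return the punctuation marks found in the file, in order of appearance."""
    with open(filename, mode='r') as f:
        # Nested iteration: every line of the file, then every character of the line.
        return ''.join(c for line in f for c in line if c in marks)


# Try it on a temporary file holding the question's example sentence.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Hello, how are you today?')
    path = tmp.name

result = punctuation_compact(path)
os.remove(path)  # clean up the temporary file
print(result)  # ,?
```

Joining once at the end avoids creating a new string on every `+=`, which matters little for small files but is the more common idiom.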

hc_dev · Answer 3 · 2021-09-27T20:53:50.217

Issues

Some statements seem to have issues:

if c == punctations: # 1
    c.append(s)  # 2

1. A single character is never equal to a string of several characters like your punctations (e.g. '.' == '.?' is never true). So we need a different boolean operator: the membership operator in, because a character can be an element of a collection of characters, whether a string, list or set.
2. As you already spotted: since c and s are both of type str, not lists, we cannot use the method append. So we have to use s = s + c or the shortcut s += c (your attempt was almost right).
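Both points can be checked quickly in an interpreter. A small sketch, reusing the variable names from the question:

```python
punctations = '.,;:!?'  # spelling kept from the question's code

# Issue 1: equality compares against the whole string, membership checks containment.
print('?' == punctations)  # False
print('?' in punctations)  # True

# Issue 2: str has no append method, so characters are concatenated instead.
s = ''
print(hasattr(s, 'append'))  # False
s += '?'
s += '!'
print(s)  # ?!
```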

Extract a testable & reusable function

Why not extract and test the part that fails:

def extract_punctuation(line):
    punctuation_chars = set('.,;:!?')  # fixes the typo in the name; a set, since the characters are unique
    symbols = []
    for char in line: 
        if char in punctuation_chars:
            symbols.append(char)
    return symbols


# test
symbol_list = extract_punctuation('Hello, how are you today?')
print(symbol_list)  # [',', '?']
print(''.join(symbol_list))  # ',?'

Solution: use a function on file-read

Then you could reuse that function on any text, or a file like:

def punctuation(filename):
    symbols = []
    with open(filename, mode='r') as f:
        symbols += extract_punctuation(f.read())  # extend the list in place
    return ''.join(symbols)  # join is a str method, called on the separator

Explained:

The default result is defined first as an empty list [] (which yields '' if the file is empty).
The list of extracted symbols is added to symbols using += for each file-read inside the with block (here the whole file is read at once).
Returns either ''.join([]), giving '', or the joined symbols, e.g. ',?'.

See: How do I concatenate two lists in Python?
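A detail that is easy to miss when concatenating lists: a bare `+` builds a new list and leaves the original untouched unless the result is assigned, whereas `+=` extends the list in place. Likewise, `join` belongs to the separator string, not to the list. A minimal sketch with made-up sample values:

```python
symbols = []
found = [',', '?']  # sample values standing in for the extracted symbols

symbols + found          # builds a new list, which is discarded immediately
print(symbols)           # []

symbols += found         # extends symbols in place
print(symbols)           # [',', '?']

print(''.join(symbols))  # ,?
print(repr(''.join([])))  # ''
```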

Extend: return a list to play with

For a file with multiple sentences like dialogue.txt:

Hi, how are you?
Well, I am fine!
What about you .. ready to start, huh?

You could get a list (ordered by appearance) like: [',', '?', ',', '!', '.', '.', ',', '?'] which will result in a string with ordered duplicates: ,?,!..,?

To extend, a list might be a better return type:

Filter unique as set: set( list_punctuation(filename) )
Count frequency using pandas: pd.Series(list_punctuation(filename)).value_counts()

def list_punctuation(filename):
    with open(filename, mode='r') as f:
        return extract_punctuation(f.read())


lp = list_punctuation('dialogue.txt')

print(lp)
print(''.join(lp))

unique = set(lp)
print(unique)

# pass the list to pandas to easily do statistics
import pandas as pd
frequency = pd.Series(lp).value_counts()
print(frequency)

This prints the list and the string shown above, plus the following set:

{',', '?', '!', '.'}

as well as the ranked frequency for each punctuation symbol:
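For a dependency-free way to see those counts, the standard library's `collections.Counter` can stand in for the pandas call. This is a swapped-in technique, not what the answer itself uses, and the list literal below is assumed to be the result of `list_punctuation('dialogue.txt')` quoted earlier.

```python
from collections import Counter

# Assumed result of list_punctuation('dialogue.txt'), as listed above.
lp = [',', '?', ',', '!', '.', '.', ',', '?']

frequency = Counter(lp)

# most_common() sorts by count, highest first.
for symbol, count in frequency.most_common():
    print(symbol, count)
```

The comma is reported first with a count of 3, followed by the question mark and the period with 2 each and the exclamation point with 1.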

Today I learned - by playing with punctuation & Python's data structures
