Write the function list_of_words that takes a list of strings as above and returns a list of individual words with all white space and punctuation removed (except for apostrophes/single quotes).

My code removes periods and spaces, but not commas or exclamation points.

def list_of_words(list_str):
    m = []
    for i in list_str:
        i.strip('.')
        i.strip(',')
        i.strip('!')
        m = m+i.split()
    return m

print(list_of_words(["Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!"]))
Ozgur Vatansever
anonymous fox

5 Answers

2

One of the easiest ways to remove punctuation marks and collapse repeated whitespace is to use the re.sub function.

import re

sentence_list = ["Four score and seven years ago, our fathers brought forth on",
                 "this continent a new nation, conceived in liberty and dedicated",
                 "to the proposition that all men are created equal.  Now we are",
                 "   engaged in a great        civil war, testing whether that nation, or any",
                 "nation so conceived and so dedicated, can long endure!"]

# Remove commas, periods and exclamation marks, then trim each sentence.
sentences = [re.sub(r'[,.!]+', '', sentence).strip() for sentence in sentence_list]
# Collapse runs of spaces and join the sentences into a single string.
words = ' '.join(re.sub(r' {2,}', ' ', sentence) for sentence in sentences)

print(words)
Four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure
Ozgur Vatansever
1

strip returns a new string rather than modifying the original, so you need to assign the result back to i before applying the next strip. Your code should be changed to:

for i in list_str:
    i = i.strip('.')
    i = i.strip(',')
    i = i.strip('!')
    ....

On a second note, strip removes the given characters only from the start and end of a string. If you want to remove characters from the middle of the string as well, you should consider replace.
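
To illustrate the difference, here is a minimal sketch. The sample string is made up for the example and is not from the question.

```python
# Hypothetical sample string, chosen only to show the difference.
s = "!nation, conceived in liberty!"

# strip only trims the listed characters from both ends of the string.
stripped = s.strip("!,")

# replace removes every occurrence, wherever it appears.
replaced = s.replace("!", "").replace(",", "")

print(stripped)  # nation, conceived in liberty
print(replaced)  # nation conceived in liberty
```

Note that the comma in the middle survives strip but not replace.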

venpa
1

You could use regular expressions, as explained in this question. Essentially,

import re

i = re.sub('[.,!]', '', i)
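
Put into the function from the question, that substitution might look like the sketch below. It assumes that periods, commas and exclamation marks are the only punctuation to remove, and the short input list is made up for the example.

```python
import re

def list_of_words(list_str):
    words = []
    for line in list_str:
        # Remove periods, commas and exclamation marks anywhere in the line.
        cleaned = re.sub(r'[.,!]', '', line)
        # split() with no arguments also discards runs of whitespace.
        words.extend(cleaned.split())
    return words

print(list_of_words(["civil war, testing", "  can long   endure!"]))
# ['civil', 'war', 'testing', 'can', 'long', 'endure']
```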
Community
nblivingston
0

As suggested before, you need to assign the result of i.strip() back to i. And as also mentioned, the replace method is a better fit, because it removes the characters everywhere in the string. Here is an example using the replace method:

def list_of_words(list_str:list)->list:
    m=[]
    for i in list_str:
        i = i.replace('.','')
        i = i.replace(',','')
        i = i.replace('!','')
        m.extend(i.split())
    return m

print(list_of_words([ "Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!" ]))

As you may notice, I have also replaced m=m+i.split() with m.extend(i.split()) to make it easier to read.

jkd
0

It would be better not to rely on your own list of punctuation, but to use Python's built-in one (string.punctuation) and, as others have pointed out, use a regex to remove the characters:

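import re
import string
punctuations = re.sub("[`']", "", string.punctuation)  # keep backticks and apostrophes
i = re.sub("[" + re.escape(punctuations) + "]", "", i)  # escape so ], \ and ^ are literal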

There's also string.whitespace, although split does take care of them for you.
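
If you would rather avoid regex altogether, str.translate is an alternative. This is a minimal sketch, assuming that only the apostrophe should be kept (so backticks are removed too, unlike above). The input list is made up for the example.

```python
import string

# Every punctuation character except the apostrophe
# (an assumption based on the wording of the question).
to_remove = string.punctuation.replace("'", "")

# A translation table that deletes each character in to_remove.
table = str.maketrans("", "", to_remove)

def list_of_words(list_str):
    words = []
    for line in list_str:
        # translate deletes the punctuation; split handles all whitespace.
        words.extend(line.translate(table).split())
    return words

print(list_of_words(["Don't stop, now!", "  it's   fine. "]))
# ["Don't", 'stop', 'now', "it's", 'fine']
```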

Eran