
One of the projects I've been working on is a word counter, and to build it I need to remove all punctuation from a string.

I have tried using the split method to split at punctuation, but this makes the resulting list very weird (instead of one word per element, an element can contain five words). I then tried keeping the punctuation in a list or a string and using a for loop to eliminate every punctuation character, but neither attempt was successful.

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = list["'", '"', ',', '.', '?', '!', ':', ';', '()']
for i in content_string.lower():
    if i in punctuation:
        i = i.replace[i," "]
    else:
        i = i

It says that

"TypeError: 'type' object is not subscriptable"

This message appears whether I use a string or a list.

Arkistarvh Kltzuonstev
  • Try changing the square brackets in `replace` to parentheses. – MattR Sep 13 '19 at 15:01
  • Your error comes from a syntax error in line 2. Check my answer for a solution and an efficient way to remove the punctuation from your string. –  Sep 13 '19 at 15:20

4 Answers


You have mixed up parentheses and square brackets. list and replace are both callables, so their arguments are passed in parentheses, not square brackets.
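To illustrate the difference, here is a minimal sketch (the names text, error_name and fixed are made up for this example). It uses the replace method rather than list, because on Python 3.9 and later list[...] is accepted as a type-hint expression and no longer raises an error.

```python
text = "a,b"

try:
    # Square brackets try to index the method object instead of calling it.
    text.replace[",", " "]
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# Parentheses actually call the method and return a new string.
fixed = text.replace(",", " ")
print(fixed)
```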

Also, try describing your algorithm in words first, for example:

For each forbidden character, I want to remove it from my content (replace it with a space).

Here is an implementation you can start with:

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')']
for i in punctuation:
    content_string = content_string.replace(i, " ")
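Since the stated goal is a word counter, the cleaned string could then be split and counted. This is only a sketch under my own assumptions: the thread never says how words should be counted, so lowercasing, splitting on whitespace and using collections.Counter are my choices, as are the names words, word_counts and total_words.

```python
from collections import Counter

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')']

# Replace every punctuation character with a space, as in the loop above.
for mark in punctuation:
    content_string = content_string.replace(mark, " ")

# split() with no arguments collapses runs of whitespace,
# so the doubled spaces left by the replacement do no harm.
words = content_string.lower().split()
word_counts = Counter(words)
total_words = len(words)

print(total_words)
print(word_counts.most_common(3))
```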
johan d
  • This is the most straightforward solution I would recommend for beginners. – Romeo Valentin Sep 13 '19 at 15:05
  • But it's quite inefficient if your string is very large. –  Sep 13 '19 at 15:07
  • @SembeiNorimaki Inefficient? Compared to what? Running this simple, clear solution on a million times the content, I measured 595 ms. The solution you suggest (the `''.join` solution) took 17886 ms on the same content, about 30 times slower... – johan d Sep 17 '19 at 00:49
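Timings like these depend heavily on the machine, the Python version and the input, so it is worth measuring yourself. Here is a rough sketch using timeit; the helper names with_replace and with_join, the input size and the repeat count are my own choices and do not come from the thread.

```python
import timeit

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')']


def with_replace(text):
    # One str.replace pass per punctuation character.
    for mark in punctuation:
        text = text.replace(mark, " ")
    return text


def with_join(text):
    # Character-by-character membership test, then join.
    return "".join(ch if ch not in punctuation else " " for ch in text)


big = content_string * 1000  # a larger input, so the difference is measurable

t_replace = timeit.timeit(lambda: with_replace(big), number=10)
t_join = timeit.timeit(lambda: with_join(big), number=10)

print(f"replace loop: {t_replace:.4f} s")
print(f"join:         {t_join:.4f} s")
```

Both helpers produce the same string, so only the running time differs.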

To create a list, you use l = [...], not l = list[...], and functions/methods (such as str.replace) are called with parentheses, not square brackets. That said, you can use re.sub to do this in a simpler way:

import re

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"

punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')'] # '(', ')' not `()`

new_string = re.sub('|'.join(map(re.escape, punctuation)), '', content_string)

print(new_string)

Output:

This is a test to see whether or not the code can eliminate punctuation
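Because every entry in punctuation is a single character, the same substitution can also be written with a regex character class instead of an alternation. This is a sketch of that variant, not part of the original answer; the name pattern is mine.

```python
import re

content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')']

# Build a character class such as ['",\.\?!:;\(\)] that matches any one of the marks.
pattern = re.compile("[" + re.escape("".join(punctuation)) + "]")

# Replacing with '' deletes the characters rather than turning them into spaces.
new_string = pattern.sub("", content_string)
print(new_string)
```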
DjaouadNM

Your error

"TypeError: 'type' object is not subscriptable"

comes from the line

punctuation = list["'", '"', ',', '.', '?', '!', ':', ';', '()']

To define a list, you either use brackets [ ] without the name list, or, if you do use list, you have to call it with parentheses (although in this case converting a list into a list is redundant).

# both options work, but the second one is redundant
punctuation = ["'", '"', ',', '.', '?', '!', ':', ';', '(', ')']
punctuation = list(["'", '"', ',', '.', '?', '!', ':', ';', '(', ')'])

Notice that the last element, (), must be split into two elements, ( and ).

Now, to achieve what you want in an efficient way, use a list comprehension with a conditional expression:

''.join([i if i not in punctuation else ' ' for i in content_string])

result:

'This  is a test  to see  whether  or not  the code can eliminate punctuation'

Notice that, according to your code, you are not removing the punctuation symbols but replacing them with spaces.

  • Your list comprehension is pretty slow. If you really want efficiency, use `str.translate`. For deleting a set of characters, this method is by far the fastest in my experience. – iz_ Sep 17 '19 at 01:17
  • I just timed it: straight `replace` is 2.64 µs, `str.translate` is 1.73 µs, and `''.join` with a list comprehension is 29.7 µs. Making `punctuation` a set improves your method to 8.14 µs. In practical application, there is no difference. – iz_ Sep 17 '19 at 01:43
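For reference, the `str.translate` approach mentioned in these comments could look roughly like this. It is a sketch rather than the commenter's actual code; the names table, delete_table, replaced and removed are mine.

```python
content_string = "This, is a test! to see: whether? or not. the code can eliminate punctuation"
punctuation = "'\",.?!:;()"

# Two-argument form: map each punctuation character to a space.
table = str.maketrans(punctuation, " " * len(punctuation))
replaced = content_string.translate(table)

# Three-argument form: the third string lists characters to delete outright.
delete_table = str.maketrans("", "", punctuation)
removed = content_string.translate(delete_table)

print(replaced)
print(removed)
```

The first table matches what the question's loop attempts (punctuation becomes a space); the second deletes the characters entirely.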

There are multiple bugs in the code.

First one:

The list call is redundant here. If you wanted to use it, you would need to add parentheses () so that list is actually called on the already defined list.

BAD punctuation = list["'", '"', ',', '.', '?', '!', ':', ';', '()']

BETTER punctuation = list(["'", '"', ',', '.', '?', '!', ':', ';', '()'])

But simply defining the list with regular [] syntax would be enough, and also more efficient than a list() call.

Second one:

You will not be able to replace parentheses with the if i in punctuation: check. This is because '()' is a two-character string, while you are iterating over the single characters of your string. So you will always compare '(' or ')' with '()', which never matches.

A possible fix - add parentheses separately to the punctuation list as single characters.
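A quick sketch of the membership checks involved (the shortened lists and the variable names are only for illustration):

```python
punctuation = [",", ".", "()"]

# List membership compares whole elements, so a single "(" is not found.
found_before = "(" in punctuation

# With the parentheses stored as separate one-character strings it is found.
punctuation_fixed = [",", ".", "(", ")"]
found_after = "(" in punctuation_fixed

# If the punctuation is kept in a plain string instead, `in` does a
# substring search, so "(" is found inside ",.()" as well.
punctuation_string = ",.()"
found_in_string = "(" in punctuation_string

print(found_before, found_after, found_in_string)
```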

Third bug, or rather a redundant else branch:

    else:
        i = i

This serves no purpose whatsoever; you should drop the else branch.

Fourth, the most apparent bug:

In your for loop you are assigning to the variable i, which only refers to a single character of the string you are iterating over, so the string itself never changes. Strings are immutable, so to change characters in place you first have to turn the string into a list, modify that list by index with the help of enumerate, and then join the result back into a string.

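chars = list(content_string.lower())  # strings are immutable, so work on a list of characters
for i, char in enumerate(chars):
    if char in punctuation:
        chars[i] = ' '  # content_string[i] = ' ' would raise a TypeError
content_string = ''.join(chars)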
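To see why assigning to the loop variable has no effect on the string, here is a small sketch; the names text, unchanged and rebuilt are illustrative only.

```python
text = "a,b"

for ch in text:
    if ch == ",":
        ch = " "  # rebinds the local name ch; text is not touched

unchanged = text  # still the original string

# Building a new string is the way to get the modified result.
rebuilt = "".join(" " if ch == "," else ch for ch in text)

print(repr(unchanged), repr(rebuilt))
```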

Anyway, the goal you are trying to achieve can come down to a one-liner, using a list comprehension and a string join on the resulting list afterwards:

content_string = ''.join([char if char not in punctuation else ' ' for char in content_string.lower()])

Epion
  • Strings are immutable, so `content_string[i] = ' '` won't work. – tobias_k Sep 13 '19 at 15:41
  • Right, my bad. This could be possible if you called list() on your string, and then joined the elements back into a string. – Epion Sep 13 '19 at 15:46