
I want to remove punctuation from strings in a series.

I am using Python 3.6 with the str.maketrans() and translate() functions to do this. However, the results are not what I expect.

Here are two sentences before running the code:

Baking cake of straw-bana-choco will take longer than expcted


Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-

Here is my code:

import string

remove_punc = str.maketrans(' ', ' ', string.punctuation)
df.Summary = df.Summary.str.translate(remove_punc)
df.Description = df.Description.str.translate(remove_punc)

Sentences after the code:

baking cake of strawbanachoco will take longer than expcted


please include as much of the following data that is available   cake type flavors decoration type icing

So I am wondering why the result is strawbanachoco rather than straw bana choco. It seems the code is not replacing the - with a space, whereas in the second sentence it seems to be replacing the punctuation with spaces.

I did not include in the code snippet above, but I also lowercased all of my sentences.

Any suggestions on why this might be happening?

Thanks

cool_beans

2 Answers

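It's not replacing punctuation with spaces in the second sentence either. The third argument to str.maketrans() lists characters to delete, so every punctuation character is removed in both sentences. Your original string has spaces between the punctuation characters, and those are simply being preserved.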

See https://docs.python.org/3/library/stdtypes.html#str.maketrans for details on how this works.
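To make the behaviour concrete, here is a minimal sketch of what the three-argument form from the question does. The name delete_punc and the shortened sample strings are only for illustration and are not from the original post.

```python
import string

# Three-argument form: the first two arguments map characters one-to-one
# (here a space to a space, which changes nothing); every character in the
# third argument is mapped to None, meaning it is deleted.
delete_punc = str.maketrans(' ', ' ', string.punctuation)

# The hyphen maps to None, so it is dropped rather than replaced.
print(delete_punc[ord('-')])

# No spaces in the input, so none appear in the output.
print('straw-bana-choco'.translate(delete_punc))

# Spaces that were already between the punctuation characters survive.
print(repr('data.< >< >- Cake'.translate(delete_punc)))
```

The last line keeps three spaces only because three spaces were already in the input, which is why the second sentence looks as if punctuation had been replaced with spaces.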

dkamins

If you want to replace each punctuation character with space:

import string

s = """
Baking cake of straw-bana-choco will take longer than expcted
Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-
"""

# Map every punctuation character to a space instead of deleting it
remove_punc = str.maketrans(dict.fromkeys(string.punctuation, ' '))
print(s.translate(remove_punc))

Out:

Baking cake of straw bana choco will take longer than expcted
Please include as much of the following data that is available         Cake Type      Flavors      Decoration Type      Icing     
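The output above keeps one space per punctuation character, so runs of punctuation leave runs of spaces. If single spaces are wanted, one possible follow-up is to collapse the whitespace afterwards. The sketch below is only an illustration: the helper name clean is hypothetical, and the lowercasing is included because the question mentions it.

```python
import string

# Same table as above: every punctuation character becomes a space.
punc_to_space = str.maketrans(dict.fromkeys(string.punctuation, ' '))

def clean(text):
    """Replace punctuation with spaces, collapse whitespace runs, lowercase."""
    spaced = text.translate(punc_to_space)
    # split() with no arguments splits on any run of whitespace,
    # so joining with a single space collapses the runs.
    return ' '.join(spaced.split()).lower()

print(clean('Baking cake of straw-bana-choco will take longer than expcted'))
print(clean('available.< >< >- Cake Type:< >- Flavors:'))
```

On a pandas Series, a function like this could presumably be applied with something like df.Summary.apply(clean), assuming the column holds only strings.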

There is also a very good overview of other methods in the question "Fast punctuation removal with pandas".

perl