
The `string` module, combined with `str.maketrans`, is useful for stripping punctuation from an individual string, as demonstrated below:

import string
stripPunct = str.maketrans('', '', string.punctuation)

word = 'foo.bar.baz'

word.translate(stripPunct)

Output: 'foobarbaz'

But how can the same operation be applied to every string in a NumPy array of strings?

import numpy as np

myArr = np.array(['foo.bar.baz', 'foo.bar.baz', 'foo.bar.baz'], dtype='<U15')

myArr.translate(stripPunct)
AttributeError: 'numpy.ndarray' object has no attribute 'translate'
iskandarblue
  • Apologies, I edited the question – iskandarblue Sep 03 '20 at 21:14
  • Is `np.array` -> `list` -> `map` or list comprehension -> `np.array` an option for you? – Jan Stránský Sep 03 '20 at 21:16
  • yes those are good suggestions – iskandarblue Sep 03 '20 at 21:17
  • Also have a look at [this post](https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array) – Jan Stránský Sep 03 '20 at 21:19
  • Iterate? `res = [words.translate(stripPunct) for words in myArr] ` – wp78de Sep 03 '20 at 21:19
  • What's the real-world size of your array? For small arrays like your sample, list comprehension tends to be best, but `np.vectorize` gets better when the array is large (though a pure list solution is best both small and large). `numpy` does not have specialized string handling code. Even `np.char.translate` uses the Python `translate`. – hpaulj Sep 03 '20 at 22:27

2 Answers

import string
import numpy as np

stripPunct = str.maketrans('', '', string.punctuation)

myArr = np.array(['foo.bar.baz', 'foo.bar.baz', 'foo.bar.baz'])
# Iterating a 1-D string array yields str objects, so translate works on each element
new = np.array([i.translate(stripPunct) for i in myArr])

Output:

array(['foobarbaz', 'foobarbaz', 'foobarbaz'])
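
The comprehension above relies on the array being one-dimensional: iterating a 2-D array yields rows (themselves arrays), which have no `translate` method. A minimal sketch of one way around that, assuming you want to keep the original shape, is to flatten, translate, and reshape. The helper name `strip_punct_array` and the sample `grid` are hypothetical, introduced only for illustration:

```python
import string
import numpy as np

strip_punct = str.maketrans('', '', string.punctuation)

def strip_punct_array(arr):
    """Hypothetical helper: strip punctuation from every element, keeping the shape."""
    arr = np.asarray(arr)
    # ravel() flattens to 1-D; tolist() converts the elements to plain Python str
    cleaned = [s.translate(strip_punct) for s in arr.ravel().tolist()]
    # Rebuild a string array and restore the original shape
    return np.array(cleaned, dtype=str).reshape(arr.shape)

grid = np.array([['foo.bar', 'a,b'], ['c!d', 'plain']])
result = strip_punct_array(grid)
print(result)
```

For a 1-D input this should behave the same as the list comprehension; the flatten-and-reshape step only matters for higher-dimensional arrays.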
neuops

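You can use `np.vectorize` to wrap the per-string call in a function that accepts a whole array. Note that `np.vectorize` is a convenience wrapper around a Python-level loop, so it does not make the operation faster than an explicit loop.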


import string
import numpy as np

stripPunct = str.maketrans('', '', string.punctuation)
# np.vectorize applies the lambda to each element of the array
vecTrans = np.vectorize(lambda x: x.translate(stripPunct))
myArr = np.array(['foo.bar.baz', 'foo.bar.baz', 'foo.bar.baz'], dtype='<U15')

vecTrans(myArr)

Output: array(['foobarbaz', 'foobarbaz', 'foobarbaz'], dtype='<U9')
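
One of the comments on the question mentions `np.char.translate`, which applies Python's `str.translate` element-wise. A minimal sketch of that route, assuming a Unicode string array and a table built with `str.maketrans` (the snake_case variable names here are mine, not from the answers):

```python
import string
import numpy as np

# Translation table that deletes every ASCII punctuation character
strip_punct = str.maketrans('', '', string.punctuation)

my_arr = np.array(['foo.bar.baz', 'foo.bar.baz', 'foo.bar.baz'])

# np.char.translate calls str.translate on each element of the array
cleaned = np.char.translate(my_arr, strip_punct)
print(cleaned.tolist())
```

As the same comment points out, this still calls the Python `translate` under the hood, so it is mainly a tidier spelling and should not be expected to outperform the list comprehension.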

fusion