Shortest way to replace parts of strings in NumPy array

Question

I have a NumPy string array

['HD\,315', 'HD\,318' ...]

I need to replace every 'HD\,' with 'HD ', i.e. I want to get a new array like the one below

 ['HD 315', 'HD 318' ...]

What is the SHORTEST way to solve this task in Python? Is it possible to do this without a for loop?

Is `'HD\,315' 'HD\,318'` a single string, or should there be a `,` between them, as in `'HD\,315', 'HD\,318'`? — Grijesh Chauhan, Jan 27 '14 at 16:13
thanks @GrijeshChauhan ! This is what I need (misprint 'replce') — drastega, Jan 27 '14 at 16:22
@user2579566 ok I notice and based on that I posted my answer. — Grijesh Chauhan, Jan 27 '14 at 16:22

score 14 · Accepted Answer · edited Sep 13 '17 at 12:38

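Use a Python list comprehension:

# Raw strings (r'...') keep the backslash literal and avoid invalid-escape warnings
L = [r'HD\,315', r'HD\,318']
print([s.replace(r'HD\,', 'HD ') for s in L])

But it still uses for.

Alternatively, you can use map():

print(list(map(lambda s: s.replace(r'HD\,', 'HD '), L)))

In Python 3, map() returns an iterator, hence the list() wrapper. In Python 2, map() already returns a list, so the wrapper is optional there.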
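Since the question starts from a NumPy array rather than a list, here is a minimal sketch of the same comprehension applied to an array and converted back. The names `arr` and `cleaned` are illustrative and not from the original answer.

```python
import numpy as np

# Sample data mirroring the question; raw strings keep the backslash literal.
arr = np.array([r'HD\,315', r'HD\,318'])

# Iterating over a string array yields NumPy str scalars, which support
# the usual str methods; np.array() rebuilds a string array from the result.
cleaned = np.array([s.replace(r'HD\,', 'HD ') for s in arr])

print(cleaned)  # ['HD 315' 'HD 318']
```

This still loops in Python, so it is short rather than fast.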

edited Sep 13 '17 at 12:38

famargar


answered Jan 27 '14 at 16:19

Grijesh Chauhan


If I had a list of values to replace e.g. ["HD\", "HA\", "AB\], what's the most efficient way to iterate using the list comprehension you stated? – Howeitzer Sep 04 '18 at 08:59

user2750362 · Answer 2 · 2015-05-27T13:11:19.800

8

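You can use the numpy.core.defchararray.replace function, which runs the loop inside NumPy instead of in your own code:

import numpy.core.defchararray as np_f

# Raw strings keep the backslash literal
data = [r'HD\,315', r'HD\,318']

# Note the trailing space in 'HD ', matching the output the question asks for
new_data = np_f.replace(data, r'HD\,', 'HD ')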

edited May 27 '15 at 13:11

answered May 26 '15 at 15:04

user2750362


5

Now all you need is `np.char.replace`. – hpaulj May 26 '15 at 16:30
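A minimal sketch of the `np.char.replace` form suggested in the comment above, assuming the data is already a NumPy string array (the variable names are illustrative):

```python
import numpy as np

# Raw strings keep the backslash literal.
data = np.array([r'HD\,315', r'HD\,318'])

# Element-wise str.replace over the whole array; returns a new array
# and leaves `data` untouched.
new_data = np.char.replace(data, r'HD\,', 'HD ')

print(new_data)  # ['HD 315' 'HD 318']
```

The call goes through the public `np.char` namespace, so no import from `numpy.core` is needed.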

Eelco Hoogendoorn · Answer 3 · 2014-01-27T17:04:33.783

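Something along these lines will work if your strings are of fixed length and your array has a fixed-width string dtype (byte strings here) rather than dtype object. Note that it circumvents Python for loops, such as those in a list comprehension, and is correspondingly much faster:

import numpy as np
# Byte strings (dtype 'S7'): one byte per character, as with Python 2 str
data = np.array([rb'HD\,315', rb'HD\,318'])

# Reinterpret the same memory as a 2-D uint8 array: one row per string
view = data.view(np.uint8).reshape(data.shape + (data.dtype.itemsize,))
view[:, 2] = 32  # overwrite the backslash at index 2 with a space, in place
print(data)      # [b'HD ,315' b'HD ,318']

Of course, if the backslashes may appear in various positions, logical indexing would be required (i.e., view[view == 92] = 32, where 92 and 32 are the byte values of the backslash and the space). Either way, this overwrites single bytes and cannot shorten a string, so the comma stays where it is. Even if your strings are not all exactly equal length, but at least of a short and bounded length, placing them in a fixed-length array could speed things up a lot at the cost of some extra memory, if you have a lot of these strings. Note that numpy.char contains a lot of useful utility functions for vectorized string manipulations. Speaking of which, on the unmodified array:

np.char.replace(data, rb'\,', b' ')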
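The byte-view idea can be pushed one step further to drop the comma in place as well, by shifting the tail of each string left and padding with a NUL byte. This is only a sketch and not part of the original answer; it assumes byte strings of equal length with the backslash always at index 2.

```python
import numpy as np

# Fixed-width byte strings (dtype 'S7'): exactly one byte per character.
data = np.array([rb'HD\,315', rb'HD\,318'])

# 2-D uint8 view of the same memory: one row per string, one column per byte.
view = data.view(np.uint8).reshape(data.shape + (data.dtype.itemsize,))

view[:, 2] = 32              # backslash (92) -> space (32)
view[:, 3:-1] = view[:, 4:]  # shift the digits left, overwriting the comma
view[:, -1] = 0              # NUL-pad the freed last byte

print(data)  # [b'HD 315' b'HD 318']
```

NumPy ignores trailing NUL bytes in fixed-width strings, so the elements read back one character shorter even though the dtype is still `S7`.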
