
I have a numpy ndarray with 6 elements:

['\tblah blah' '"""123' 'blah' '"""' '\t456' '78\t9']

I am trying to replace all tab characters \t with 4 spaces each so that the numpy array would now be:

['    blah blah' '"""123' 'blah' '"""' '    456' '78    9']

I have considered re.sub but cannot figure out how to apply it to a NumPy ndarray. Any suggestions or help, please?

1 Answer


You could use NumPy's `core.defchararray` module (also exposed as `np.char`), which provides vectorized string operations, and for this case use its `replace` function, like so -

np.core.defchararray.replace(arr,'\t', '    ')

Sample run -

In [44]: arr
Out[44]: 
array(['\tblah blah', '"""123', 'blah', '"""', '\t456', '78\t9'], 
      dtype='|S10')

In [45]: np.core.defchararray.replace(arr,'\t', '    ')
Out[45]: 
array(['    blah blah', '"""123', 'blah', '"""', '    456', '78    9'], 
      dtype='|S13')
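
For readers on Python 3, here is a minimal sketch of the same replacement using the public `np.char` alias rather than the `np.core` path. It assumes `arr` holds ordinary (Unicode) strings, not the byte strings shown in the sample run above, and the variable name `out` is just illustrative.

```python
import numpy as np

# Same six elements as in the question, as Python 3 str values
arr = np.array(['\tblah blah', '"""123', 'blah', '"""', '\t456', '78\t9'])

# np.char.replace applies str.replace element-wise and returns a new array;
# the original array is left unchanged
out = np.char.replace(arr, '\t', ' ' * 4)

print(out)
```

The result is a new array whose item size is wide enough for the expanded strings, so it should be assigned back (`arr = out`) if the original name is meant to hold the replaced values.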
Divakar
  • Quick follow-up; is it possible to get the number of the replacements, i.e. in this case 3? –  Dec 04 '16 at 16:10
  • @nk-fford One solution to that would be : `np.core.defchararray.not_equal(output, arr).sum()`. – Divakar Dec 04 '16 at 17:30
  • Confused about what `output` and `arr` are in this case. Could you give a one-liner to explain how this count works, please? –  Dec 04 '16 at 17:34
  • @nk-fford `output` would be the output from `np.core.defchararray.replace(arr,'\t', '    ')`. Basically we are counting the occurrences where changes were made by the `replace` method. – Divakar Dec 04 '16 at 17:35
  • I see, I was doing everything in one line and got confused. Thanks –  Dec 04 '16 at 17:37
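
To make the counting idea from the comments concrete, here is a rough sketch. Note that comparing `output` with `arr` counts how many elements changed, which matches the number of replacements only when no element contains more than one tab. Summing `np.char.count` is one way to get the total number of tabs replaced instead. The second array, `arr2`, is a made-up example added only to show where the two counts differ.

```python
import numpy as np

arr = np.array(['\tblah blah', '"""123', 'blah', '"""', '\t456', '78\t9'])
output = np.char.replace(arr, '\t', ' ' * 4)

# Number of elements that differ after the replacement (the approach from the comments)
changed_elements = int(np.char.not_equal(output, arr).sum())

# Total number of tab characters, i.e. the number of replacements performed
total_replacements = int(np.char.count(arr, '\t').sum())

print(changed_elements, total_replacements)

# Hypothetical array where one element holds two tabs
arr2 = np.array(['a\tb\tc', 'no tabs'])
output2 = np.char.replace(arr2, '\t', ' ' * 4)
changed_elements2 = int(np.char.not_equal(output2, arr2).sum())
total_replacements2 = int(np.char.count(arr2, '\t').sum())

print(changed_elements2, total_replacements2)
```

For the array in the question both counts agree, since each affected element contains exactly one tab.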