How do I replace all values not ending in 00 with an empty string in a numpy string array?

Question

I have the following numpy array:

array(['00:00', '00:05', '00:15', '00:20', '00:25', '00:30', '00:35',
       '00:40', '00:45', '00:50', '00:55', '01:00', '01:05', '01:10',
       '01:15', '01:20', '01:40', '01:45', '01:55', '02:05', '02:10',
       '02:15', '02:35', '02:40', '02:45', '02:55', '03:05', '03:10',
       '03:30', '03:55', '04:00', '04:05', '04:25', '04:40', '04:55',
       '05:00', '05:05', '05:15', '05:20', '05:25', '05:30', '05:35',
       '05:50', '05:55', '06:05', '06:20', '06:25', '06:30', '06:35',
       '06:45', '06:50', '07:05', '07:15', '07:30', '07:40', '07:45',
       '07:50', '07:55', '08:10', '08:20', '08:25', '08:40', '08:45',
       '08:50', '09:15', '09:20', '09:45', '09:50', '09:55', '10:10',
       '10:15', '10:25', '10:30', '10:45', '10:50', '11:00', '11:05',
       '11:15', '11:25', '11:35', '11:45', '11:50', '11:55', '12:00',
       '12:10', '12:15', '12:25', '12:50', '12:55', '13:00', '13:40',
       '13:45', '13:50', '14:00', '14:10', '14:20', '14:35', '14:55',
       '15:05', '15:10', '15:15', '15:20', '15:25', '15:45', '15:55',
       '16:10', '16:15', '16:20', '16:25', '16:35', '16:45', '16:50',
       '16:55', '17:05', '17:30', '17:35', '17:45', '17:50', '18:00',
       '18:05', '18:10', '18:15', '18:20', '18:30', '18:35', '18:45',
       '19:00', '19:10', '19:20', '19:40', '19:50', '20:00', '20:15',
       '20:20', '20:35', '20:45', '20:55', '21:00', '21:05', '21:15',
       '21:20', '21:25', '21:30', '21:40', '21:45', '22:00', '22:10',
       '22:15', '22:25', '22:40', '22:45', '22:50', '22:55'], dtype='<U5')

I would like to automatically replace all values not ending in '00' with an empty string, so I would get:

array(['00:00', '', '', '', '', '', '',
       '', '', '', '', '01:00', '', '',
       '', '', '', '', '', '', '',
       ...
       '', '', '', '', '', ''], dtype='<U5')

Ideally using something which is part of the numpy library.

Note that the question asked for an array. But as stated in your answer it is clear that a list might be a better answer. — M.E., Jan 10 '22 at 10:31

j1-lee · Answer 1 · 2022-01-10T03:05:36.413

score 5

You can use a list comprehension with `endswith`:

# lst holds the time strings from the question (a plain list or the numpy array both work)
output = [s if s.endswith('00') else '' for s in lst]
print(output)
# ['00:00', '', '', '', '', '', '', '', '', '', '', '01:00', '', '',
#  '', '', '', '', '', '', '', '', '', '', '', '', '', '',
#  '', '', '04:00', '', '', '', '', '05:00', '', '', '', '', '', '',
#  '', '', '', '', '', '', '', '', '', '', '', '', '', '',
#  '', '', '', '', '', '', '', '', '', '', '', '', '', '',
#  '', '', '', '', '', '11:00', '', '', '', '', '', '', '', '12:00',
#  '', '', '', '', '', '13:00', '', '', '', '14:00', '', '', '', '',
#  '', '', '', '', '', '', '', '', '', '', '', '', '', '',
#  '', '', '', '', '', '', '18:00', '', '', '', '', '', '', '',
#  '19:00', '', '', '', '', '20:00', '', '', '', '', '', '21:00', '', '',
#  '', '', '', '', '', '22:00', '', '', '', '', '', '', '']

edited Jan 10 '22 at 03:05

answered Jan 10 '22 at 02:59

This is useful, but the accepted answer would be @bb1's comment, since the question referred to numpy. – M.E. Jan 10 '22 at 03:12
@M.E. Yes, I am aware of that. But I was reluctant to use numpy array for non-numeric items. For example, with a long list with length 15,300,000 (100000 * your list), numpy approach took 15 seconds whereas list comprehension approach took 2 seconds on my machine. I have a hunch that there is almost nothing to gain when we use numpy for strings or general objects (although I am not really sure about this statement). – j1-lee Jan 10 '22 at 03:33
This might be relevant: https://stackoverflow.com/questions/49112552/vectorized-string-operations-in-numpy-why-are-they-rather-slow – j1-lee Jan 10 '22 at 03:47
Starting with a list is 5x faster than starting with and returning an array. – hpaulj Jan 10 '22 at 04:13
@j1-lee thanks that is a valuable comparison. I am using numpy arrays for matplotlib and I wanted to keep the same data types for everything (both numeric and non numeric). Not sure if there are use cases where numpy vs lists for strings might be advisable, I suspect there might be as there are many scenarios you can face. Both answers are relevant to better understand numpy. – M.E. Jan 10 '22 at 10:28
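
For reference, a numpy-only version can be sketched with `np.char.endswith` and `np.where`. This is an assumption about what a numpy-native answer could look like (the @bb1 comment mentioned above is not reproduced in this thread), and `arr` is a shortened stand-in for the array in the question:

```python
import numpy as np

# Shortened stand-in for the array in the question (assumption, for brevity).
arr = np.array(['00:00', '00:05', '00:55', '01:00', '01:05', '02:05'], dtype='<U5')

# Boolean mask: True where the string ends in '00'.
mask = np.char.endswith(arr, '00')

# Keep the matching entries and replace everything else with an empty string.
result = np.where(mask, arr, '')
print(result)
# ['00:00' '' '' '01:00' '' '']
```

`np.where` returns a new array and leaves `arr` unchanged. As the comments above point out, this is not necessarily faster than the list comprehension, so it is mainly worth using when the data already lives in an array.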

score 1 · Answer 2 · answered Jan 10 '22 at 03:41

The traditional numpy way, iterating with `np.nditer`:

import numpy as np

narr = np.array(['00:00', '00:05', '00:15', '00:20', '00:25', '00:30', '00:35',
       '00:40', '00:45', '00:50', '00:55', '01:00', '01:05', '01:10',
       '01:15', '01:20', '01:40', '01:45', '01:55', '02:05', '02:10',
       '02:15', '02:35', '02:40', '02:45', '02:55', '03:05', '03:10',
       '03:30', '03:55', '04:00', '04:05', '04:25', '04:40', '04:55',
       '05:00', '05:05', '05:15', '05:20', '05:25', '05:30', '05:35',
       '05:50', '05:55', '06:05', '06:20', '06:25', '06:30', '06:35',
       '06:45', '06:50', '07:05', '07:15', '07:30', '07:40', '07:45',
       '07:50', '07:55', '08:10', '08:20', '08:25', '08:40', '08:45',
       '08:50', '09:15', '09:20', '09:45', '09:50', '09:55', '10:10',
       '10:15', '10:25', '10:30', '10:45', '10:50', '11:00', '11:05',
       '11:15', '11:25', '11:35', '11:45', '11:50', '11:55', '12:00',
       '12:10', '12:15', '12:25', '12:50', '12:55', '13:00', '13:40',
       '13:45', '13:50', '14:00', '14:10', '14:20', '14:35', '14:55',
       '15:05', '15:10', '15:15', '15:20', '15:25', '15:45', '15:55',
       '16:10', '16:15', '16:20', '16:25', '16:35', '16:45', '16:50',
       '16:55', '17:05', '17:30', '17:35', '17:45', '17:50', '18:00',
       '18:05', '18:10', '18:15', '18:20', '18:30', '18:35', '18:45',
       '19:00', '19:10', '19:20', '19:40', '19:50', '20:00', '20:15',
       '20:20', '20:35', '20:45', '20:55', '21:00', '21:05', '21:15',
       '21:20', '21:25', '21:30', '21:40', '21:45', '22:00', '22:10',
       '22:15', '22:25', '22:40', '22:45', '22:50', '22:55'], dtype='<U5')


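# Each element is read and then possibly overwritten, so request read-write access.
with np.nditer(narr, op_flags=['readwrite']) as it:
    for x in it:
        # x is a 0-d view into narr; blank it unless the string ends in '00'
        if not str(x).endswith('00'):
            x[...] = ''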

print(narr)

This gives better control over the array iteration, though most developers don't like the traditional ways. — Pavan Chandaka, Jan 10 '22 at 04:02
It's much slower than the list comprehension answer (even for an array), and for some obscure reason it gives me overwrite errors when I try multiple `timeit` loops. Here you are simply iterating through the array, so I don't see the need for better control. — hpaulj, Jan 10 '22 at 04:58
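
The speed claims in these comments can be checked with a small `timeit` harness. The following is only a sketch: the test data is a hypothetical short sample repeated many times, the function names are made up for illustration, and the vectorized variant using `np.char.endswith` with `np.where` is an assumption rather than something taken from the answers. No timings are quoted here because they depend on the machine and the numpy version.

```python
import timeit
import numpy as np

# Hypothetical test data: a short sample of the question's values, repeated.
base = ['00:00', '00:05', '00:55', '01:00', '01:05', '02:05']
arr = np.array(base * 1000)
lst = arr.tolist()

def list_comprehension(values):
    # Works for both a plain list and a numpy string array.
    return [s if s.endswith('00') else '' for s in values]

def char_where(a):
    # Vectorized variant (assumption, not from the thread).
    return np.where(np.char.endswith(a, '00'), a, '')

def nditer_loop(a):
    out = a.copy()  # copy so repeated timing runs always see the same input
    with np.nditer(out, op_flags=['readwrite']) as it:
        for x in it:
            if not str(x).endswith('00'):
                x[...] = ''
    return out

cases = [
    ('list comprehension (list in)', list_comprehension, lst),
    ('list comprehension (array in)', list_comprehension, arr),
    ('np.char.endswith + np.where', char_where, arr),
    ('np.nditer loop', nditer_loop, arr),
]

for name, fn, arg in cases:
    seconds = timeit.timeit(lambda: fn(arg), number=5)
    print(f'{name}: {seconds:.4f} s for 5 runs')
```

Copying inside `nditer_loop` keeps every run comparable, since that approach modifies its array in place. Increase the repeat factor or `number` for more stable measurements.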

kingkong · Answer 3 · 2022-01-10T03:20:57.743

You can try a regex like this:

import numpy as np
import re
x = np.array(['00:00', '00:05', '00:15', '00:20', '00:25', '00:30', '00:35',
       '00:40', '00:45', '00:50', '00:55', '01:00', '01:05', '01:10',
       '01:15', '01:20', '01:40', '01:45', '01:55', '02:05', '02:10',
       '02:15', '02:35', '02:40', '02:45', '02:55', '03:05', '03:10',
       '03:30', '03:55', '04:00', '04:05', '04:25', '04:40', '04:55',
       '05:00', '05:05', '05:15', '05:20', '05:25', '05:30', '05:35',
       '05:50', '05:55', '06:05', '06:20', '06:25', '06:30', '06:35',
       '06:45', '06:50', '07:05', '07:15', '07:30', '07:40', '07:45',
       '07:50', '07:55', '08:10', '08:20', '08:25', '08:40', '08:45',
       '08:50', '09:15', '09:20', '09:45', '09:50', '09:55', '10:10',
       '10:15', '10:25', '10:30', '10:45', '10:50', '11:00', '11:05',
       '11:15', '11:25', '11:35', '11:45', '11:50', '11:55', '12:00',
       '12:10', '12:15', '12:25', '12:50', '12:55', '13:00', '13:40',
       '13:45', '13:50', '14:00', '14:10', '14:20', '14:35', '14:55',
       '15:05', '15:10', '15:15', '15:20', '15:25', '15:45', '15:55',
       '16:10', '16:15', '16:20', '16:25', '16:35', '16:45', '16:50',
       '16:55', '17:05', '17:30', '17:35', '17:45', '17:50', '18:00',
       '18:05', '18:10', '18:15', '18:20', '18:30', '18:35', '18:45',
       '19:00', '19:10', '19:20', '19:40', '19:50', '20:00', '20:15',
       '20:20', '20:35', '20:45', '20:55', '21:00', '21:05', '21:15',
       '21:20', '21:25', '21:30', '21:40', '21:45', '22:00', '22:10',
       '22:15', '22:25', '22:40', '22:45', '22:50', '22:55'])

print(np.array(list(map(lambda v: re.sub(r'[0-9]{2}:(([1-9][0-9])|(0[1-9]))', '', v), x))))
