I am trying to justify an array containing pd.Timestamp objects and np.nan values using the method suggested in "Python: Justifying NumPy array". The idea: mask the NaNs with a boolean array, sort that mask (justified_mask), and assign the valid values to the new, sorted positions.
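For plain float data, the mask-and-sort idea can be sketched roughly as follows. This is a minimal illustration on a made-up array (the values and variable names are assumptions for the example), left-justifying along axis 1:

```python
import numpy as np

# Made-up float array with NaNs scattered through each row
arr = np.array([[np.nan, 1.0, np.nan, 2.0],
                [3.0, np.nan, np.nan, 4.0]])

mask = ~np.isnan(arr)                              # True where a value is valid
justified_mask = np.sort(mask, axis=1)             # False sorts first, so True ends up on the right
justified_mask = np.flip(justified_mask, axis=1)   # flip so the True entries sit on the left

out = np.full(arr.shape, np.nan)                   # float output is fine for float input
out[justified_mask] = arr[mask]                    # valid values fill the new positions in row order
print(out)
```

Each row keeps its valid values in their original order, with the NaNs pushed to the right.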

The only line that causes problems is the one that copies the Timestamp objects from the old masked positions into the new ones:

out[justified_mask] = arr[mask]

It throws the following error:

TypeError: float() argument must be a string or a number, not 'Timestamp'

with no traceback pointing anywhere deeper into NumPy. Oddly, the masking operations on both sides of the equals sign work perfectly on their own. Any idea how to solve this without major hassle? I know I could convert the Timestamp objects into integers by subtracting them from a reference date and then convert them back, but this is not so easy because there are NaNs in the array as well. Is there a simpler solution?

EDIT: Reproducible example

times = np.array([[np.nan, pd.Timestamp('2018-11-07')], [np.nan, pd.Timestamp('2018-11-07')]])

Ok, sorry, I forgot to mention that I had to switch the invalid_val to accept a None and let pd.isnull() filter the NaNs out, making the justify function look like this:

def justify(arr, invalid_val=0, axis=1, side='left'):
    # Build a boolean mask of the valid entries
    if invalid_val is np.nan:
        mask = ~np.isnan(arr)
    elif invalid_val is not None:
        mask = arr != invalid_val
    else:
        mask = ~pd.isnull(arr)
    # Sorting pushes the True entries to the end; flip for 'up'/'left'
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(arr.shape, np.nan)
    if axis == 1:
        out[justified_mask] = arr[mask]
    else:
        out.T[justified_mask.T] = arr.T[mask.T]
    return out

If you then run:

utils.justify(times, side='left', invalid_val=None)

you get the above error.

  • Could you provide a reproducible error? – Franco Piccolo Nov 07 '18 at 07:04
  • `out` is a float dtype array. python can't convert a `Timestamp` into a float. As a pandas user you may be used to its happy-go-lucky way of switching to object dtype when values don't fit. numpy doesn't switch dtype once an array is created. – hpaulj Nov 07 '18 at 07:52
  • Ahhh..cool! I switched the `dtype` of `out` to `object` and now it works! You can make this an answer if you want! – user3017048 Nov 07 '18 at 08:01
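
For reference, the fix described in the comments might look something like the sketch below. The function name `justify_objects` is hypothetical, and the function is trimmed to the `pd.isnull()` branch only; the one substantive change from the question's code is `dtype=object` when creating `out`.

```python
import numpy as np
import pandas as pd

def justify_objects(arr, axis=1, side='left'):
    """Justify an object array, treating anything pd.isnull() flags as missing.

    Hypothetical helper for illustration, based on the question's justify().
    """
    mask = ~pd.isnull(arr)
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # dtype=object lets the output hold Timestamps next to NaN;
    # the default float dtype cannot store a Timestamp.
    out = np.full(arr.shape, np.nan, dtype=object)
    if axis == 1:
        out[justified_mask] = arr[mask]
    else:
        out.T[justified_mask.T] = arr.T[mask.T]
    return out

times = np.array([[np.nan, pd.Timestamp('2018-11-07')],
                  [np.nan, pd.Timestamp('2018-11-07')]])
result = justify_objects(times, side='left')
print(result)
```

With the example array from the question, the Timestamps end up in the first column and the NaNs in the second.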