
The data we are streaming in comes from our PI System, which outputs readings at irregular intervals. This is not uncommon with time series data, so I have attempted to add 1 second or so to each timestamp to ensure the index is unique. However, this has not worked as I hoped, as I keep receiving a TypeError.

I have attempted to implement the solutions highlighted in (Modifying timestamps in pandas to make index unique), but without any success.

The error message I get is:

TypeError: ufunc add cannot use operands with types dtype('O') and dtype('<m8')

The code implementation is below:

# Mark the duplicated index entries (approach from the linked SO answer)
values = Slugging_Sep.index.duplicated(keep=False).astype(float)
values[values == 0] = np.nan

# Build per-duplicate offsets that reset after each run of duplicates
missings = np.isnan(values)
cumsum = np.cumsum(~missings)
diff = np.diff(np.concatenate(([0.], cumsum[missings])))
values[missings] = -diff

# Add the offsets to the index -- this is the line that raises the TypeError
result = Slugging_Sep.index + np.cumsum(values).astype(np.timedelta64)
print(result)
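
For reference, here is a minimal, self-contained example that reproduces the error for me (the values are hypothetical stand-ins for the PI System export):

import numpy as np
import pandas as pd

# Hypothetical stand-in: the timestamps arrive as strings, so the index
# ends up with object dtype rather than datetime64[ns]
Slugging_Sep = pd.DataFrame(
    {'Time': ['2017-12-13 05:00:00', '2017-12-13 05:00:00',
              '2017-12-13 05:00:05'],
     'Value': [1.0, 2.0, 3.0]}
).set_index('Time')

print(Slugging_Sep.index.dtype)   # object -> the dtype('O') in the error

offsets = np.cumsum(
    Slugging_Sep.index.duplicated(keep=False).astype(float)
).astype(np.timedelta64)
print(offsets.dtype)              # <m8 -> the other operand

# Adding timedelta64 values to an object-dtype array of strings raises
# the TypeError (the exact message may vary by numpy/pandas version)
Slugging_Sep.index.values + offsets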

What I have tried

  • Type casting - I thought the error was caused by adding two different types together, but casting the operands hasn't resolved the issue.
  • Using timedeltas in pandas - this produces the same TypeError (see the check after the snippet below):

    Slugging_Sep['Time'] = (str(Slugging_Sep['Time'] +
        pd.to_timedelta(Slugging_Sep.groupby('Time').cumcount(), unit='ms')))
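
The individual pieces seem fine on their own (continuing with the hypothetical frame above); the failure only appears when the offsets are added to the object-dtype timestamps:

    # The offsets themselves come out as a proper timedelta64[ns] series...
    ms_offsets = pd.to_timedelta(Slugging_Sep.groupby('Time').cumcount(), unit='ms')
    print(ms_offsets.dtype)   # timedelta64[ns]
    # ...so the TypeError must come from adding them to the string timestamps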
    

So I have two questions from this:

  1. Could anyone provide some advice to me regarding how to solve this for future time series issues?
  2. What actually is dtype('<m8')?

Thank you.

(Screenshot: irregular data example)

  • Can you provide example input and intended output for a few test cases? Also, please avoid posting screenshots of data - just post the actual data inline instead. It's a lot easier to help that way. – andrew_reece Dec 13 '17 at 05:17
  • 1
    My best guess is that Slugging_Sep.index is not a proper datetime. Have you tried converting it using pd.to_datetime() before adding the timedelta? – Alex Zisman Dec 13 '17 at 05:26

1 Answer


Using Alex Zisman's suggestion, I reconverted Slugging_Sep.index with the following lines:

Slugging_Sep['Time'] = pd.to_datetime(Slugging_Sep['Time'])
Slugging_Sep.set_index('Time', inplace=True)
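
This matters because the index dtype changes from object to datetime64[ns], which is what makes the timedelta addition well defined:

# Confirm the conversion worked
print(Slugging_Sep.index.dtype)   # datetime64[ns] -- no longer object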

I then implemented the following code, taken from the SO link mentioned above:

values = Slugging_Sep.index.duplicated(keep=False).astype(float)
values[values == 0] = np.nan

missings = np.isnan(values)
cumsum = np.cumsum(~missings)
diff = np.diff(np.concatenate(([0.], cumsum[missings])))
values[missings] = -diff

# Add the cumulative offsets (in nanoseconds) to the now-datetime index
result = Slugging_Sep.index + np.cumsum(values).astype('timedelta64[ns]')
Slugging_Sep.index = result
print(Slugging_Sep.index)

This resolved the issue: nanosecond offsets were added to each duplicate timestamp, making the index unique.
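
Putting it together, here is a minimal end-to-end sketch (with hypothetical data standing in for the PI System export):

import numpy as np
import pandas as pd

# Hypothetical data standing in for the PI System export
Slugging_Sep = pd.DataFrame(
    {'Time': ['2017-12-13 05:00:00', '2017-12-13 05:00:00',
              '2017-12-13 05:00:05'],
     'Value': [1.0, 2.0, 3.0]}
)

# 1. Convert the string timestamps to datetime64[ns] and index on them
Slugging_Sep['Time'] = pd.to_datetime(Slugging_Sep['Time'])
Slugging_Sep.set_index('Time', inplace=True)

# 2. Build per-duplicate offsets and add them as nanoseconds
values = Slugging_Sep.index.duplicated(keep=False).astype(float)
values[values == 0] = np.nan

missings = np.isnan(values)
cumsum = np.cumsum(~missings)
diff = np.diff(np.concatenate(([0.], cumsum[missings])))
values[missings] = -diff

Slugging_Sep.index = Slugging_Sep.index + np.cumsum(values).astype('timedelta64[ns]')
print(Slugging_Sep.index.is_unique)   # True

On question 2: in NumPy's dtype notation, 'm8' is shorthand for timedelta64 (an 8-byte value) and 'M8' for datetime64; the leading '<' records the byte order (little-endian). So dtype('<m8') is a timedelta64 with no time unit attached, and the error is NumPy saying it has no rule for adding such a value to an object array (dtype('O')) - which is what a string-based index is.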
