
I have two arrays and a pandas DataFrame: an array of datetime64 values, a flattened 3D array (the same length as the time array), and a DataFrame with over 6000 rows.

What I need to do is take the timestamps in the time array, find which of them match timestamps in the DataFrame, and create two new arrays (new time and new DF) with those values. Additionally, since the flattened array has the same number of entries as the time array, I would like to extract the corresponding entries into a new array (new_flat).

Some snippets of code:

import numpy as np
import pandas as pd

mini_time = ['2015-03-25T13:05:00.000000Z',
             '2015-03-25T13:05:03.000000Z',
             '2015-03-25T13:05:06.000000Z',
             '2015-03-25T13:05:09.000000Z',
             '2015-03-25T13:05:12.000000Z']

mini_flat=np.zeros((5,5,3750))

np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(5, 40)),
                  columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMN'),
                  index=['2015-03-25T13:05:00.000000{}'.format(i) for i in range(5)])

I understand that the index here is a string, but in my original dataframe there is a column named 'Date' containing a series of 6000+ timestamps, which I set as the index.

Here is what I have so far:

df=df.set_index('Date')

This allows me to set the timestamps on DF as index

new_Time = []
new_Flat = []
new_DF = []
for t in range(len(time)):
    # Row of df whose timestamp is nearest to time[t]
    s = df.loc[df.index.unique()[df.index.unique().get_loc(np.datetime64(time[t]), method='nearest')]]
    if (s.index - np.datetime64(time[t])) < 0.2:  # check this by hand
        new_Time.append(time[t])
        new_DF.append(s)
        new_Flat.append(mini_flat[t])
        
This raises:

  UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('O') and dtype('<M8[us]')

If I change s.index to s.name, I get:

TypeError: Cannot compare type Timedelta with type float

Am I at least taking the right approach?

  • Please always include minimal example data. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Jul 14 '21 at 21:25
  • I *think* I added a smaller scale of what I'm working with – Steve Bermeo Jul 14 '21 at 23:36
  • Your example code produces `KeyError: "None of ['Date'] are in the columns"` for the line `df=df.set_index('Date')`. Please read and provide a [mre]. - If lists or Series are supposed to be objects other than strings, please ensure the example data reflects that. please read [ask]. – wwii Jul 15 '21 at 14:04

1 Answer


Fixed after several failed attempts. The error was in the comparison that the if statement performed after subtracting the two times. Using s.index gave the column names of the df, whereas s.name gave the timestamp. However, s.name - time[t] produces a Timedelta, which can't be compared against a float. Calling .total_seconds() on it did the trick; because the timestamps are all 3 seconds apart, checking against a small enough number works.
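To make the difference between s.index and s.name concrete, here is a minimal sketch. The two-row frame and its column names are made up for illustration and only stand in for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with a DatetimeIndex, standing in for the real data.
idx = pd.to_datetime(["2015-03-25T13:05:00", "2015-03-25T13:05:03"])
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=idx)

s = df.loc[idx[1]]           # selecting one row returns a Series
row_columns = list(s.index)  # the column labels, not a timestamp
row_label = s.name           # the timestamp of the selected row

# Timestamp minus datetime64 gives a Timedelta, which cannot be compared with a float ...
delta = row_label - np.datetime64("2015-03-25T13:05:03.100")
# ... while total_seconds() converts it to a plain float (negative here).
seconds = delta.total_seconds()
```

Note that the difference is negative when the row's timestamp is earlier than the target, so wrapping it in abs() before comparing against 0.2 is probably what is intended.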

The only change needed is in the if statement:

if ((s.name - np.datetime64(time[t])).total_seconds()) < 0.2:
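As a possible alternative to the loop (not what I actually ran), the same matching can be sketched in vectorized form with Index.get_indexer, which accepts method='nearest' and a tolerance. The small arrays below are invented stand-ins for time, the flattened array and the dataframe, and the 0.2 second tolerance is taken from the check above. This assumes the dataframe index is unique and sorted:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data.
time = np.array(["2015-03-25T13:05:00", "2015-03-25T13:05:03",
                 "2015-03-25T13:05:06", "2015-03-25T13:05:09"],
                dtype="datetime64[ns]")
flat = np.arange(len(time) * 2).reshape(len(time), 2)  # one row per timestamp
df = pd.DataFrame(
    {"A": [10, 20, 30]},
    index=pd.to_datetime(["2015-03-25T13:05:00.100",
                          "2015-03-25T13:05:04.500",
                          "2015-03-25T13:05:08.900"]),
)

# Position of the nearest df row for each timestamp,
# or -1 when no row lies within the tolerance.
pos = df.index.get_indexer(pd.DatetimeIndex(time), method="nearest",
                           tolerance=pd.Timedelta(seconds=0.2))
matched = pos != -1

new_time = time[matched]        # timestamps that found a match
new_flat = flat[matched]        # corresponding entries of the flattened array
new_df = df.iloc[pos[matched]]  # matching dataframe rows
```

Here the tolerance applies to the absolute difference, so rows slightly before or after a timestamp are treated the same way.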