0

I have a dataframe with MultiIndex columns. I want to filter the data using the columns of the dataset. When I try df.columns I get this information:

MultiIndex(levels=[['power'], ['active']],
           codes=[[0], [0]],
           names=['physical_quantity', 'type'])

A short description of the dataset is:

physical_quantity          power
type                      active
2011-04-18 09:22:13-04:00    6.0
2011-04-18 09:22:16-04:00    6.0
2011-04-18 09:22:20-04:00    6.0
2011-04-18 09:22:23-04:00    6.0
2011-04-18 09:22:26-04:00    6.0

The first thing I found is that although you see two columns there, the dataframe really says that it is a dataset of [529757 rows x 1 columns].

What I want to do is to filter the data selecting an interval of time, choosing the first column called physical_quantity type.

On the other hand, the data of that first column (physical_quantity type) it is unknown:

physical_quantity  type  
power              active    float32
dtype: object

Checking with df.index I managed to see this information about the dataframe:

DatetimeIndex(['2011-04-18 09:22:13-04:00', '2011-04-18 09:22:16-04:00',
               '2011-04-18 09:22:20-04:00', '2011-04-18 09:22:23-04:00',
               '2011-04-18 09:22:26-04:00', '2011-04-18 09:22:30-04:00',
               '2011-04-18 09:22:33-04:00', '2011-04-18 09:22:37-04:00',
               '2011-04-18 09:22:40-04:00', '2011-04-18 09:22:44-04:00',
               ...
               '2011-05-14 23:59:26-04:00', '2011-05-14 23:59:29-04:00',
               '2011-05-14 23:59:33-04:00', '2011-05-14 23:59:36-04:00',
               '2011-05-14 23:59:40-04:00', '2011-05-14 23:59:43-04:00',
               '2011-05-14 23:59:46-04:00', '2011-05-14 23:59:50-04:00',
               '2011-05-14 23:59:53-04:00', '2011-05-14 23:59:57-04:00'],
              dtype='datetime64[ns, US/Eastern]', length=529757, freq=None)

So I understood that the data of that column is something like dtype='datetime64[ns, US/Eastern]

So I aim to slice the data, from an specific day and hour to another day and hour.

from 2011-05-10 19:44:51-04:00 to 2011-05-10 23:17:59-04:00

I tried doing something like this:

df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00 
& 
df[df['physical_quantity', 'type']] < 2011-05-10 23:17:59-04:00
df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00

File "<ipython-input-133-27848c7d6afc>", line 1
    df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00
                                                ^
SyntaxError: invalid token

How can I solve my problem?

nekovolta
  • 496
  • 2
  • 14
  • By itself, `2011-05-10 19:44:51-04:00` is not a valid object for Python, and the interpreter does not understand how to deal with this. Is it an int, a float, a string? You should be comparing compatible datatypes, in this case two `np.datetime64` or `datetime.datetime` objects. Take a look at [these](https://stackoverflow.com/questions/30689445/datetime64-comparison-in-dataframes) [two](https://stackoverflow.com/questions/53726095/comparing-dataframe-datetime-column-value-in-python) cases, in case they help. – hyperTrashPanda Jul 03 '19 at 14:28

1 Answers1

1

try this

df['ts'] = pd.to_datetime(df["ts"], unit="ms")```
Mathanraj-Sharma
  • 354
  • 1
  • 3
  • 7
  • 1
    Although this code snippet may answer the question, including an explanation of *why* and *how* it helps solve the problem improves the quality and longevity of your answer. See ["How do I write a good answer?"](https://stackoverflow.com/help/how-to-answer). – slothiful Jul 04 '19 at 01:54