1

I have a dataframe like this:

Date          Time           nPoints      hour
2011-08-01    00:02:21          3           0
2011-08-01    00:04:21          8           0
2011-08-01    00:05:50          2           0
2011-08-01    01:02:21          4           1
2011-08-01    01:03:00          5           1
...

I want code that picks only the first row of each hour when more than one data point was recorded in that hour. The output would look like this:

Date          Time           nPoints      hour
2011-08-01    00:02:21          3           0
2011-08-01    01:02:21          4           1
...

I have seen a similar question here: "How can i get first value of each hours? ORACLE", but it is not Python code.

Below is the code that I tried; it only returns the list of hours, not the whole row as I require.

def appendIfNewNumber(unqNumbers, number):
    # Append only when the hour differs from the last one recorded.
    if len(unqNumbers) == 0 or number != unqNumbers[-1]:
        unqNumbers.append(number)

unqNumbers = []
for number in df2['hour']:
    appendIfNewNumber(unqNumbers, number)

print(unqNumbers)
Mbali Aleh
  • Possible duplicate of [Pandas dataframe get first row of each group](https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group) – peer Oct 21 '19 at 13:28
  • I have a dataframe for the whole month, with each day having 24 hours, so the routine you referred me to only picks the first 24 hours of the first day of the month and leaves the rest. @peer – Mbali Aleh Oct 21 '19 at 14:30
  • You just need to combine `Date` and `Time` into `yyyy-mm-dd_hh`; then you can group by that. – peer Oct 21 '19 at 15:11
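
The suggestion in the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the sample rows (including the second day) are made up, `Date` and `Time` are taken to be strings in the formats shown in the question, and the names `hour_key` and `first_per_hour` are hypothetical.

```python
import pandas as pd

# Made-up sample rows modeled on the question, extended to a second day.
df = pd.DataFrame({
    "Date": ["2011-08-01", "2011-08-01", "2011-08-01", "2011-08-02", "2011-08-02"],
    "Time": ["00:02:21", "00:04:21", "01:02:21", "00:10:00", "00:20:00"],
    "nPoints": [3, 8, 4, 7, 9],
})

# Combine Date and Time into one timestamp (assumed string formats).
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Build a yyyy-mm-dd_hh key so the same hour on different days stays separate.
hour_key = stamp.dt.strftime("%Y-%m-%d_%H")

# head(1) keeps the first row of each group, in the original row order.
first_per_hour = df.groupby(hour_key).head(1)
print(first_per_hour)
```

Because the key contains the date as well as the hour, a month-long frame should keep one row per hour per day rather than 24 rows in total.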

3 Answers

1
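import datetime

def reset_df(df):
    # Extract the hour from the 'HH:MM:SS' strings in the Time column.
    df['hr'] = df['Time'].apply(lambda x: datetime.datetime.strptime(x, '%H:%M:%S').hour)
    # Group by Date as well as hour; grouping by hour alone would
    # return at most 24 rows for a frame that spans several days.
    df = df.groupby(['Date', 'hr'], sort=False).first().reset_index()
    df = df.drop('hr', axis=1)
    return df

df = reset_df(df)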
Rahul Verma
0

You can create a column `hr` holding the hour extracted from the `Time` column and then call `drop_duplicates` on the data frame, so that only the first occurrence of each date and hour is kept.

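import datetime

def select_hours(df):
    # Extract the hour from the 'HH:MM:SS' strings in the Time column.
    df['hr'] = df['Time'].apply(lambda x: datetime.datetime.strptime(x, '%H:%M:%S').hour)
    # keep='first' retains the first row per (Date, hr);
    # keep=False would drop every duplicated row instead.
    df = df.drop_duplicates(subset=['Date', 'hr'], keep='first')
    df = df.drop('hr', axis=1)
    return df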

It should do the trick and keep only the first data point of every hour and date.
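
To illustrate how the `keep` argument of `drop_duplicates` affects the result, here is a small sketch on made-up data. The frame and the names `first_kept` and `none_kept` are hypothetical, chosen only for this comparison.

```python
import pandas as pd

# Made-up frame: two rows share (Date, hr) = ("2011-08-01", 0).
df = pd.DataFrame({
    "Date": ["2011-08-01"] * 3,
    "hr": [0, 0, 1],
    "nPoints": [3, 8, 4],
})

# keep="first": the first row of each (Date, hr) pair survives.
first_kept = df.drop_duplicates(subset=["Date", "hr"], keep="first")

# keep=False: every row that has a duplicate is dropped, including the first.
none_kept = df.drop_duplicates(subset=["Date", "hr"], keep=False)

print(first_kept["nPoints"].tolist())  # [3, 4]
print(none_kept["nPoints"].tolist())   # [4]
```

Note also that `drop_duplicates` returns a new frame unless `inplace=True` is passed, so the result has to be assigned for the filtering to take effect.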

0

A small edit can be made to both of the answers above: instead of using datetime.datetime.strptime(), we can use pd.to_datetime(). See below:

import pandas as pd

def select_unq_hours(df):
  df['hr'] = pd.to_datetime(df['Time']).dt.hour
  df.drop_duplicates(subset=['Date', 'hr'], keep='first', inplace=True)
  df.drop('hr', axis=1, inplace=True)
  return df
Sampath