0

i have a dataset that contains columns called date, shift, value, and so on.I want to extract last value for each date and shift from value column. For example for each day, there are two rows one contains datetime,shift(day or night) and last datapoints from value for each shift.

In this example, I want to extract 3 rd row(because the highest value for 7/14 and Day time is 3)

enter image description here

I only know how to get maximum value for each column. I tried in several ways to get this work done but it didn't work for me. I'm new to python and looking for your help.

  • Does this answer your question? [group by pandas dataframe and select latest in each group](https://stackoverflow.com/questions/41525911/group-by-pandas-dataframe-and-select-latest-in-each-group) – Vishesh Mangla Jul 27 '20 at 15:00
  • hmm, but first you need to convert it to complete seconds or just omit the time. – Vishesh Mangla Jul 27 '20 at 15:04

2 Answers2

0

assuming the data is already sorted by date you could do something like this? or sort by date then do this?

df['day'] = df['date'].apply(lambda x: x.date())
df.groupby(['day','shift'])['value'].agg(list).apply(lambda x: x[-1])

this will group the dataframe by date and shift, make a list of the values in each group and take the last value.

output:

day         shift
2020-07-14  day      3
            night    5
Name: value, dtype: int64

here's a way to do this but also grab multilple other columns as well... I admit it's not the cleanest and there's probably a better way but it works:

df:

                 date  shift value value2         day
0 2020-07-14 18:58:00    day     1      9  2020-07-14
1 2020-07-14 18:59:00    day     2      8  2020-07-14
2 2020-07-14 18:59:00    day     3      7  2020-07-14
3 2020-07-14 19:00:00  night     4      6  2020-07-14
4 2020-07-14 19:00:00  night     5      5  2020-07-14

cols = ['value', 'value2']
df.groupby(['day','shift'])[cols].agg(list).apply(lambda x: [x[col][-1] for col in cols], axis=1)

output:

day         shift
2020-07-14  day      [3, 7]
            night    [5, 5]
dtype: object
Derek Eden
  • 4,403
  • 3
  • 18
  • 31
  • Thanks a lot..that's what I looking for. But I need another column's values for this selected row as well. for example, there are another two columns for this data frame.How to I include those columns for this dataframe –  Jul 27 '20 at 15:16
  • can't we get other values as columns instead of putting them into an array?I mean putting them under "value1" and "value2" –  Jul 27 '20 at 15:38
  • 1
    lol I tried for a few minutes and couldnt get it to work.. I'm sure there's a way I just don't know off the top of my head, hopefully it gets you started though.. – Derek Eden Jul 27 '20 at 15:47
  • perhaps something here https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns – Derek Eden Jul 27 '20 at 15:48
-1

If you need the max instead of the last

import pandas

data = {"date": ["day1","day1","day1","day1","day1"],
        "shift": ["Day","Day","Day","Night","Night"],
        "value": [1, 2, 3, 4, 5]
        }

df = pandas.DataFrame(data)
df.groupby(["date","shift"]).max()

Output

            value
date shift       
day1 Day        3
     Night      5

Check the pandas package, dataframe and groupby operation for more help: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

Grom
  • 50
  • 4