2

I would like to be able to add a column which counts rows in order based on a value. For example, below are three different people with records that have a timestamp. I want to count the order of records based on the PersonID. This should restart for every PersonID. (I am able to do this in Tableau with Index() but I want it part of the raw file too)

> PersonID,             DateTime,             Order,     Total
    a226           2015-04-16 11:57:36          1          1
    a226           2015-04-17 15:32:14          2          1
    a226           2015-04-17 19:13:43          3          1
    z342           2015-04-15 07:02:20          1          1
    x391           2015-04-17 13:43:31          1          1
    x391           2015-04-17 05:12:16          2          1

If there is a way to subtract the DateTime as well? My way would be to only select Order 1 as a dataframe, then only select Order 2, then merge, then subtract. Is there a way to do it automatically?

DSM
  • 342,061
  • 65
  • 592
  • 494
trench
  • 81
  • 8
  • Sorry can you post desired output and explain how each of the results are arrived, thanks – EdChum Apr 18 '15 at 18:30
  • The 'Order' column is what I am trying to determine. This would be useful for me because I would be able to filter all 'First Interactions' and then all 'Second Interactions' onwards to see common traits. I want to be able to know what happened on the first interaction (Order 1) and then what happened on Order 2 so I can prevent it from happening. Being able to quickly determine the time difference between Order 1 and Order 2 (per PersonID) would be great too. – trench Apr 18 '15 at 18:55

3 Answers3

5

IIUC, you can do a groupby with cumcount:

>>> df["Order"] = df.groupby("PersonID").cumcount() + 1
>>> df
  PersonID             DateTime  Order
0     a226  2015-04-16 11:57:36      1
1     a226  2015-04-17 15:32:14      2
2     a226  2015-04-17 19:13:43      3
3     z342  2015-04-15 07:02:20      1
4     x391  2015-04-17 13:43:31      1
5     x391  2015-04-17 05:12:16      2

If you want to guarantee that it's in increasing time order, you should sort by DateTime first, but your example has x391 in non-increasing order, so I'm assuming you don't want that.


If you do want to involve the timestamps, I tend to sort first, to make life easier:

>>> df["DateTime"] = pd.to_datetime(df["DateTime"]) # just in case
>>> df = df.sort(["PersonID", "DateTime"])
>>> df["Order"] = df.groupby("PersonID").cumcount() + 1
>>> df
  PersonID            DateTime  Order
0     a226 2015-04-16 11:57:36      1
1     a226 2015-04-17 15:32:14      2
2     a226 2015-04-17 19:13:43      3
5     x391 2015-04-17 05:12:16      1
4     x391 2015-04-17 13:43:31      2
3     z342 2015-04-15 07:02:20      1

Even without sorting, though, you could call rank on the grouped column, which has more options to specify how you want to handle ties:

>>> df["Order"] = df.groupby("PersonID")["DateTime"].rank()
>>> df
  PersonID            DateTime  Order
0     a226 2015-04-16 11:57:36      1
1     a226 2015-04-17 15:32:14      2
2     a226 2015-04-17 19:13:43      3
5     x391 2015-04-17 05:12:16      1
4     x391 2015-04-17 13:43:31      2
3     z342 2015-04-15 07:02:20      1
DSM
  • 342,061
  • 65
  • 592
  • 494
1

UPDATE for Pandas > 0.20:

sort() was removed from Pandas with release 0.20 (2017-05-05). Now there is sort_values() and sort_index().

The currently functioning code is:

df["DateTime"] = pd.to_datetime(df["DateTime"]) # just in case
df = df.sort_by(["PersonID", "DateTime"])
# Don't forget to add [] if you are grouping by more than one column!
df["Order"] = df.groupby("PersonID").cumcount() + 1

Answer used as reference: 'DataFrame' object has no attribute 'sort'

Community
  • 1
  • 1
Tiago Duque
  • 1,956
  • 1
  • 12
  • 31
0

you want to use the groupby function and sum. so you could try something like: (assuming your dataframe is called df) grouped = df.groupby("PersonID") sum of some column would be: grouped[column].sum() if you just want unique values you could do df["PersonID"].unique()

Skorpeo
  • 2,362
  • 2
  • 15
  • 20