2

I have a pandas dataframe with dates on each row, I would like to groupby a second column and sort the dates then assign letter A to the first date and B to the second etc. See example below where I am stuck.

import pandas as pd

rng = pd.date_range('2015-02-24', periods=5, freq='T')
names = ['MAK', 'MAK', 'OKA', 'OKA', 'MAK']

df = pd.DataFrame({ 'Date': rng, 'Groups': names })

df

                 Date Groups
0 2015-02-24 00:00:00    MAK
1 2015-02-24 00:01:00    MAK
2 2015-02-24 00:02:00    OKA
3 2015-02-24 00:03:00    OKA
4 2015-02-24 00:04:00    MAK

The result I would like is:

                 Date Groups  Letter
0 2015-02-24 00:00:00    MAK    A
1 2015-02-24 00:01:00    MAK    B
2 2015-02-24 00:02:00    OKA    A
3 2015-02-24 00:03:00    OKA    B
4 2015-02-24 00:04:00    MAK    C

Thinking I could define a function as such:

def assignLetters(row):
    return sort == 'A'

 df['Letter'] = df.groupby('Groups').Date.apply(assignLetters)

Any help would be appreciated! Maybe direct me in the right direction, not sure how to assign letters chronologically?

zape
  • 105
  • 1
  • 6
  • Hi, is it possible to make the list longer for d? If there are more than 26 dates. It starts enumerating AA, then AB, AC etc? – zape Jan 28 '21 at 10:32

1 Answers1

3

Use DataFrameGroupBy.rank with mapping by dictionary created from uppercase alphabet:

import string

d = dict(enumerate(string.ascii_uppercase, 1))

df['Letter'] = df.groupby('Groups')['Date'].rank('dense').map(d)
print (df)
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C

EDIT: For many groups is possible generate AA,AB.. after Z values by custom function:

from string import ascii_uppercase
import itertools

#https://stackoverflow.com/a/29351603/2901002
def iter_all_strings():
    for size in itertools.count(1):
        for s in itertools.product(ascii_uppercase, repeat=size):
            yield "".join(s)

df['Letter'] = df.groupby('Groups')['Date'].rank('dense')

d = dict(enumerate(itertools.islice(iter_all_strings(), int(df['Letter'].max())), 1))
print (d)
{1: 'A', 2: 'B', 3: 'C'}

df['Letter'] = df['Letter'].map(d)
print (df)
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252