Pandas dataframe sort by date, then assign letters A-Z

Question

I have a pandas dataframe with dates on each row, I would like to groupby a second column and sort the dates then assign letter A to the first date and B to the second etc. See example below where I am stuck.

import pandas as pd

rng = pd.date_range('2015-02-24', periods=5, freq='T')
names = ['MAK', 'MAK', 'OKA', 'OKA', 'MAK']

df = pd.DataFrame({ 'Date': rng, 'Groups': names })

df

                 Date Groups
0 2015-02-24 00:00:00    MAK
1 2015-02-24 00:01:00    MAK
2 2015-02-24 00:02:00    OKA
3 2015-02-24 00:03:00    OKA
4 2015-02-24 00:04:00    MAK

The result I would like is:

                 Date Groups  Letter
0 2015-02-24 00:00:00    MAK    A
1 2015-02-24 00:01:00    MAK    B
2 2015-02-24 00:02:00    OKA    A
3 2015-02-24 00:03:00    OKA    B
4 2015-02-24 00:04:00    MAK    C

Thinking I could define a function as such:

def assignLetters(row):
    return sort == 'A'

 df['Letter'] = df.groupby('Groups').Date.apply(assignLetters)

Any help would be appreciated! Maybe direct me in the right direction, not sure how to assign letters chronologically?

Hi, is it possible to make the list longer for d? If there are more than 26 dates. It starts enumerating AA, then AB, AC etc? — zape, Jan 28 '21 at 10:32

jezrael · Accepted Answer · 2021-01-28T10:49:17.813

3

Use DataFrameGroupBy.rank with mapping by dictionary created from uppercase alphabet:

import string

d = dict(enumerate(string.ascii_uppercase, 1))

df['Letter'] = df.groupby('Groups')['Date'].rank('dense').map(d)
print (df)
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C

EDIT: For many groups is possible generate AA,AB.. after Z values by custom function:

from string import ascii_uppercase
import itertools

#https://stackoverflow.com/a/29351603/2901002
def iter_all_strings():
    for size in itertools.count(1):
        for s in itertools.product(ascii_uppercase, repeat=size):
            yield "".join(s)

df['Letter'] = df.groupby('Groups')['Date'].rank('dense')

d = dict(enumerate(itertools.islice(iter_all_strings(), int(df['Letter'].max())), 1))
print (d)
{1: 'A', 2: 'B', 3: 'C'}

df['Letter'] = df['Letter'].map(d)
print (df)
                 Date Groups Letter
0 2015-02-24 00:00:00    MAK      A
1 2015-02-24 00:01:00    MAK      B
2 2015-02-24 00:02:00    OKA      A
3 2015-02-24 00:03:00    OKA      B
4 2015-02-24 00:04:00    MAK      C

edited Jan 28 '21 at 10:49

answered Jan 28 '21 at 10:09

jezrael

822,522
95
1,334
1,252

Hi, is it possible to make the list longer for d? If there are more than 26 dates. It starts enumerating AA, then AB, AC etc? – zape Jan 28 '21 at 10:32
1

@zape - Give me some time for solution. – jezrael Jan 28 '21 at 10:33
Sure, its kind of how excel numerates their columns. – zape Jan 28 '21 at 10:34
HI, i get this error: File "", line 1, in d = dict(enumerate(itertools.islice(iter_all_strings(), df['Letter'].max()))) ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize. – zape Jan 28 '21 at 10:45
@zape - Can you test now? – jezrael Jan 28 '21 at 10:46
Just getting NaN values – zape Jan 28 '21 at 10:47
@zape - sorry, my `))` mysmatch – jezrael Jan 28 '21 at 10:49

Pandas dataframe sort by date, then assign letters A-Z

1 Answers1