Mapping letters to their numeric equivalents?

Question

Please see my code below. I'm iterating through strings like '1A', '4D', etc., and I want the output to be 1.1, 4.4, and so on instead.

That is, instead of 1A I want 1.1; likewise 1B = 1.2, 4A = 4.1, 5D = 5.4, etc.

Convert alphabet letters to number in Python

import pandas as pd

data = ['1A', '1B', '4A', '5D', '']
df = pd.DataFrame(data, columns=['Score'])

newcol = []

# Series.items() replaces iteritems(), which was removed in pandas 2.0
for col, row in df['Score'].items():
    if pd.isnull(row):
        newcol.append(row)
    else:
        # TODO: first element of row (1-5), then '.', then the numeric
        # equivalent of the letter, i.e. A=1, B=2, C=3, D=4, etc.
        newcol.append(...)  # placeholder for the converted value

Vivek Kalyanarangan · Answer 1 · 2022-01-19T18:14:02.513

2

Use the following (incorporating @Ch3steR's comments):

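from string import ascii_uppercase

# Map each uppercase letter to its 1-based position as a string: 'A' -> '1', 'B' -> '2', ...
dic = {j: str(i) for i, j in enumerate(ascii_uppercase, 1)}
# Everything except the last character, then '.', then the mapped last character
df['Score'].str[:-1] + '.' + df['Score'].str[-1].map(dic)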

Output

0    1.1
1    1.2
2    4.1
3    5.4
4    NaN
Name: Score, dtype: object
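
For reference, the per-string logic behind the slice-and-map approach can be sketched in plain Python, without pandas. This is only an illustration: the helper name `convert` and the choice to return `None` for unmappable input are assumptions, not part of the answer. It also shows why slicing with `[:-1]` and `[-1]` handles multi-digit prefixes such as '11B'.

```python
from string import ascii_uppercase

# Letter -> 1-based position as a string: {'A': '1', 'B': '2', ..., 'Z': '26'}
letter_to_num = {letter: str(pos) for pos, letter in enumerate(ascii_uppercase, 1)}


def convert(code):
    """Hypothetical helper: '11B' -> '11.2'; None when the last character is not A-Z."""
    if not code or code[-1] not in letter_to_num:
        return None
    # Everything before the last character is kept as-is, so '11B' keeps '11'
    return code[:-1] + "." + letter_to_num[code[-1]]


converted = [convert(s) for s in ["1A", "1B", "4A", "5D", "11B", ""]]
print(converted)  # ['1.1', '1.2', '4.1', '5.4', '11.2', None]
```

The empty string has no last character to map, which is why the pandas version yields NaN for that row.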

edited Jan 19 '22 at 18:14

answered Jan 19 '22 at 16:35

You could use `enumerate(iterable, 1)` instead of performing the addition in each iteration. – Ch3steR Jan 19 '22 at 16:53
This solution doesn't work when a string is let's say "11B". – Ch3steR Jan 19 '22 at 17:16
@Ch3steR yes, will have to modify to [:-1] and [-1] indexers..done – Vivek Kalyanarangan Jan 19 '22 at 18:13
Updated my answer with your answer. Check out the updated timings and benchmarking stats. – Ch3steR Jan 21 '22 at 08:10

score 2 · Answer 2 · answered Jan 19 '22 at 16:38

You can use str.replace:

# Raw string avoids the invalid-escape warning that '\D' triggers in a normal string
df['Score'] = df['Score'].str.replace(r'\D',
              lambda x: f'.{ord(x.group(0).upper())-64}', regex=True)

output:

  Score
0   1.1
1   1.2
2   4.1
3   5.4
4
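
To see what the replacement callable does, here is a minimal plain-Python sketch using `re.sub` with the same pattern. The function names `letter_to_suffix` and `convert` are made up for illustration. Note that `\D` matches any non-digit, so this assumes the input contains only digits and ASCII letters.

```python
import re


def letter_to_suffix(match):
    # ord('A') is 65, so subtracting 64 maps A -> 1, B -> 2, ..., Z -> 26
    return f".{ord(match.group(0).upper()) - 64}"


def convert(code):
    # Replace every non-digit character with '.' plus its alphabet position
    return re.sub(r"\D", letter_to_suffix, code)


print(convert("1A"))   # 1.1
print(convert("5d"))   # 5.4 (lowercase is handled by .upper())
print(convert("11B"))  # 11.2
print(repr(convert("")))  # '' (nothing to replace)
```

Because the pattern is not restricted to letters, a stray character such as '-' would also be converted; a narrower pattern like `[A-Za-z]` would avoid that.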

mozway

Ch3steR · Answer 3 · 2022-01-21T08:10:14.933

You could build the mapping using str.maketrans and str.translate, a common recipe for mapping each character to its replacement.

str.maketrans

This static method returns a translation table usable for str.translate().

str.translate

Return a copy of the string s where all characters have been mapped through the map, which must be a dictionary of Unicode ordinals (integers) to Unicode ordinals, strings or None. Unmapped characters are left untouched.
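
As a quick illustration of the two methods on plain strings (a sketch, separate from the pandas usage):

```python
from string import ascii_uppercase

# maketrans accepts single-character keys and converts them to ordinals,
# so the table maps ord('A') -> '.1', ord('B') -> '.2', ..., ord('Z') -> '.26'
table = str.maketrans({c: f".{i}" for i, c in enumerate(ascii_uppercase, 1)})

print(table[ord("A")])         # .1
print("4D".translate(table))   # 4.4
print("11B".translate(table))  # 11.2
print("7x".translate(table))   # 7x (lowercase is not in the table, so it is left untouched)
```

Digits are not keys in the table, so they pass through unchanged, which is why multi-digit prefixes need no special handling.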

Use pd.Series.apply and pass str.translate to it.

from string import ascii_uppercase

table = str.maketrans({c: f'.{i}' for i, c in enumerate(ascii_uppercase, 1)})
df['Score'].apply(str.translate, args=(table, ))

# 0    1.1
# 1    1.2
# 2    4.1
# 3    5.4
# 4       
# Name: Score, dtype: object

Timeit results:

Benchmarking setup

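import random
from string import ascii_uppercase

import numpy as np
import pandas as pd
import perfplot


def ch3ster(df):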
    table = str.maketrans(
        {c: f".{i}" for i, c in enumerate(ascii_uppercase, 1)}
    )
    return df["Score"].apply(str.translate, args=(table,))


def Vivek(df):
    dic = {j: str(i) for i, j in enumerate(ascii_uppercase, 1)}
    return df["Score"].str[:-1] + "." + df["Score"].str[-1].map(dic)


def mozway(df):
    return df["Score"].str.replace(
        r"\D", lambda x: f".{ord(x.group(0).upper())-64}", regex=True
    )


def check(a, b):
    return (a == b).all()


bench = perfplot.bench(
    setup=lambda n: pd.DataFrame(
        {
            "Score": np.arange(n).astype(str)
            + pd.Series([random.choice(ascii_uppercase) for _ in range(n)])
        }
    ),
    kernels=[ch3ster, Vivek, mozway],
    n_range=[10 ** i for i in range(1, 8)],
    xlabel="size of df",
    equality_check=check,
)

Results

n	ch3ster	Vivek	mozway
10	0.000138986	0.000730289	0.000135238
100	0.00018052	0.000789941	0.00021811
1000	0.000569407	0.00126675	0.000882363
10000	0.00471242	0.00610832	0.00777755
100000	0.0578925	0.076809	0.0871657
1000000	0.604576	0.738928	0.867847
10000000	6.21429	7.11069	8.69433

When df is large:

If execution time matters, you could use the maketrans + translate solution.
Ordering by execution time (fastest to slowest): ch3ster < Vivek < mozway

When df is small (fewer than 10K rows):

mozway's solution and maketrans take almost the same time. maketrans is slightly faster at n = 100 and n = 1000, while at n = 10 mozway's is marginally ahead.
Ordering by execution time (fastest to slowest): ch3ster < mozway < Vivek (at n = 10 the first two are effectively tied)
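
If perfplot is not available, a rough comparison of the two per-string techniques can be sketched with the standard library's `timeit`. This is not the benchmark above: it works on a plain list rather than a DataFrame, the function names and the list size are arbitrary choices, and the timings will vary by machine.

```python
import random
import re
import timeit
from string import ascii_uppercase

random.seed(0)  # fixed seed so both functions see reproducible input
codes = [f"{n}{random.choice(ascii_uppercase)}" for n in range(10_000)]

table = str.maketrans({c: f".{i}" for i, c in enumerate(ascii_uppercase, 1)})
pattern = re.compile(r"\D")


def with_translate(values):
    return [v.translate(table) for v in values]


def with_regex(values):
    return [pattern.sub(lambda m: f".{ord(m.group(0)) - 64}", v) for v in values]


# Both approaches should agree before their timings are compared
assert with_translate(codes) == with_regex(codes)

for func in (with_translate, with_regex):
    best = min(timeit.repeat(lambda: func(codes), number=10, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 10 runs")
```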
