Please see my code below. I'm iterating through strings like '1A', '4D', etc., and I want the output to be 1.1, 4.4, and so on instead.

For example, instead of 1A I want 1.1; likewise 1B = 1.2, 4A = 4.1, 5D = 5.4, etc.

Convert alphabet letters to number in Python

import pandas as pd

data = ['1A', '1B', '4A', '5D', '']
df = pd.DataFrame(data, columns=['Score'])

newcol = []

# Series.items() replaces iteritems(), which was removed in pandas 2.0
for idx, value in df['Score'].items():
    if pd.isnull(value):
        newcol.append(value)
    else:
        # Wanted here: the first element of the row (1-5), then '.', then the
        # numeric equivalent of the letter, i.e. A=1, B=2, C=3, D=4, etc.
        newcol.append(...)
Ch3steR
SPena

3 Answers

Use the following (incorporating @Ch3steR's comment):

from string import ascii_uppercase
dic = {j:str(i) for i,j in enumerate(ascii_uppercase, 1)}
df['Score'].str[:-1] + '.' + df['Score'].str[-1].map(dic)

Output

0    1.1
1    1.2
2    4.1
3    5.4
4    NaN
Name: Score, dtype: object
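
To see what the mapping does on its own, here is a minimal plain-Python sketch of the same idea without pandas. The helper name `convert` is hypothetical, and returning `None` for empty or unmapped input is an assumption chosen to mirror the NaN in the output above.

```python
from string import ascii_uppercase

# Map each uppercase letter to its 1-based position as a string:
# 'A' -> '1', 'B' -> '2', ..., 'Z' -> '26'
dic = {letter: str(pos) for pos, letter in enumerate(ascii_uppercase, 1)}


def convert(score):
    """Hypothetical helper: '5D' -> '5.4'; None if the last character is not A-Z."""
    suffix = dic.get(score[-1:])  # score[-1:] is '' for an empty string, so no IndexError
    if suffix is None:
        return None
    return score[:-1] + '.' + suffix


print([convert(s) for s in ['1A', '1B', '4A', '5D', '']])
# ['1.1', '1.2', '4.1', '5.4', None]
```

Note that the dictionary only covers uppercase letters, so a lowercase suffix such as '1a' falls through to `None` here.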
Vivek Kalyanarangan

You can use str.replace:

df['Score'] = df['Score'].str.replace(r'\D',
              lambda x: f'.{ord(x.group(0).upper())-64}', regex=True)

output:

  Score
0   1.1
1   1.2
2   4.1
3   5.4
4      
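
The arithmetic in the replacement function may be easier to follow on plain strings. This is a rough sketch using the standard-library `re.sub` rather than pandas; the function name `letter_to_suffix` is made up for illustration.

```python
import re


def letter_to_suffix(match):
    # ord('A') is 65, so subtracting 64 gives A=1, B=2, C=3, ...
    # upper() makes lowercase letters map to the same numbers.
    return f'.{ord(match.group(0).upper()) - 64}'


for s in ['1A', '1b', '4A', '5D', '']:
    # \D matches every character that is not a digit
    print(repr(s), '->', repr(re.sub(r'\D', letter_to_suffix, s)))
```

Because `\D` matches any non-digit, punctuation or whitespace in the input would be converted too. Restricting the pattern to `[A-Za-z]` is one possible tightening, though that is an assumption about the data rather than something the question states.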
mozway

You could build a mapping using str.maketrans and str.translate, a common recipe for mapping each character to its output.

  • str.maketrans

    This static method returns a translation table usable for str.translate().

  • str.translate

    Return a copy of the string in which each character has been mapped through the given table, which must be a dictionary of Unicode ordinals (integers) to Unicode ordinals, strings or None. Unmapped characters are left untouched.
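
A small sketch of the two methods on plain strings, using a deliberately tiny two-letter table (an illustrative assumption, not the full alphabet):

```python
# maketrans turns single-character keys into their Unicode ordinals
table = str.maketrans({'A': '.1', 'B': '.2'})
print(table)                   # {65: '.1', 66: '.2'}

# translate replaces every mapped character and leaves the rest untouched
print('1A'.translate(table))   # 1.1
print('3C'.translate(table))   # 3C  ('C' is not in this small table)
```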

Use pd.Series.apply and pass str.translate to it.

from string import ascii_uppercase

table = str.maketrans({c: f'.{i}' for i, c in enumerate(ascii_uppercase, 1)})
df['Score'].apply(str.translate, args=(table, ))

# 0    1.1
# 1    1.2
# 2    4.1
# 3    5.4
# 4       
# Name: Score, dtype: object

Timeit results:

Benchmarking setup

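import random
from string import ascii_uppercase

import numpy as np
import pandas as pd
import perfplot


def ch3ster(df):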
    table = str.maketrans(
        {c: f".{i}" for i, c in enumerate(ascii_uppercase, 1)}
    )
    return df["Score"].apply(str.translate, args=(table,))


def Vivek(df):
    dic = {j: str(i) for i, j in enumerate(ascii_uppercase, 1)}
    return df["Score"].str[:-1] + "." + df["Score"].str[-1].map(dic)


def mozway(df):
    return df["Score"].str.replace(
        r"\D", lambda x: f".{ord(x.group(0).upper())-64}", regex=True
    )


def check(a, b):
    return (a == b).all()


bench = perfplot.bench(
    setup=lambda n: pd.DataFrame(
        {
            "Score": np.arange(n).astype(str)
            + pd.Series([random.choice(ascii_uppercase) for _ in range(n)])
        }
    ),
    kernels=[ch3ster, Vivek, mozway],
    n_range=[10 ** i for i in range(1, 8)],
    xlabel="size of df",
    equality_check=check,
)

Results

n          ch3ster       Vivek         mozway
10         0.000138986   0.000730289   0.000135238
100        0.00018052    0.000789941   0.00021811
1000       0.000569407   0.00126675    0.000882363
10000      0.00471242    0.00610832    0.00777755
100000     0.0578925     0.076809      0.0871657
1000000    0.604576      0.738928      0.867847
10000000   6.21429       7.11069       8.69433

[Figure: bench graph]

When df is large:

  • If execution time matters, use the maketrans + translate solution.
  • Ordering by execution time (fastest to slowest): ch3ster < Vivek < mozway

When df is small (fewer than 10K rows):

  • mozway's solution and the maketrans solution take almost the same time, with maketrans slightly faster from n = 100 upward.
  • Ordering by execution time (fastest to slowest): ch3ster < mozway < Vivek
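
For a quick sanity check without perfplot, something like the following rough sketch could be used. It runs plain-Python analogues of the three approaches on a list with the standard-library `timeit`, so it does not measure the pandas code paths and its numbers will not match the results table. The function names and the input size of 10,000 are arbitrary choices for illustration.

```python
import random
import re
import timeit
from string import ascii_uppercase

random.seed(0)  # fixed seed so repeated runs use the same input
scores = [f"{i}{random.choice(ascii_uppercase)}" for i in range(10_000)]

table = str.maketrans({c: f".{i}" for i, c in enumerate(ascii_uppercase, 1)})
dic = {c: str(i) for i, c in enumerate(ascii_uppercase, 1)}


def with_translate(values):
    return [v.translate(table) for v in values]


def with_dict(values):
    return [v[:-1] + "." + dic[v[-1]] for v in values]


def with_regex(values):
    return [
        re.sub(r"\D", lambda m: f".{ord(m.group(0).upper()) - 64}", v)
        for v in values
    ]


# All three approaches should agree before their timings are compared
assert with_translate(scores) == with_dict(scores) == with_regex(scores)

for fn in (with_translate, with_dict, with_regex):
    seconds = timeit.timeit(lambda: fn(scores), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```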
Ch3steR