4

I have a dataframe:

    Name    Section
1   James   P3
2   Sam     2.5C
3   Billy   T35
4   Sarah   A85
5   Felix   5I

How do I split numeric values into a separate column called Section_Number and also split alphabetic values to Section_Letter. Desired results

    Name    Section Section_Number  Section_Letter
1   James   P3               3          P
2   Sam     2.5C           2.5          C
3   Billy   T35             35          T
4   Sarah   A85             85          A
5   Felix   5L               5          L
jpp
  • 159,742
  • 34
  • 281
  • 339
David 54321
  • 568
  • 1
  • 9
  • 23
  • take a look at https://stackoverflow.com/questions/430079/how-to-split-strings-into-text-and-number – Yuca Jul 11 '18 at 14:25

4 Answers4

7

Use str.replace with str.extract by [A-Z]+ for all uppercase strings:

df['Section_Number'] = df['Section'].str.replace('([A-Z]+)', '')
df['Section_Letter'] = df['Section'].str.extract('([A-Z]+)')
print (df)
    Name Section Section_Number Section_Letter
1  James      P3              3              P
2    Sam    2.5C            2.5              C
3  Billy     T35             35              T
4  Sarah     A85             85              A
5  Felix      5I              5              I

For seelct also lowercase values:

df['Section_Number'] = df['Section'].str.replace('([A-Za-z]+)', '')
df['Section_Letter'] = df['Section'].str.extract('([A-Za-z]+)')
print (df)
    Name Section Section_Number Section_Letter
1  James      P3              3              P
2    Sam    2.5C            2.5              C
3  Billy     T35             35              T
4  Sarah     A85             85              A
5  Felix      5I              5              I
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

It'll no doubt be slower but throwing out an alternative for completeness you can use str.extractall to get named groups matching the patterns and consolidate the matches and join back to your DF...

new = df.join(
    df.Section.str.extractall(r'(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)')
    .groupby(level=0).first()
)

Result:

    Name Section Section_Letter Section_Number
1  James      P3              P              3
2    Sam    2.5C              C            2.5
3  Billy     T35              T             35
4  Sarah     A85              A             85
5  Felix      5I              I              5
Jon Clements
  • 138,671
  • 33
  • 247
  • 280
1

If, as in your example, you have one letter in each name, you can sort and then slice:

def get_vals(x):
    return ''.join(sorted(x, key=str.isalpha))

# apply ordering
vals = df['Section'].apply(get_vals)

# split numbers from letter
df['num'] = vals.str[:-1].astype(float)
df['letter'] = vals.str[-1]

print(df)

    Name Section   num letter
1  James      P3   3.0      P
2    Sam    2.5C   2.5      C
3  Billy     T35  35.0      T
4  Sarah     A85  85.0      A
5  Felix      5I   5.0      I
jpp
  • 159,742
  • 34
  • 281
  • 339
0

We can use itertools.groupby to group the contiguous alpha and non-alpha

from itertools import groupby

[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]

[['3', 'P'], ['2.5', 'C'], ['35', 'T'], ['85', 'A'], ['5', 'I']]

We can manipulate this into new columns

from itertools import groupby

N, L = zip(
    *[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
)
df.assign(Selection_Number=N, Selection_Letter=L)

    Name Section Selection_Number Selection_Letter
1  James      P3                3                P
2    Sam    2.5C              2.5                C
3  Billy     T35               35                T
4  Sarah     A85               85                A
5  Felix      5I                5                I
piRSquared
  • 285,575
  • 57
  • 475
  • 624