1

I have a dataframe as below:

+--------+
|     Key|
+--------+
|  x10x60|
|x1x19x33|
|   x12x6|
|   a14x4|
|x1x1x1x6|
|x2a23x30|
+--------|

And I want the output like this: The Key column can be divide by each x element and put into xa/xb/xc/xd by order, but if there are a elements then place it into ta/tb/tc/td by order.

+--------+-----+------+-----+-----+-----+----+----+-----+
|     Key|   xa|    xb|   xc|   xd|   ta|  tb|  tc|   td|
+--------+-----+------+-----+-----+-----+----+----+-----+
|  x10x60|  x10|   x60|     |     |     |    |    |     |
|x1x19x33|   x1|   x19|  x33|     |     |    |    |     |
|   x12x6|  x12|    x6|     |     |     |    |    |     |
|   a14x4|     |    x4|     |     |  a14|    |    |     |
|x1x1x1x6|   x1|    x1|   x1|   x6|     |    |    |     |
|x2a23x30|   x2|      |  x30|     |     | a23|    |     |
+--------|-----+------+-----+-----+-----+----+----+-----+

I tried substr() or substring() cannot have the output, but seems stop at the divide.

tdy
  • 36,675
  • 19
  • 86
  • 83
jasondesu
  • 57
  • 6
  • Does this answer your question? [applying regex to a pandas dataframe](https://stackoverflow.com/questions/25292838/applying-regex-to-a-pandas-dataframe) – PM 77-1 Jan 25 '23 at 17:59
  • No, my question is not only like spilt elements, but also like place it to different columns in the correct columns – jasondesu Jan 25 '23 at 19:04

1 Answers1

0

Normally you can just use Series.str.split with expand=True, and pandas will auto-expand the substrings into columns.

But since you want to place the substrings into very specific columns, use Series.str.extractall:

import string

m = df['Key'].str.extractall(r'([xa][0-9]+)').reset_index()            # match the x* and a* substrings
m['match'] = m['match'].map(dict(enumerate(string.ascii_lowercase)))   # map 0,1,2... -> a,b,c...
m['match'] = np.where(m[0].str.startswith('x'), 'x', 't') + m['match'] # add x or t prefix

out = df[['Key']].join(
    m.pivot(index='level_0', columns='match', values=0)                # reshape into wide form
     .sort_index(key=lambda c: c.str.startswith('t'), axis=1)          # put t columns at end
     .fillna('')
)

Output:

>>> print(out)

        Key   xa   xb   xc  xd   ta   tb
0    x10x60  x10  x60                   
1  x1x19x33   x1  x19  x33              
2     x12x6  x12   x6                   
3     a14x4        x4           a14     
4  x1x1x1x6   x1   x1   x1  x6          
5  x2a23x30   x2       x30           a23
tdy
  • 36,675
  • 19
  • 86
  • 83