Want to separate each row value by dash into columns

After looking at Andy Hayden's answer, I am capturing the leading values (U, 33, and "A, A") with the pattern

(?P<Line>^\d{1,2}|^.|.*, .)

from rows such as U-2022W-ZZ5891.

I am applying it with df[0] = df[0].str.extract(r'(?P<Line>^\d{1,2}|^.|.*, .)'), but I am unsure how to extend the grouping to get from the starting data to the final layout.

col0
U-2022W-ZZ5891
U-2014X-7073
U-2010X-45
33-2010X-ZZ45
A, A-2010X-45
U-1996W-M-ZZ5891

I want to go from the data above to the layout below:

col0  col1   col2  col3
U     2022W        ZZ5891
U     2014X        7073
U     2010X        45
33    2010X        ZZ45
A, A  2010X        45
U     1996W  M     ZZ5891
kb9alpp

1 Answer

You can use this regular expression:

new_df = df['col0'].str.extract(r'(.+?)-(.+?)-(?:(.+?)-)?(.+)').fillna('')

Output:

>>> new_df
      0      1  2       3
0     U  2022W     ZZ5891
1     U  2014X       7073
2     U  2010X         45
3    33  2010X       ZZ45
4  A, A  2010X         45
5     U  1996W  M  ZZ5891
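
For reference, here is a self-contained sketch of the same approach, run from start to finish. It rebuilds the sample frame from the question (assuming the column is named col0, as in the answer's code) and renames the extracted columns to col0 through col3 to match the requested layout. The renaming step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; the column name "col0" is assumed.
df = pd.DataFrame({
    "col0": [
        "U-2022W-ZZ5891",
        "U-2014X-7073",
        "U-2010X-45",
        "33-2010X-ZZ45",
        "A, A-2010X-45",
        "U-1996W-M-ZZ5891",
    ]
})

# Group 1: text before the first dash.
# Group 2: text between the first and second dash.
# Group 3: optional extra field followed by a dash (present only in 4-part rows).
# Group 4: everything that remains.
pattern = r"(.+?)-(.+?)-(?:(.+?)-)?(.+)"

# Unmatched optional groups come back as NaN, so replace them with "".
new_df = df["col0"].str.extract(pattern).fillna("")

# Rename the default 0..3 labels to the names used in the question.
new_df.columns = ["col0", "col1", "col2", "col3"]

print(new_df)
```

Because the third group is optional, three-part rows leave col2 empty and put their last field in col3, while four-part rows such as U-1996W-M-ZZ5891 fill all four columns.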