Split a column into 2 columns like alphabetic text in one column and alphanumeric or numbers or anything in 2nd column

Question

I have a dataframe column which contains product and technical details merged. I just want to split them separately into 2 columns like actual product name in one column and other technical details in one column.

I tried to solve the problem using regex and splitted the technical details separately, but the product name was going null wherever the technical details get splitted. not sure what went wrong.

This is the dataframe I tried
df = pd.DataFrame({'Description': ['WASHER tey DIN6340 10.5 C 35;', 'CABINET EL', 'CYLINDER SCREW', 'M12x N15']})

Code:
df['Technical Data'] = df['Description'].str.extract(r'^.*?(\s\w*\d+\w*\s.*)$')
df['Product Description'] = df['Description'].apply(lambda x: re.sub(r'^.*?(\w*\d+\w*\s.*)$', '', x))

The result I'm getting is

So I want the output to be like this

Any suggestions on how to do that??

Are all your words spelt with ASCII letters? In the `[a-zA-Z]` range, no accented chars? — Wiktor Stribiżew, Jul 13 '23 at 07:50
Try `df[['Technical Data','Product Description']] = df['Description'].str.extract(r'^(.*?)(?:\s*((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z]).*))?$', expand=True)` if the answer to the above is "yes". — Wiktor Stribiżew, Jul 13 '23 at 07:58
@WiktorStribiżew Thanks for your comments, it was really helpful to fix the issue — Pravin, Jul 13 '23 at 08:08

score 2 · Accepted Answer · answered Jul 13 '23 at 08:14

You may capture any zero or more chars as few as possible ("Technical Data" column) and then optional whitespaces followed with an alphanumeric string and then anything till the end of the string (the "Product Description" column):

df[['Technical Data','Product Description']] = df['Description'].str.extract(r'^(.*?)(?:\s*((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z]).*))?$', expand=True)

See the regex demo.

Details:

^ - start of string
(.*?) - Group 1: any zero or more chars, other than line break chars, as few as possible
(?:\s*((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z]).*))? - an optional group matching
- \s* - zero or more whitespaces
- ((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z]).*) - Group 2: either one or more letters and then a digit, or one or more digits and then a letter, and then any zero or more chars, other than line break chars, as few as possible
$ - end of string.

If you have Unicode letters, a common [^\W\d_] construct can help (you will need to replace [a-zA-Z] with [^\W\d_]).

Split a column into 2 columns like alphabetic text in one column and alphanumeric or numbers or anything in 2nd column

1 Answers1