
For example, given a dataframe like the following:

| Customer Name | Customer Group |
| --- | --- |
| ABC, PT | NaN |
| DEF, PT | NaN |
| ... | ... |

The Customer Group field should be filled with the text before the comma, so the expected output is:

| Customer Name | Customer Group |
| --- | --- |
| ABC, PT | ABC |
| DEF, PT | DEF |
| ... | ... |

Does anyone know how to code this? I am guessing it will use a regex.

1 Answer


EDIT: There are duplicate column names (here `Customer Name`), so first make the column names unique using one of the solutions from this link.
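The linked solutions are not reproduced here. As a minimal sketch, assuming the duplicated columns hold identical data so that keeping the first copy is safe, one common pandas idiom is `df.loc[:, ~df.columns.duplicated()]`. The sample frame is made up for illustration.

```python
import pandas as pd

# Made-up frame where "Customer Name" appears twice (e.g. after a concat or a messy CSV header)
df = pd.DataFrame(
    [["ABC, PT", "ABC, PT", None], ["DEF, PT", "DEF, PT", None]],
    columns=["Customer Name", "Customer Name", "Customer Group"],
)

# With a duplicated label, column selection returns a DataFrame, not a Series,
# so the .str accessor is not available on it
print(type(df["Customer Name"]))

# Keep only the first occurrence of each column label
df = df.loc[:, ~df.columns.duplicated()]
print(type(df["Customer Name"]))
```

After deduplication, `df['Customer Name']` is a single Series again and the `.str` methods apply.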

Then use `Series.str.extract` to get everything before the `,`:

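```python
# expand=False returns a Series (not a one-column DataFrame) for a single capture group
df['Customer Group'] = df['Customer Name'].str.extract(r'^(.*),', expand=False)
```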
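A small sketch on made-up data showing why `expand` matters: with the default `expand=True`, `str.extract` returns a one-column DataFrame even for a single capture group, while `expand=False` returns a Series that assigns cleanly to one column. Note that `.*` is greedy, so for a name containing several commas the capture runs up to the last one.

```python
import pandas as pd

# Made-up sample matching the question
df = pd.DataFrame({"Customer Name": ["ABC, PT", "DEF, PT"], "Customer Group": [None, None]})

# Default expand=True: a one-column DataFrame, even with a single capture group
as_frame = df["Customer Name"].str.extract(r"^(.*),")
# expand=False with exactly one group: a plain Series
as_series = df["Customer Name"].str.extract(r"^(.*),", expand=False)

df["Customer Group"] = as_series
print(df)

# Greedy .* captures up to the LAST comma when there are several
greedy = pd.Series(["A, B, C"]).str.extract(r"^(.*),", expand=False)
print(greedy.iloc[0])
```

If only the first comma should count, a pattern such as `^([^,]*),` would stop at the first one instead.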

Or:

```python
df['Customer Group'] = df['Customer Name'].str.split(',').str[0]
```
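For comparison, a sketch on hypothetical names showing where the two approaches differ: `split(',').str[0]` keeps everything before the first comma (or the whole string if there is no comma), whereas the greedy regex stops at the last comma and yields NaN when there is no comma at all.

```python
import pandas as pd

# Hypothetical names, including one without a comma and one with several
names = pd.Series(["ABC, PT", "DEF, PT", "GHI", "A, B, C"])

# split(',') gives a list per row; .str[0] takes its first element
by_split = names.str.split(",").str[0]

# The regex version: NaN when there is no comma, and greedy .* stops at the LAST comma
by_extract = names.str.extract(r"^(.*),", expand=False)

print(pd.DataFrame({"name": names, "split": by_split, "extract": by_extract}))
```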

If you only need to fill the missing values:

```python
# fillna needs a Series here, hence expand=False
df['Customer Group'] = df['Customer Group'].fillna(df['Customer Name'].str.extract(r'^(.*),', expand=False))
```
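A minimal sketch of the fill-only-missing variant, on made-up data where one row already has a group (`"Existing"` is a hypothetical value). It also shows that passing the default `expand=True` result, a DataFrame, to `Series.fillna` raises a TypeError with the same message quoted in the comments; `expand=False` avoids it.

```python
import numpy as np
import pandas as pd

# Made-up data: the second row already has a (hypothetical) group that must be kept
df = pd.DataFrame({
    "Customer Name": ["ABC, PT", "DEF, PT", "XYZ, PT"],
    "Customer Group": [np.nan, "Existing", np.nan],
})

# With the default expand=True the fill value is a DataFrame, which Series.fillna rejects
raised = False
try:
    df["Customer Group"].fillna(df["Customer Name"].str.extract(r"^(.*),"))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# expand=False gives a Series aligned on the index; only the NaN cells are filled
extracted = df["Customer Name"].str.extract(r"^(.*),", expand=False)
df["Customer Group"] = df["Customer Group"].fillna(extracted)
print(df)
```

Because `fillna` aligns on the index, each missing group is filled from the name in the same row, and existing values are left untouched.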
jezrael
  • Hi, I tried to apply the code but it raised an error like this: `"value" parameter must be a scalar, dict or Series, but you passed a "DataFrame"` – Carlos Alberto Apr 25 '22 at 06:22
  • @CarlosAlberto - Does that mean `print (df['Customer Name'])` returns a `DataFrame` instead of a single column? – jezrael Apr 25 '22 at 06:24