0

I have a dataframe column containing the following:

Audience
searchretargeting
data-capture-320x50
purchase-behavior-320x500
data-capture-728x90

I want to create a new column (Audience2) by splitting the 'Audience' column on the '-' delimiter and keeping only the first element of the split ('data', not 'capture-320x50').

If no '-' is present, I would like the new column to be populated with the value from 'Audience' (e.g. searchretargeting):

Audience               Audience2
searchretargeting      searchretargeting
data-capture-320x50    data

I know how to split the Audience column with str.split, but I am looking for some logic that keeps the new column from being NaN when there is no '-' in the value:

df['Audience2']=df['Audience'].str.split('-').str[1]

This splits the Audience column and keeps one element of the split, but I've been struggling with various if-else and apply-lambda statements to carry over values that don't contain '-' instead of getting NaN.

fmancus1

5 Answers

3

Try this:

df['Audience'].str.split('-').str[0].fillna(df['Audience'])

Output:

0    searchretargeting
1                 data
2             purchase
3                 data
Name: Audience, dtype: object
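
A minimal sketch, assuming the four sample values from the question, of how the index taken after the split affects the result. The names `pieces`, `first` and `second` are illustrative only. In this sketch `str[0]` already returns the whole string for rows without a '-', while `str[1]` is where the NaN shows up, so `fillna` only matters for missing pieces:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Audience": [
        "searchretargeting",
        "data-capture-320x50",
        "purchase-behavior-320x500",
        "data-capture-728x90",
    ]
})

pieces = df["Audience"].str.split("-")  # each row becomes a list of pieces
first = pieces.str[0]    # every list has at least one piece
second = pieces.str[1]   # NaN where the list has only one piece

# fillna falls back to the original value wherever a piece is missing.
df["Audience2"] = first.fillna(df["Audience"])
print(df)
```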
Scott Boston
  • This is smart ~ – BENY Jul 12 '20 at 01:33
  • Nice answer. Though if we are getting the first item with `str[0]`, then is `fillna` necessary? I seem to be getting the original value if we use `str[0]` without `fillna`. – MasayoMusic Jul 12 '20 at 01:49
  • @MasayoMusic The NaN comes from the fact that '-' doesn't appear in the original string, therefore the `split` method returns a NaN, there is no list so getting str[0] is not possible. – Scott Boston Jul 12 '20 at 01:50
2

Let us fix it with ffill:

df.Audience.str.split('-',expand=True).ffill(axis=1).iloc[:,1]
0    searchretargeting
1              capture
2             behavior
3              capture
Name: 1, dtype: object

More info

df.Audience.str.split('-',expand=True).ffill(axis=1)
                   0                  1                  2
0  searchretargeting  searchretargeting  searchretargeting
1               data            capture             320x50
2           purchase           behavior            320x500
3               data            capture             728x90
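
As a rough sketch of the same `expand=True` idea aimed at the leading piece rather than the second one, assuming the sample data from the question: passing `n=1` limits the split to the first '-', and column 0 then holds the value the question asks for. The names `parts` and `filled` are illustrative only.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Audience": [
        "searchretargeting",
        "data-capture-320x50",
        "purchase-behavior-320x500",
        "data-capture-728x90",
    ]
})

# n=1 splits on the first '-' only, so expand=True yields at most two columns.
parts = df["Audience"].str.split("-", n=1, expand=True)

# Column 0 always holds the leading piece, even when no '-' was found.
df["Audience2"] = parts[0]

# Column 1 is missing for rows without a '-'; ffill(axis=1) copies column 0 into it.
filled = parts.ffill(axis=1)
print(df)
```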
BENY
1
df['Audience2'] = [i.split('-')[0] if '-' in i else i for i in df['Audience']]

This should do it for you.
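
A possible variation on the list-comprehension approach, sketched under the assumption that every value in `Audience` is a string (no missing values): Python's built-in `str.partition` returns the whole string as its first item when the separator is absent, so no explicit condition is needed.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Audience": [
        "searchretargeting",
        "data-capture-320x50",
        "purchase-behavior-320x500",
        "data-capture-728x90",
    ]
})

# partition('-') returns (head, separator, tail); head is the full string
# when '-' does not occur, so rows without a delimiter pass through unchanged.
df["Audience2"] = [value.partition("-")[0] for value in df["Audience"]]
print(df)
```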

unltd_J
1

You could try with np.where:

import numpy as np
df['Audience2'] = np.where(df.Audience.str.contains('-'), df.Audience.str.split('-').str[0], df.Audience)

Output:

df
                    Audience          Audience2
0          searchretargeting  searchretargeting
1        data-capture-320x50               data
2  purchase-behavior-320x500           purchase
3        data-capture-728x90               data
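
A self-contained sketch of the `np.where` approach, assuming the sample data from the question. The `regex=False` and `na=False` arguments are additions not present in the answer: they treat '-' as a literal character and make rows with missing values count as "no delimiter". The name `has_dash` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Audience": [
        "searchretargeting",
        "data-capture-320x50",
        "purchase-behavior-320x500",
        "data-capture-728x90",
    ]
})

# Boolean mask: True where the value contains a literal '-'.
has_dash = df["Audience"].str.contains("-", regex=False, na=False)

# Take the leading piece where a '-' exists, otherwise keep the original value.
df["Audience2"] = np.where(
    has_dash,
    df["Audience"].str.split("-").str[0],
    df["Audience"],
)
print(df)
```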
MrNobody33
1

You can instead use something like this -

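import pandas as pd
df['Audience2'] = df['Audience'].str.split('-').str[1]  # str[1] is the second piece; use str[0] for the first
for i in range(len(df)):
  # fall back to the original value where the split produced no such piece
  if pd.isna(df['Audience2'][i]):
    df.loc[i, 'Audience2'] = df['Audience'][i]  # .loc avoids chained assignment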
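
The same fallback can be sketched without an explicit loop by using a boolean mask with `.loc`. This assumes the sample data from the question and keeps the `str[1]` index used in this answer, so it yields the second piece; the name `missing` is illustrative.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "Audience": [
        "searchretargeting",
        "data-capture-320x50",
        "purchase-behavior-320x500",
        "data-capture-728x90",
    ]
})

# Second piece of each split; NaN where there is no '-'.
df["Audience2"] = df["Audience"].str.split("-").str[1]

# Rows where the split produced no second piece.
missing = df["Audience2"].isna()

# Copy the original value into those rows in a single assignment.
df.loc[missing, "Audience2"] = df.loc[missing, "Audience"]
print(df)
```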
Rishit Dagli