-1

I would like to add a new column, retailer_relationship, to my dataframe.

I would like each row value of this new column to be 'TRUE' if the retailer column value starts with any of the items in the list list_of_relationships, and 'FALSE' otherwise.

What I've tried:

list_of_relationships = ("retailer1","retailer2","retailer3")

for i in df.index:
    for relationship in list_of_relationships:            
        if df.iloc[i]['retailer'].str.startswith(relationship):
            df.at[i, 'retailer_relationship'] = "TRUE"
        else:
            df.at[i, 'retailer_relationship'] = "FALSE"
gmds
Deskjokey
  • Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – cwalvoort May 16 '19 at 02:38

3 Answers

2

You can use a regular expression that combines the ^ special character, which anchors the match at the beginning of the string, with an alternation matching any element of list_of_relationships. This is needed because startswith does not accept regexes:

import re

# Escape each prefix and group the alternatives so that ^ applies to all of
# them, not only to the first one.
regex = re.compile('^(?:' + '|'.join(map(re.escape, list_of_relationships)) + ')')

df['retailer_relationship'] = df['retailer'].str.contains(regex).map({True: 'TRUE', False: 'FALSE'})

Since you want the literal strings 'TRUE' and 'FALSE', we can then use map to convert the booleans to strings.

An alternative method that is slightly faster, though I don't think that'll matter:

df['retailer_relationship'] = df['retailer'].str.contains(regex).transform(str).str.upper()
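For anyone who wants to try the regex approach end to end, here is a minimal sketch. The sample rows are made up (the question did not include data), and the names `pattern` and `starts_with_any` are only for illustration. The alternatives are wrapped in a non-capturing group so that `^` anchors every prefix; without the group, a value such as "other_retailer2" would match on "retailer2" anywhere in the string.

```python
import re

import pandas as pd

# Hypothetical sample data; the question did not include any.
df = pd.DataFrame(
    {"retailer": ["retailer1_north", "retailer3", "other_retailer2", "shop"]}
)
list_of_relationships = ("retailer1", "retailer2", "retailer3")

# Escape each prefix and group the alternatives so "^" anchors all of them.
pattern = "^(?:" + "|".join(re.escape(p) for p in list_of_relationships) + ")"

# Boolean Series: True where the retailer value starts with any prefix.
starts_with_any = df["retailer"].str.contains(pattern, regex=True)

# Convert the booleans to the literal strings the question asks for.
df["retailer_relationship"] = starts_with_any.map({True: "TRUE", False: "FALSE"})

print(df)
```

Only the first two rows should come out as 'TRUE' here, since "other_retailer2" contains a prefix but does not start with one.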
gmds
  • I'm getting: TypeError: can only join an iterable (on the regex line) – Deskjokey May 16 '19 at 02:44
  • @Deskjokey Did you run `retailer_relationship = ("retailer1","retailer2","retailer3")` before that? Actually, why do you call it `retailer_relationship` when you iterate through `list_of_relationships`? – gmds May 16 '19 at 02:44
  • It works. Just had to change: regex = re.compile('^' + '|'.join(list_of_relationships)) – Deskjokey May 16 '19 at 02:47
  • @Deskjokey Yup, I wrote my answer based on the original question. I suggest you edit it to change the reference to `list_of_relationships`. – gmds May 16 '19 at 02:48
0

See if this works for you. It would help if you shared a sample of your df or some dummy data representing it.

# Create the column first, then flag rows whose retailer value is in the list.
df['retailer_relationship'] = False
df.loc[df['retailer'].isin(list_of_relationships), 'retailer_relationship'] = True
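One thing to keep in mind with this approach: isin compares whole values, so it flags exact matches rather than prefixes. A small sketch with made-up rows (the data and the name `exact_match` are assumptions for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"retailer": ["retailer1", "retailer1_north", "shop"]})
list_of_relationships = ("retailer1", "retailer2", "retailer3")

# isin tests membership of the whole value, so "retailer1_north" is not
# flagged even though it starts with "retailer1".
exact_match = df["retailer"].isin(list_of_relationships)

print(exact_match)
```

If the retailer values carry suffixes after the prefix, a startswith or regex check is likely the better fit.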
Vasu Devan
0

You can still use startswith in pandas by passing it a tuple of prefixes:

df['retailer_relationship'] = df['retailer'].str.startswith(tuple(list_of_relationships))
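The line above yields booleans, so if the literal strings 'TRUE' and 'FALSE' are needed, a map step can follow. This is a minimal sketch on made-up rows; `is_related` is a name chosen here for illustration, and passing a tuple to str.startswith assumes a pandas version that accepts one, so check the documentation for the version you run.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"retailer": ["retailer2_east", "my_retailer2", "retailer3"]})
list_of_relationships = ["retailer1", "retailer2", "retailer3"]

# True where the value starts with any of the prefixes in the tuple.
is_related = df["retailer"].str.startswith(tuple(list_of_relationships))

# Convert the booleans to the literal strings the question asks for.
df["retailer_relationship"] = is_related.map({True: "TRUE", False: "FALSE"})

print(df)
```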
BENY