2

I am trying to set a different value in each row of a DataFrame column, based on another column, using:

df['url'] = np.where(df['client'] == 'xyz',
                     "/s?k={query}&s=relevanceblender&page=%s".format(query=df['keyword']),
                     "other")

However, {query} is replaced by all the values of df['keyword'] at once, not only the value from the row in question. Thanks for your help.

hpaulj
Vaidas
  • Please provide a reproducible example of `df` – mozway Aug 02 '23 at 07:44
  • didn't understand your question – appu Aug 02 '23 at 07:48
  • Please read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). It's good practice to provide a toy dataset as example when asking a question. Also see my answer, I provided one. – mozway Aug 02 '23 at 07:52

2 Answers

2

Assuming this input:

df = pd.DataFrame({'client': ['abc', 'abc', 'xyz', 'xyz'],
                   'keyword': ['kw1', 'kw2', 'kw3', 'kw4']
                  })

You could use:

df['url'] = np.where(df['client'] == 'xyz',
                     df['keyword'].apply("/s?k={}&s=relevanceblender&page=%s".format),
                     'other')

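Notice how {query} was changed to {}: apply passes each value to str.format as a positional argument, so the placeholder can no longer be named.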

Or, if you cannot change the formatting string:

df['url'] = np.where(df['client'] == 'xyz',
                     df['keyword'].apply(lambda x: "/s?k={query}&s=relevanceblender&page=%s".format(query=x)),
                     'other')

Output:

  client keyword                                  url
0    abc     kw1                                other
1    abc     kw2                                other
2    xyz     kw3  /s?k=kw3&s=relevanceblender&page=%s
3    xyz     kw4  /s?k=kw4&s=relevanceblender&page=%s
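
For completeness, here is a self-contained sketch (with imports) that shows why the original attempt fails and one possible alternative that formats only the matching rows. The names `template`, `broken` and `mask` are introduced here for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'client': ['abc', 'abc', 'xyz', 'xyz'],
                   'keyword': ['kw1', 'kw2', 'kw3', 'kw4']})

template = "/s?k={query}&s=relevanceblender&page=%s"

# Why the original attempt fails: str.format is called once and receives
# the whole Series, so the text of every row ends up in a single string.
broken = template.format(query=df['keyword'])

# Alternative sketch: start with the default, then format row by row,
# but only for the rows that match the condition.
mask = df['client'] == 'xyz'
df['url'] = 'other'
df.loc[mask, 'url'] = [template.format(query=kw)
                       for kw in df.loc[mask, 'keyword']]
```

This avoids building URLs for rows that will be discarded anyway, which np.where cannot do because both branches are evaluated before the selection.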
mozway
  • IMO `lambda *args, **kwargs: '...'.format(*args, **kwargs)` would be better, no? – Vitalizzare Aug 02 '23 at 08:14
  • But actually I can't imagine how to make it work in case of several replacement parameters. Any idea? – Vitalizzare Aug 02 '23 at 08:17
  • @Vitalizzare you could do: `df.rename(columns={'keyword': 'query'}).apply(lambda x: "/s?k={query}&s={client}".format(**dict(x.items())), axis=1)`, not pretty but it works ;) – mozway Aug 02 '23 at 08:20
  • Or, probably better, using a dictionary comprehension: `pd.Series({k: "/s?k={query}&s={client}".format(**d) for k, d in df.rename(columns={'keyword': 'query'}).to_dict('index').items()})` (or list comprehension if you have duplicated indices). – mozway Aug 02 '23 at 08:23
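
For the several-placeholder case discussed in the comments above, a plain list comprehension over zipped columns is another option. This is only a sketch: the two-row frame and the `template` name are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'client': ['abc', 'xyz'],
                   'keyword': ['kw1', 'kw3']})

template = "/s?k={query}&s={client}"

# zip walks the two columns in lockstep, so each format call
# only ever sees the values of a single row.
df['url'] = [template.format(query=kw, client=cl)
             for kw, cl in zip(df['keyword'], df['client'])]
```

Unlike the dictionary-based variants, this does not depend on the index being unique, since the result is assigned by position.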
1

One way to do this is to vectorize "...".format, like this:

df = pd.DataFrame({
    'A': [1,2,3,4],
    'B': [*'abcd']
})

query_format = np.vectorize('Item {query!r} for {data!r}'.format)

df['C'] = np.where(df['A'] > 2, query_format(query=df['B'], data=df['A']), 'other')

print(df)

with this result:

   A  B               C
0  1  a           other
1  2  b           other
2  3  c  Item 'c' for 3
3  4  d  Item 'd' for 4

See numpy.vectorize for more details.
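
Keep in mind that the NumPy documentation describes vectorize as a convenience whose implementation is essentially a for loop. A rough sketch of the equivalence, where `template`, `vectorized` and `looped` are names chosen for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [*'abcd']})

template = 'Item {query!r} for {data!r}'

# np.vectorize calls template.format once per element pair.
vectorized = np.vectorize(template.format)(query=df['B'], data=df['A'])

# The same work written as an explicit Python loop.
looped = [template.format(query=b, data=a)
          for b, a in zip(df['B'], df['A'])]
```

Both produce the same strings, so the choice between them is mostly a matter of readability.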

Vitalizzare
  • Nice idea, however one should keep in mind that `vectorize` is only a convenience, it won't make things faster as string operations cannot really be vectorized ;) – mozway Aug 02 '23 at 08:24