python pandas np.where value from another column

Question

I am trying to fill a df column with values built from another column, depending on a condition, using:

df['url'] = np.where(df['client'] == 'xyz',
                     "/s?k={query}&s=relevanceblender&page=%s".format(query=df['keyword']),
                     "other")

However, {query} is replaced by all the values of df['keyword'] (the whole column), not only the value from the row in question. Thanks for your help.
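The behavior described above follows from evaluation order: Python evaluates the `.format(...)` call once, before `np.where` runs, and passes it the whole `df['keyword']` Series. The Series' text representation is therefore embedded in a single string, which `np.where` then broadcasts to every matching row. A minimal sketch reproducing this, assuming a two-row toy DataFrame (the data and the names `template`, `broken` and `result` are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({'client': ['abc', 'xyz'], 'keyword': ['kw1', 'kw2']})

template = "/s?k={query}&s=relevanceblender&page=%s"

# str.format runs once, eagerly, with the whole Series as its argument
broken = template.format(query=df['keyword'])
print(type(broken))  # a single str, not one string per row
print(broken)        # the Series' text representation is embedded in the URL

# np.where just broadcasts that one string to every row where the condition holds
result = np.where(df['client'] == 'xyz', broken, 'other')
print(result)
```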

Please read [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). It's good practice to provide a toy dataset as example when asking a question. Also see my answer, I provided one. — mozway, Aug 02 '23 at 07:52

score 2 · Accepted Answer · answered Aug 02 '23 at 07:51


Assuming this input:

import numpy as np
import pandas as pd

df = pd.DataFrame({'client': ['abc', 'abc', 'xyz', 'xyz'],
                   'keyword': ['kw1', 'kw2', 'kw3', 'kw4']})

You could use:

df['url'] = np.where(df['client'] == 'xyz',
                     df['keyword'].apply("/s?k={}&s=relevanceblender&page=%s".format),
                     'other')

Notice how {query} was changed to {}.
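The reason for that change: `template.format` is a bound method, and `apply` calls it once per element with that element as a positional argument. That fills a `{}` placeholder but not a named `{query}` one, for which `str.format` raises `KeyError`. A small sketch, assuming a hypothetical `keywords` Series:

```python
import pandas as pd

keywords = pd.Series(['kw3', 'kw4'])  # hypothetical data

positional = "/s?k={}&s=relevanceblender&page=%s"
named = "/s?k={query}&s=relevanceblender&page=%s"

# apply calls the bound method once per element, passing it positionally
urls = keywords.apply(positional.format)
print(urls.tolist())

# A named field cannot be filled from a positional argument
error = None
try:
    named.format('kw3')
except KeyError as exc:
    error = exc  # KeyError: 'query'
print(repr(error))
```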

Or, if you cannot change the formatting string:

df['url'] = np.where(df['client'] == 'xyz',
                     df['keyword'].apply(lambda x: "/s?k={query}&s=relevanceblender&page=%s".format(query=x)),
                     'other')

Output:

  client keyword                                  url
0    abc     kw1                                other
1    abc     kw2                                other
2    xyz     kw3  /s?k=kw3&s=relevanceblender&page=%s
3    xyz     kw4  /s?k=kw4&s=relevanceblender&page=%s
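Note that `np.where` receives both branches fully computed, so the `apply` above formats every keyword, including those of rows that end up as 'other'. If that matters, one alternative (not part of the answer above) is to assign a default and then format only the masked rows with `.loc`. A minimal sketch reusing the same toy input (`template` and `mask` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'client': ['abc', 'abc', 'xyz', 'xyz'],
                   'keyword': ['kw1', 'kw2', 'kw3', 'kw4']})

template = "/s?k={}&s=relevanceblender&page=%s"
mask = df['client'] == 'xyz'

# Default value for every row
df['url'] = 'other'
# Format only the selected rows; index alignment writes each URL back to its own row
df.loc[mask, 'url'] = df.loc[mask, 'keyword'].map(template.format)
print(df)
```

The printed frame should match the output above.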


mozway


IMO `lambda *args, **kwargs: '...'.format(*args, **kwargs)` would be better, no? – Vitalizzare Aug 02 '23 at 08:14
But actually I can't imagine how to make it work in case of several replacement parameters. Any idea? – Vitalizzare Aug 02 '23 at 08:17

@Vitalizzare you could do: `df.rename(columns={'keyword': 'query'}).apply(lambda x: "/s?k={query}&s={client}".format(**dict(x.items())), axis=1)`, not pretty but it works ;) – mozway Aug 02 '23 at 08:20
Or, probably better, using a dictionary comprehension: `pd.Series({k: "/s?k={query}&s={client}".format(**d) for k, d in df.rename(columns={'keyword': 'query'}).to_dict('index').items()})` (or list comprehension if you have duplicated indices). – mozway Aug 02 '23 at 08:23
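The list-comprehension variant mentioned in the last comment could look like the sketch below. It assumes a hypothetical template `/s?k={query}&s={client}` and a toy frame with a duplicated index, which `to_dict('index')` would reject because it requires a unique index. `to_dict('records')` ignores the index and returns one dict per row, in row order:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a duplicated index label (1)
df = pd.DataFrame({'client': ['abc', 'xyz', 'xyz'],
                   'keyword': ['kw1', 'kw2', 'kw3']},
                  index=[0, 1, 1])

template = "/s?k={query}&s={client}"

# One dict per row; rename so the column names match the placeholders
records = df.rename(columns={'keyword': 'query'}).to_dict('records')
urls = [template.format(**rec) for rec in records]

# urls is positional, like the condition, so np.where lines them up row by row
df['url'] = np.where(df['client'] == 'xyz', urls, 'other')
print(df)
```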

Vitalizzare · Answer 2 · 2023-08-02T08:09:17.393


One way to do this is by vectorizing "...".format with np.vectorize, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [*'abcd']
})

query_format = np.vectorize('Item {query!r} for {data!r}'.format)

df['C'] = np.where(df['A'] > 2, query_format(query=df['B'], data=df['A']), 'other')

print(df)

with this result:

   A  B               C
0  1  a           other
1  2  b           other
2  3  c  Item 'c' for 3
3  4  d  Item 'd' for 4

See numpy.vectorize for more details.

edited Aug 02 '23 at 08:09

answered Aug 02 '23 at 07:56

Vitalizzare


Nice idea; however, keep in mind that `vectorize` is only a convenience: it won't make things faster, as string operations cannot really be vectorized ;) – mozway Aug 02 '23 at 08:24
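Along those lines, a plain list comprehension over the two columns gives the same column as the `np.vectorize` version without the extra wrapper. A minimal sketch, reusing the toy data from the answer above (`template` and `formatted` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [*'abcd']})

template = 'Item {query!r} for {data!r}'

# zip yields one (query, data) pair per row; format runs once per row in Python
formatted = [template.format(query=q, data=d) for q, d in zip(df['B'], df['A'])]

df['C'] = np.where(df['A'] > 2, formatted, 'other')
print(df)
```

Both versions call str.format once per row in Python, so the choice is mostly about readability.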

