I'm trying to aggregate a dataset in which one of the columns contains some URLs. Consider the following dataset

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2], 
                   "Website": ["https://www.auctionbid.com",
                               "https://www.google.com",
                               "https://www.awesomeauctions.net",
                               "https://www.awesomeauctions.net",
                               "http://www.auctionnoitcua.com"
                              ]
                 })

I would like to perform the following analysis:

(
df
.groupby("ID")
.agg({"Website": lambda x: 
      "; ".join([site for site in x if x.str.contains("auction")])
    })
)

This raises a ValueError stating that the truth value of a Series is ambiguous. The accepted answer to this question states that `if` implicitly converts its operands to bool, and suggests using "bitwise" operators instead.

My question, then, is: how do I implement the equivalent of `&` and `|` for `if`?

tblznbits
  • `x` is a Series (for each ID you have a different Series in agg). pandas doesn't know whether you want to join if all the items contain that word, or if any of them would suffice. I also don't know how you want to join but if you want to join only the websites which contain the string auction, then instead of a condition on x, just change the comprehension to `site for site in x if 'auction' in site` – ayhan Feb 22 '17 at 15:39
  • @ayhan Thank you for this explanation, it definitely helps explain what was going wrong. I was interpreting the procedure as looping through each value of `x` and selecting only those where "auction" was found. I now understand what's actually happening. – tblznbits Feb 22 '17 at 16:49

2 Answers

You can use pandas' built-in pd.Series.str.contains and pd.Series.str.cat methods to do this explicitly: filter each group with a boolean mask, then concatenate what remains:

join_func = lambda x: x[x.str.contains("auction")].str.cat(sep="; ")
df.groupby("ID").agg({"Website": join_func})
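For reference, here is a minimal, self-contained sketch of the same idea run against the question's sample data. The helper name `join_auction_sites` is made up for illustration, and `regex=False` is an added assumption so that "auction" is matched as a literal substring rather than as a regular expression.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Website": [
        "https://www.auctionbid.com",
        "https://www.google.com",
        "https://www.awesomeauctions.net",
        "https://www.awesomeauctions.net",
        "http://www.auctionnoitcua.com",
    ],
})


def join_auction_sites(sites: pd.Series) -> str:
    """Hypothetical helper: keep the URLs containing 'auction' and join them."""
    # Element-wise test: one boolean per URL in the group.
    mask = sites.str.contains("auction", regex=False)
    # Boolean indexing keeps the matching rows; str.cat joins them into one string.
    return sites[mask].str.cat(sep="; ")


result = df.groupby("ID").agg({"Website": join_auction_sites})
print(result)
```

With this data, group 1 should keep the two auction URLs and drop google.com, while group 2 should keep both of its URLs.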
pansen

Your comparison cannot work because x is the whole series and not just the item you are processing. This works:

df.groupby("ID")['Website'].agg(lambda x: "; ".join([site for site in x.values if "auction" in site]))
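To make the difference concrete, the sketch below reproduces the ambiguity on a single group. The Series `x` is a stand-in for what `agg` passes to the lambda for ID 1, and the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for the Series that agg passes to the lambda for ID == 1.
x = pd.Series([
    "https://www.auctionbid.com",
    "https://www.google.com",
    "https://www.awesomeauctions.net",
])

# str.contains is element-wise: it returns one boolean per URL, not a single bool.
mask = x.str.contains("auction")

# Using that Series as an `if` condition forces bool(mask), which pandas refuses.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# Reducing the mask explicitly resolves the ambiguity, but answers a different question.
any_match = mask.any()  # does at least one URL contain "auction"?
all_match = mask.all()  # do all URLs contain "auction"?

# Testing each element (a plain str) yields an ordinary bool, so `if` works.
kept = [site for site in x if "auction" in site]
print(raised, any_match, all_match, kept)
```

Inside the comprehension, `site` is a single string, so `"auction" in site` is an ordinary Python boolean and no bitwise operators are needed.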
languitar