
Here is a small sampling of my dataset:

Search_Term     Exit_Page                     Unique_Searches   Exit_Pages_actual
nitrile gloves  /store/catalog/product.jsp?   10                /store/catalog/product.jsp?
zytek gloves    /store/product/KT781010       20                /store/pro

So this should be pretty easy, and I'm not sure why I can't get it to work. I want the Exit_Pages_actual column to contain the whole Exit_Page string when its first 10 characters are "/store/pro" or "/store/cat". When that is not the case, I want it to contain only the first 10 characters of Exit_Page. As you can see above, my code works for the catalog page but not for the product page (i.e. it works for the first condition in my OR but not the second; code below). What is wrong? There is no error message; for the product page it just outputs the first 10 characters rather than the whole string:

Exit_Pages['Exit_Pages_actual'] = np.where(Exit_Pages['Exit_Page'].str[:10]==('/store/cat' or '/store/pro'),Exit_Pages['Exit_Page'].str[:],Exit_Pages['Exit_Page'].str[:10])

Exit_Pages  
bernando_vialli
  • the part `'/store/cat' or '/store/pro'` is evaluated to `'/store/cat'`. This is then used in the comparison. Check [operator precedence](https://docs.python.org/3/reference/expressions.html). – akoeltringer Oct 20 '17 at 13:17
  • Possible duplicate of [Numpy where function multiple conditions](https://stackoverflow.com/questions/16343752/numpy-where-function-multiple-conditions) – akoeltringer Oct 20 '17 at 13:19
  • @Tw UxTLi51Nus hmmm, thank you for the comment! I have tried a few more things that don't work, like this: `Exit_Pages['Exit_Pages_actual'] = np.where(Exit_Pages['Exit_Page'].str[:10]=='/store/cat' or np.where(Exit_Pages['Exit_Page'].str[:10]=='/store/pro'),Exit_Pages['Exit_Page'].str[:],Exit_Pages['Exit_Page'].str[:10])`, and it gives me a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – bernando_vialli Oct 20 '17 at 14:08
  • Take a look at the link provided in the "possible duplicate" comment. In numpy, pandas, etc., `and` and `or` are written as `&` and `|`. – akoeltringer Oct 20 '17 at 14:57
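The first comment's point can be checked in plain Python: `or` returns its first truthy operand, so the parenthesised expression collapses to a single string before pandas ever sees it. A minimal sketch, using a small hypothetical Series `pages` in place of the real `Exit_Page` column:

```python
import pandas as pd

# Hypothetical stand-in for Exit_Pages['Exit_Page'], using the two sample rows
pages = pd.Series(['/store/catalog/product.jsp?', '/store/product/KT781010'])

# `or` returns its first truthy operand; a non-empty string is truthy,
# so this is just '/store/cat' and '/store/pro' is never looked at
target = ('/store/cat' or '/store/pro')
print(target)  # /store/cat

# The comparison therefore only tests for the catalog prefix
matches = pages.str[:10] == target
print(matches.tolist())  # [True, False]
```

That is why the catalog row keeps its full path while the product row falls through to the 10-character branch.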

1 Answer


@tw-uxtli51nus in the comments is basically correct.

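We can accomplish what you want by wrapping each logical condition in parentheses and using `|` in place of `or`. `or` cannot combine two boolean Series element by element, and the parentheses are needed because `|` binds more tightly than `==`.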

So the np.where call would look like this:

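import numpy as np

# Keep the whole path for catalog/product pages, otherwise only the first 10 characters
df['new_col'] = np.where(
    (df['Exit_Page'].str[:10] == '/store/cat')
    | (df['Exit_Page'].str[:10] == '/store/pro'),
    df['Exit_Page'],
    df['Exit_Page'].str[:10],
)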

I've split it over several lines to make it more readable, since expressions like this are hard to read on one line.
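For reference, here is a self-contained sketch of the same fix run on the two sample exit pages from the question, plus a third, hypothetical path (`/store/search.jsp?q=gloves`) added only so the "otherwise" branch has something to do. The 10-character prefix is computed once and the mask is named `keep_full`; both names are illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

# Two sample exit pages from the question plus one hypothetical path
# whose prefix matches neither '/store/cat' nor '/store/pro'
Exit_Pages = pd.DataFrame({
    'Exit_Page': ['/store/catalog/product.jsp?',
                  '/store/product/KT781010',
                  '/store/search.jsp?q=gloves'],
})

# First 10 characters of each exit page
prefix = Exit_Pages['Exit_Page'].str[:10]

# Each comparison sits in its own parentheses because | binds more tightly than ==
keep_full = (prefix == '/store/cat') | (prefix == '/store/pro')

# Whole path where the prefix matches, otherwise just the prefix
Exit_Pages['Exit_Pages_actual'] = np.where(keep_full, Exit_Pages['Exit_Page'], prefix)
print(Exit_Pages)
```

Passing `prefix` as the third argument reuses the already-sliced strings instead of slicing the column a second time.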

We can make this easier to read with a technique similar to the one the np.where docs suggest, using np.isin(): https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html

Unfortunately, I don't have a recent enough version of numpy to write out a real example.
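A sketch of the membership-test idea, assuming NumPy 1.13 or later (the release that added `np.isin`). `Series.isin` is the pandas counterpart and avoids the NumPy version question entirely. The names `pages` and `wanted`, and the third path, are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Exit_Page column (the third path is made up)
pages = pd.Series(['/store/catalog/product.jsp?',
                   '/store/product/KT781010',
                   '/store/search.jsp?q=gloves'])

# The allowed prefixes live in one list, so adding another is a one-line change
wanted = ['/store/cat', '/store/pro']
prefix = pages.str[:10]

# pandas: Series.isin tests each element for membership in `wanted`
mask_pandas = prefix.isin(wanted)

# NumPy >= 1.13: np.isin does the same on the underlying array
mask_numpy = np.isin(np.asarray(prefix), wanted)

# Whole path where the prefix is allowed, otherwise the 10-character prefix
actual = np.where(mask_pandas, pages, prefix)
print(actual)
```

With more than two prefixes this scales better than chaining `|` comparisons by hand.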

Dylan
  • Thank you, this worked! So my understanding is that the only thing you changed was using "|" instead of "or"? In what situations do you use one vs. the other? I am not sure I fully understand the difference between the two – bernando_vialli Oct 20 '17 at 15:07
  • To be honest, I don't have a good answer! There is https://www.tutorialspoint.com/python/bitwise_operators_example.htm and a very long answer at https://stackoverflow.com/questions/16343752/numpy-where-function-multiple-conditions that is much better than mine! – Dylan Oct 20 '17 at 15:20
  • Yeah, I tried reading it but did not fully follow it... But thank you, you made it very clear where my error was – bernando_vialli Oct 20 '17 at 15:30
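On the question of when to use which: `or`, `and` and `not` are Python keywords that look at the truth value of each whole object, while `|`, `&` and `~` are operators that NumPy and pandas overload to work element by element. A minimal sketch with two small, hypothetical boolean Series:

```python
import pandas as pd

a = pd.Series([True, False, False])
b = pd.Series([False, True, False])

# | combines the Series element by element into a new boolean Series
elementwise = a | b
print(elementwise.tolist())  # [True, True, False]

# `or` needs bool(a) to decide which operand to return; pandas refuses to
# reduce a multi-element Series to a single True/False and raises instead
error_message = None
try:
    a or b
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Python does not let a class redefine what the `or` keyword does (it can only define `__bool__`), which is why NumPy and pandas expose element-wise logic through `|` and `&` instead.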