-2

I have a dataframe and I have written the following function to populate a new column:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])

def perc(a, b):
    if a/b < 0:
        n = 0
    elif a/b > 1:
        n = 1
    else:
        n = a/b
    return n

df['c']=perc(df['a'],df['b'])

df[1:10]

It's supposed to calculate a percent column. Here is the error I am getting:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand that it has to do with `a` and `b` being Series instead of individual elements. But how do I fix it?

Mateyobi
  • What's the behaviour you actually want? Give an example input and expected output that demonstrates all the logic you're trying to capture. Here's my guess, if `dif` were the series `[-1, 1, 3, 5]` and `unc` were the series `[2, 2, 3, 3]` then `dif/unc` would be `[-0.5, 0.5, 1, 1.6666]` and you would want to return `[0, 0.5, 1, 1]`, is that correct? – Amit Kumar Gupta Nov 10 '15 at 03:59
  • I need column 'c' to show the a/b value for that row. But if it's negative it should be 0%, and if it's over 100% it should show 1. – Mateyobi Nov 10 '15 at 04:03
  • Hey yes you got it. You must have edited it after I responded. Note that I edited my OP to be more generic a/b. – Mateyobi Nov 10 '15 at 04:05

2 Answers

0

What you're actually asking for is a bit hard to describe in words, but the following example captures it:

If a is the series [-1, 1, 3, 5] and b is [2, 2, 3, 3], then a/b will be a series like [-0.5, 0.5, 1, 1.6666667], and what you ultimately want to return is [0, 0.5, 1, 1].

You can "cap values at 1" for a series by taking the element-wise minimum of that series with a series of all ones. Similarly, you can ensure nothing is below 0 by taking the element-wise maximum of a series with a series of all zeroes. NumPy lets you do this easily:

def perc(a,b):
    length = len(a)
    return np.maximum(np.minimum(np.ones(length), a/b), np.zeros(length))
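
As a quick check, here is a minimal sketch that runs the same min/max idea on the example values above. It assumes scalars can stand in for the arrays of ones and zeroes (NumPy broadcasts them), and the name `result` is only for illustration:

```python
import numpy as np
import pandas as pd

def perc(a, b):
    ratio = a / b
    # Cap at 1 with an element-wise minimum, then floor at 0 with an
    # element-wise maximum; the scalars are broadcast across the Series.
    return np.maximum(np.minimum(ratio, 1), 0)

a = pd.Series([-1, 1, 3, 5])
b = pd.Series([2, 2, 3, 3])

result = perc(a, b)
print(result.tolist())
```

The comparisons happen element by element, so there is no `if` on a whole Series and the ambiguous truth value error does not come up.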
Amit Kumar Gupta
  • This is going to help no-one who finds this question on google. Dang, shoulda dupe closed it. e.g. of http://stackoverflow.com/q/21415661/1240268 – Andy Hayden Nov 10 '15 at 04:20
  • @AndyHayden how would you apply the answer in your link to my question? This is my first question, I wouldn't mind if you undid your down vote. – Mateyobi Nov 10 '15 at 04:27
0

There is a built-in method for this, `clip`:

In [134]:
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
df

Out[134]:
          a         b
0  0.676248 -0.320346
1 -1.344982  2.170232
2 -0.150036 -1.606179
3  0.350467  0.386958
4  0.551379 -0.378882
5 -0.283632 -1.559516
6  0.266356 -0.859321
7  0.188118  1.275342
8  0.109570  0.546783
9  0.917231 -0.339878

In [136]:
df['c'] = (df['a']/df['b']).clip(lower=0, upper=1)
df

Out[136]:
          a         b         c
0  0.676248 -0.320346  0.000000
1 -1.344982  2.170232  0.000000
2 -0.150036 -1.606179  0.093412
3  0.350467  0.386958  0.905699
4  0.551379 -0.378882  0.000000
5 -0.283632 -1.559516  0.181872
6  0.266356 -0.859321  0.000000
7  0.188118  1.275342  0.147504
8  0.109570  0.546783  0.200390
9  0.917231 -0.339878  0.000000
EdChum
  • This is a better solution. But is there a way to process 1 element at a time? What if I want to populate column c based on a more complicated formula? Can I not do it the way I tried using data.frame? I am used to access functions that process one row at a time. – Mateyobi Nov 11 '15 at 05:26
  • You can do that, but to me this defeats the whole point of using pandas, which provides vectorised methods. If you're going to do that, then use `apply` to process an element – EdChum Nov 11 '15 at 09:17
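
For anyone who does want the row-at-a-time route mentioned in the last comment, here is a rough sketch using `DataFrame.apply` with `axis=1`. It reuses the scalar version of `perc` from the question and the small example values from the first answer rather than random data; the `vectorised` name is only there for comparison:

```python
import pandas as pd

def perc(a, b):
    # Scalar version: a and b are single values here, so `if` is fine.
    ratio = a / b
    if ratio < 0:
        return 0.0
    elif ratio > 1:
        return 1.0
    return ratio

df = pd.DataFrame({'a': [-1, 1, 3, 5], 'b': [2, 2, 3, 3]})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df['c'] = df.apply(lambda row: perc(row['a'], row['b']), axis=1)

# The vectorised equivalent, for comparison.
vectorised = (df['a'] / df['b']).clip(lower=0, upper=1)
print(df['c'].tolist(), vectorised.tolist())
```

Both give the same values here. The `apply` version calls a Python function once per row, so it will generally be slower on large frames, but it leaves room for more complicated per-row logic.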