I would like to iterate a given function over a pandas DataFrame without using a for loop, i.e. using vectorisation.

I have already written a for loop for this function, but I would like something more efficient.

import numpy as np
import pandas as pd

def f(x, y, operation):
    # Apply one step of the running calculation.
    if operation == 'plus':
        return x + y
    elif operation == 'power':
        return x ** y
    else:
        print("operation can only be 'plus' or 'power'")

df = pd.DataFrame({
    'first_entry': [1, np.nan, np.nan, np.nan, np.nan],
    'operation': [np.nan, 'plus', 'power', 'plus', 'plus'],
    'operand': [np.nan, 3, 2, 4, 1],
})
first_entry operation operand       expected_result
1           NA        NA            1
NA          plus      3             4 (= 1+3)
NA          power     2             16 (=4**2)
NA          plus      4             20 (=16+4)
NA          plus      1             21 (=20+1)

I want to return pd.Series([1, 4, 16, 20, 21]), i.e. iterate f over the DataFrame, feeding each result into the next row.
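One way to avoid writing the loop by hand is `itertools.accumulate`, which feeds each result back in as the next starting value. The sketch below is not vectorised (it still calls `f` once per row in Python) and assumes Python 3.8+ for the `initial` argument. It also assumes `f` is keyed on `'plus'` so that it matches the values in the `operation` column.

```python
from itertools import accumulate

import numpy as np
import pandas as pd


def f(x, y, operation):
    # Assumption: keyed on 'plus' to match the 'operation' column.
    if operation == 'plus':
        return x + y
    elif operation == 'power':
        return x ** y
    raise ValueError("operation can only be 'plus' or 'power'")


df = pd.DataFrame({
    'first_entry': [1, np.nan, np.nan, np.nan, np.nan],
    'operation': [np.nan, 'plus', 'power', 'plus', 'plus'],
    'operand': [np.nan, 3, 2, 4, 1],
})

# Pair each operation with its operand, skipping the first row (the seed).
steps = zip(df['operation'].iloc[1:], df['operand'].iloc[1:])

# accumulate passes the previous result in as x on every step;
# `initial` supplies the seed and is emitted as the first output.
running = accumulate(
    steps,
    lambda x, step: f(x, step[1], step[0]),
    initial=df['first_entry'].iloc[0],
)

expected_result = pd.Series(list(running), index=df.index)
```

Because `first_entry` and `operand` contain NaN, pandas stores them as floats, so the result comes back as floats (1.0, 4.0, 16.0, 20.0, 21.0).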

Alternative question: suppose the function is now

def g(x,y,operation):
    if operation=='relative':
        return x*(1+y)
    elif operation=='absolute':
        return x+y
    else:
        print('type can only be relative or absolute')

Can I write a function with list comprehension to give the expected result?

first_entry operation operand       expected_result
1           NA        NA            1
NA          relative  3             4 (=1*(1+3))
NA          absolute  2             6 (=4+2)
NA          relative  4             30 (=6*(1+4))
NA          absolute  1             31 (=30+1)
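For this relative/absolute case specifically, each row is an affine map x -> a*x + b (relative: a = 1 + y, b = 0; absolute: a = 1, b = y), so the running value can be written in closed form with `cumprod` and `cumsum`. The sketch below is one possible vectorised formulation, not a general recipe: it assumes no multiplier is zero (no relative operand equal to -1), it is subject to floating-point rounding, and it does not carry over to the power example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first_entry': [1, np.nan, np.nan, np.nan, np.nan],
    'operation': [np.nan, 'relative', 'absolute', 'relative', 'absolute'],
    'operand': [np.nan, 3, 2, 4, 1],
})

x0 = df['first_entry'].iloc[0]          # seed value from the first row
op = df['operation'].to_numpy()[1:]     # operations for the remaining rows
y = df['operand'].to_numpy()[1:]        # operands for the remaining rows

# Express every row as x -> a*x + b.
a = np.where(op == 'relative', 1 + y, 1.0)
b = np.where(op == 'absolute', y, 0.0)

# With A_n = a_1 * ... * a_n, the recurrence x_n = a_n * x_(n-1) + b_n
# unrolls to x_n = A_n * (x0 + sum_{k<=n} b_k / A_k).
A = np.cumprod(a)
values = A * (x0 + np.cumsum(b / A))

expected_result = pd.Series(np.concatenate(([x0], values)), index=df.index)
```

The division by `A` is what makes the zero-multiplier assumption necessary; for long series the cumulative product can also grow or shrink enough to lose precision, in which case an explicit loop is the safer choice.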
  • Are you looking for [apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)? – tzaman Sep 10 '19 at 14:57
  • Apply will not work iteratively. I know that, for example, .cumsum() is iterative, so I want to iterate with a different function effectively. – Samuel Bodansky Sep 10 '19 at 15:01
  • @rafaelc Is it possible using a list comprehension? Isn't cumsum() a vectorised function that depends on the value of the previous computation? – Samuel Bodansky Sep 10 '19 at 15:26
  • I agree with @rafaelc, this is hard to vectorize because of the updated value for each iteration. If you are interested in a _non vectorized_ solution, I can post one. – Erfan Sep 10 '19 at 15:28
  • in my actual example, the function f(a,b,operation) returns a+b for operation==operation1, and a*(1+b) for operation==operation2. Do you think this can be done with list comprehension, and if so, how? – Samuel Bodansky Sep 10 '19 at 15:32
  • I think the only *major* gain you may get here is using `numba`, if that's even possible for such calculations. I have no experience, so I'm just guessing, but that tends to work for simple iterative calculations like in https://stackoverflow.com/questions/56904390/restart-cumsum-and-get-index-if-cumsum-more-than-value – ALollz Sep 10 '19 at 16:23

1 Answer

I don't directly see the relationship between a, b, and c, but could you use the pandas `apply` or `applymap` functions?

At a very high level, you could have something like:

def f(row):
    if row["type"] == "add":
        return row["a"] + row["b"]
    elif row["type"] == "power":
        return row["a"] ** row["b"]

df["res"] = df.apply(f, axis=1)

This assumes your columns are named "a", "b", and "type" respectively.
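As a quick illustration of the snippet above, here is a minimal sketch on a hypothetical sample frame (the data values are invented for the example; only the column names come from the answer). Note that `apply` with `axis=1` evaluates each row independently, so `a` must already be known for every row; it does not carry a running result from one row to the next.

```python
import pandas as pd

# Hypothetical sample data using the column names the answer assumes.
df = pd.DataFrame({
    'a': [1, 4, 16],
    'b': [3, 2, 4],
    'type': ['add', 'power', 'add'],
})


def f(row):
    # Each row is handled on its own; nothing is carried between rows.
    if row['type'] == 'add':
        return row['a'] + row['b']
    elif row['type'] == 'power':
        return row['a'] ** row['b']


df['res'] = df.apply(f, axis=1)
```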

amlaanb