
I am trying to prepare some data for a heatmap or 3D plot. The general idea is that I have some function z=f(x,y) where z is the value of a specific cell with x as its column value and y as its index value.

My current approach is to loop over the DataFrame, which already produces the desired result:

import numpy as np
import pandas as pd


def my_fun(a, b):
    return a**2 + b**3

index = list(np.arange(25.0, 100.0, 25.0))
columns = list(np.arange(150.0, 600.0, 150.0))
df = pd.DataFrame(np.zeros((3, 3)), index=index, columns=columns)

for idx in index:
    for col in columns:
        df.loc[idx, col] = my_fun(idx, col)

print(df)

and yields:

      150.0       300.0       450.0
25.0  3375625.0  27000625.0  91125625.0
50.0  3377500.0  27002500.0  91127500.0
75.0  3380625.0  27005625.0  91130625.0

But looping over the dataframe is probably not the right (vectorized) way to deal with this problem and I was looking for some pretty combination of apply/applymap/map.

Is there any way to get the same result in a smarter/vectorized way?

Thanks in advance!

Cord Kaldemeyer

2 Answers


You can use:

# if you only need a simple arithmetic operation such as a sum
print(df.apply(lambda x: x.index + x.name, axis=1))
   1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
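
Applied to the data from the question, the same row-wise pattern might look like the sketch below. It assumes my_fun accepts array arguments (which holds for plain arithmetic) and wraps each result in a pd.Series so that apply expands it into columns. The helper name row_values is only illustrative.

```python
import numpy as np
import pandas as pd


def my_fun(a, b):
    return a**2 + b**3


index = np.arange(25.0, 100.0, 25.0)
columns = np.arange(150.0, 600.0, 150.0)
df = pd.DataFrame(np.zeros((3, 3)), index=index, columns=columns)


def row_values(row):
    # row.name is the row label; row.index holds the column labels
    return pd.Series(my_fun(row.name, row.index.to_numpy()), index=row.index)


# axis=1 passes one row at a time; the returned Series become the rows of the result
result = df.apply(row_values, axis=1)
print(result)
```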

If your function only works with scalars, you can stack the DataFrame into a Series, convert it to a DataFrame, apply the function row by row, and finally unstack:

df1 = df.stack().to_frame().apply(lambda x: my_fun(x.name[0], x.name[1]), axis=1).unstack()
print (df1)
   1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
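
For a function that really only accepts scalars, a related sketch is to build every (index, column) pair explicitly, call the function once per pair, and unstack the result. It uses pd.MultiIndex.from_product in place of stack() and the data from the question; the names pairs, flat and result are illustrative. This is still a Python-level loop, so it is convenient rather than fast.

```python
import numpy as np
import pandas as pd


def my_fun(a, b):
    return a**2 + b**3


index = np.arange(25.0, 100.0, 25.0)
columns = np.arange(150.0, 600.0, 150.0)

# every (row label, column label) combination as a two-level index
pairs = pd.MultiIndex.from_product([index, columns])

# one scalar call per pair, stored in a flat Series keyed by the pair
flat = pd.Series([my_fun(a, b) for a, b in pairs], index=pairs)

# move the inner level (the column labels) back into columns
result = flat.unstack()
print(result)
```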

For testing, it is best to replace the lambda with a named function that prints what it receives:

def f(x):
    print(x.name)
    print(x.index)
    return x.index + x.name

print(df.apply(f, axis=1))

1
Int64Index([1, 2, 3], dtype='int64')
1
Int64Index([1, 2, 3], dtype='int64')
2
Int64Index([1, 2, 3], dtype='int64')
3
Int64Index([1, 2, 3], dtype='int64')

   1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
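
If the function consists only of element-wise arithmetic, as my_fun in the question does, NumPy broadcasting is another option that avoids apply altogether. This is a different technique from the apply-based ones above, sketched under that assumption:

```python
import numpy as np
import pandas as pd


def my_fun(a, b):
    return a**2 + b**3


index = np.arange(25.0, 100.0, 25.0)
columns = np.arange(150.0, 600.0, 150.0)

# a (3, 1) column of row labels combined with a (1, 3) row of column labels
# broadcasts to a (3, 3) grid, so my_fun is evaluated for every pair at once
values = my_fun(index[:, np.newaxis], columns[np.newaxis, :])

df = pd.DataFrame(values, index=index, columns=columns)
print(df)
```

Functions that branch on their scalar arguments would not broadcast like this and would need one of the apply-based approaches instead.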
jezrael

You can use apply to operate column-wise: each column is passed to the function as a pandas.Series, so its index is always available:

import numpy as np
import pandas as pd


def my_fun(col):
    # col.index.values holds the row labels and col.values the column's data;
    # both are NumPy arrays, so the addition uses the fast NumPy primitives
    return col.index.values + col.values

index = list(range(1, 4))
columns = ['col' + str(i) for i in range(1, 4)]
df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), index=index, columns=columns)

col_names = ['col1', 'col2']  # columns to transform
# apply returns a new object, so assign the result back to the selected columns
df[col_names] = df[col_names].apply(my_fun)
print(df)
Kirell
  • I think this only works if I want to calculate the cell values based on the former value and the index value but not based on the column value. Maybe my question wasn't formulated clearly. I have adapted the code! – Cord Kaldemeyer Feb 20 '17 at 15:51