SEE UPDATE AT THE END FOR A MUCH CLEARER DESCRIPTION.

According to http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.apply.html you can pass external arguments to an apply function, but the same is not true of applymap: http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap

I want to apply an elementwise function f(a, i), where a is the element and i is an argument I supply myself. I need this because I will call df.applymap(f) in a loop, once for each i in some_list.

To give an example of what I want, say I have a DataFrame df, where each element is a numpy.ndarray. I want to extract the i-th element of each ndarray and form a new DataFrame from them. So I define my f:

def f(a, i):
    return a[i]

So that I could make a loop which would return the i-th element of each of the np.ndarray contained in df:

for i in some_series:
    b[i] = df.applymap(f, i=i)

so that in each iteration, it would pass my value of i into the function f.

I realise it would all have been easier if I had used MultiIndexing for df but for now, this is what I'm working with. Is there a way to do what I want within pandas? I would ideally like to avoid for-looping through all the columns in df, and I don't see why applymap doesn't take keyword arguments, while apply does.

Also, the way I currently understand it (I may be wrong), when I use df.apply it would give me the i-th element of each row/column, instead of the i-th element of each ndarray contained in df.


UPDATE:

So I just realised I could split df into Series and then use the pd.Series.apply which could do what I want. Let me just generate some data to show what I mean:

def f(a,i):
    return a[i]

b = pd.Series(index=range(10), dtype=object)
for i in b.index:
    b[i] = np.random.rand(5)

b.apply(f,args=(1,))

This does exactly what I expect and want. However, trying the same thing with a DataFrame:

b = pd.DataFrame(index=range(4), columns=range(4), dtype=object)
for i in b.index:
    for col in b.columns:
        b.loc[i,col] = np.random.rand(10)

b.apply(f,args=(1,))

Gives me ValueError: Shape of passed values is (4, 10), indices imply (4, 4).

Marses

3 Answers

You can wrap the function in a lambda that supplies the extra argument, and pass that lambda to map:

def matchValue(value, dictionary):
    return dictionary[value]

a = {'first':  1, 'second':  2}
b = {'first': 10, 'second': 20}
df['column'] = df['column'].map(lambda x: matchValue(x, a))
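
The same lambda trick can be extended from a single column to the whole DataFrame from the question. The following is only a minimal sketch: the 2x2 frame and the name `picked` are made up for illustration, and the fallback between `DataFrame.map` and `DataFrame.applymap` is there because recent pandas releases renamed the elementwise method.

```python
import numpy as np
import pandas as pd


def f(a, i):
    """Return the i-th entry of the array stored in one cell."""
    return a[i]


# Hypothetical 2x2 frame whose cells are small ndarrays
df = pd.DataFrame({
    0: [np.array([0, 1, 2]), np.array([10, 11, 12])],
    1: [np.array([100, 101, 102]), np.array([110, 111, 112])],
})

# Recent pandas releases name the elementwise method DataFrame.map;
# older ones only have DataFrame.applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

# One frame per position i; `i=i` freezes the current loop value in the lambda
picked = {i: elementwise(lambda a, i=i: f(a, i)) for i in range(3)}
```

Here `picked[1]` would hold the entry at position 1 of every array, with the same index and columns as `df`.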
user3876608

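This solution stores the argument in a closure: an outer function takes the argument and returns an inner function that accepts only the cell value, which is the form applymap expects.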

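def f(cell, argument):
    """Do something with cell value and argument"""
    # placeholder: compute `output` from cell and argument here
    return output

def outer(argument):
    # Capture `argument` so that `inner` needs only the cell value
    def inner(cell):
        return f(cell, argument)

    return inner

argument = ...
df.applymap(func=outer(argument))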
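
As an alternative to writing the outer/inner pair by hand, the standard library's `functools.partial` can bind the argument instead. This is a sketch of that substitute technique, not the code above: the body of `f`, the sample frame, and the name `pick_last` are assumptions for illustration.

```python
from functools import partial

import numpy as np
import pandas as pd


def f(cell, argument):
    """Illustrative body: index into the array stored in the cell."""
    return cell[argument]


# Hypothetical frame whose cells are small ndarrays
df = pd.DataFrame({
    0: [np.array([0, 1, 2]), np.array([10, 11, 12])],
    1: [np.array([100, 101, 102]), np.array([110, 111, 112])],
})

# partial(f, argument=2) behaves like outer(2): a one-argument callable
pick_last = partial(f, argument=2)

# DataFrame.map is the newer name for applymap in recent pandas releases
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(pick_last)
```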
Martin Alexandersson

Pandas applymap doesn't accept extra arguments; its signature is DataFrame.applymap(func). If you want to keep i as state, you can store it in a global variable that func reads or modifies, or wrap func with a decorator.

However, I would recommend trying the apply method instead.
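
One way to make apply work elementwise is to nest it: DataFrame.apply passes each column to a function as a Series, and Series.apply accepts `args`, as the question's update already shows. Calling `b.apply(f, args=(1,))` directly gives `f` a whole column, so `a[i]` selects a row of that column rather than an entry of each array, which is probably where the shape error comes from. A minimal sketch, with a made-up 2x2 frame standing in for `b`:

```python
import numpy as np
import pandas as pd


def f(a, i):
    return a[i]


# Hypothetical stand-in for the question's b: every cell is an ndarray
b = pd.DataFrame({
    0: [np.array([0, 1, 2]), np.array([10, 11, 12])],
    1: [np.array([100, 101, 102]), np.array([110, 111, 112])],
})

# DataFrame.apply hands each column to the lambda as a Series;
# Series.apply then calls f(cell, 1) on every cell of that column
second = b.apply(lambda col: col.apply(f, args=(1,)))
```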

Neo X
  • See the update. Is there a way to make the apply function do what I want? I don't really understand the error it's giving me (there's a ton of text), but I assume it's trying to return the `i`-th row of `b` instead of the `i`-th element of each element of `b`. – Marses Feb 15 '17 at 22:11
  • Do you want to use `f` on a list or series, or on a 2D dataframe? Pandas `apply` applies a function along an axis of the DataFrame, whereas `applymap` applies a function that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame. – Neo X Feb 15 '17 at 22:11
  • Essentially I would like the functionality of applymap (so apply `func` on each element of `df`/`b`), while being able to pass my "external" argument `i` into `func`. As you said, it seems I could use global variables or perhaps function attributes or something, or just split `df` into Series, but I was just wondering if there was a way to do that directly within pandas. – Marses Feb 15 '17 at 22:14
  • It depends on how you define the `i-th` element of a 2D array. If it is `i = row * n_col + col`, pandas doesn't have a direct way to do that, but you may consider using `apply` twice or [flattening the dataframe to a list first](http://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list). – Neo X Feb 15 '17 at 22:21