
I'm trying to clean up some code in Python to vectorize a set of features and I'm wondering if there's a good way to use apply to pass multiple arguments. Consider the following (current version):

def function_1(x):
    if "string" in x:
        return 1
    else:
        return 0

df['newFeature'] = df['oldFeature'].apply(function_1)

With the above I'm having to write a new function (function_1, function_2, etc) to test for each substring "string" that I want to find. In an ideal world I could combine all of these redundant functions and use something like this:

def function(x, string):
    if string in x:
        return 1
    else:
        return 0

df['newFeature'] = df['existingFeature'].apply(function("string"))

But trying that raises the error TypeError: function() takes exactly 2 arguments (1 given). Is there another way to accomplish the same thing?

Edit:

from functools import partial

def function(string, x):
    if string in x:
        return 1
    else:
        return 0

df['newFeature'] = df['oldFeature'].apply(partial(function, 'string'))
mvwi

1 Answer


I believe you want functools.partial. A demo:

>>> from functools import partial
>>> def mult(a, b):
...     return a * b
...
>>> doubler = partial(mult, 2)
>>> doubler(4)
8

In your case you need to swap the order of the arguments in function (partial fills in positional arguments from the left, so the value you want to fix must come first), and then just call:

df['existingFeature'].apply(partial(function, "string"))
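To see the swapped-argument version work end to end without a DataFrame at hand, here is a minimal sketch. The list `old_feature` and the name `has_string` are made up for illustration, and the built-in `map` stands in for `Series.apply`, which likewise calls the function once per element.

```python
from functools import partial


def function(string, x):
    """Return 1 if `string` occurs in `x`, else 0 (fixed argument comes first)."""
    return 1 if string in x else 0


# Hypothetical stand-in for df['oldFeature']
old_feature = ["a string here", "nothing to see", "more string data"]

# partial binds "string" to the first positional parameter,
# leaving a one-argument callable that takes x
has_string = partial(function, "string")

# map calls has_string on each element, as Series.apply would
new_feature = list(map(has_string, old_feature))
print(new_feature)
```

With a real DataFrame, the same `has_string` callable should be usable as `df['oldFeature'].apply(has_string)`.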
Roman Bodnarchuk
  • Or he could simply use a `lambda`: `doubler = lambda b: mult(b, 2)`. With a `lambda` you can fix whatever values you want, while `partial` can only fix positional arguments as they are specified. – Bakuriu Oct 02 '13 at 15:31
  • In many cases, you can specify keyword arguments: `partial(mult, a=5)` or `partial(mult, b=5)`. (This does require you to know the names of the arguments, which may not be documented.) – chepner Oct 02 '13 at 15:50
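The two comments suggest alternatives that keep the question's original `function(x, string)` signature, so no argument swap is needed. A rough sketch of both follows; the names `by_keyword`, `by_lambda`, and the sample list `values` are hypothetical, and `map` again stands in for `Series.apply`.

```python
from functools import partial


def function(x, string):
    """Original argument order from the question: value first, substring second."""
    return 1 if string in x else 0


# Bind the second parameter by name, so x stays free
by_keyword = partial(function, string="string")

# A lambda can fix any argument, whatever its position
by_lambda = lambda x: function(x, "string")

# Hypothetical sample data in place of a DataFrame column
values = ["a string here", "nothing to see"]

keyword_result = [by_keyword(v) for v in values]
lambda_result = list(map(by_lambda, values))
print(keyword_result, lambda_result)
```

pandas' `Series.apply` also documents an `args` tuple and extra keyword arguments that it forwards to the function, so `df['oldFeature'].apply(function, string="string")` may work directly; check the documentation for your pandas version.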