I'm using Pandas to drive a Python function. From inputs.csv, I use the value in each row of "Column A" as the input to the function.

The CSV also has a "Column B" whose values I want to read into a variable x inside the function. The function should still be applied to "Column A", not to "Column B". Is this possible?


This is the current code that applies the function to "Column A":

import pandas as pd
df = pd.read_csv("inputs.csv", delimiter=",")

def function(a):
    # variables c, d, e are created here (placeholders so the snippet runs)
    c, d, e = a, a, a
    ### I would like to create x from Column B if possible
    return pd.Series([c, d, e])

df[["Column C", "Column D", "Column E"]] = df["Column A"].apply(function)

Post-edit: This question has been identified as a possible duplicate of another question. Although the answer may be the same, the question is not. For future readers, it is probably not obvious that applying a function to two columns is interchangeable with applying it to one column while "reading" another column at the same time. The question should therefore remain open.

Community
P A N
  • Not sure if I understand your question correctly, but maybe what you want is simply `df[['Column A', 'Column B']].apply(scrape, axis=1)`. Your function `scrape` still gets just one argument, but that argument is a row (a Series) holding the values of `Column A` and `Column B`. – firelynx Aug 05 '15 at 07:22
  • @firelynx Thanks for your reply. I don't need to combine the columns in any way – `Column A` still holds the keyword for the argument. I only need to read `Column B` and pass it to a variable. If I apply the function with `axis=1`, how do I select `Column A` for `def function(a)` from the tuple? – P A N Aug 05 '15 at 07:37
  • Inside `def function(a):` you would simply need one line like this: `col_a, col_b = a` – firelynx Aug 05 '15 at 07:40
  • Made an answer with a bit more clarity. – firelynx Aug 05 '15 at 07:46
  • Where do you want to "create x"? Should it end up in the dataframe or just live inside the function? Should x be a Series or just a variable? – firelynx Aug 05 '15 at 08:37
  • @firelynx It just needs to live inside the function as a variable :) I will use it simply to `print` the contents, so it doesn't need to be returned. – P A N Aug 05 '15 at 09:13
  • possible duplicate of [How to apply a function to two columns of Pandas dataframe](http://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe) – LondonRob Aug 06 '15 at 12:37
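A minimal sketch of the unpacking suggested in the comments, assuming a small hypothetical DataFrame in place of inputs.csv. With `axis=1`, each row reaches the function as a pandas Series (not a tuple), and iterating a Series yields its values in column order, so `col_a, col_b = row` works when exactly two columns are selected.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv
df = pd.DataFrame({"Column A": ["foo", "bar"], "Column B": [1, 2]})

def function(row):
    # With axis=1, row is a pandas Series indexed by the selected column names
    assert isinstance(row, pd.Series)
    col_a, col_b = row  # unpacks the row's values in column order
    return f"{col_a}:{col_b}"

result = df[["Column A", "Column B"]].apply(function, axis=1)
print(result.tolist())  # ['foo:1', 'bar:2']
```

Because the unpacking is positional, the order of names in `df[["Column A", "Column B"]]` decides which value lands in `col_a` and which in `col_b`.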

2 Answers

Yes. You are currently using Series.apply(); instead, you can use DataFrame.apply() with axis=1, so that the function receives each row. Then you can access the columns as row[<column>].

Example:

In [37]: df
Out[37]:
   X  Y  Count
0  0  1      2
1  0  1      2
2  1  1      2
3  1  0      1
4  1  1      2
5  0  0      1

In [38]: def func1(r):
   ....:     print(r['X'])
   ....:     print(r['Y'])
   ....:     return r
   ....:

In [39]: df.apply(func1,axis=1)
0
1
0
1
1
1
1
0
1
1
0
0
Out[39]:
   X  Y  Count
0  0  1      2
1  0  1      2
2  1  1      2
3  1  0      1
4  1  1      2
5  0  0      1

This is just a very simple example; you can adapt it to what you actually need to do.
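Applied to the question's layout, the same idea might look like the sketch below. The DataFrame values and the computations standing in for c, d, e are hypothetical placeholders; the point is only that the function still takes its input from "Column A" and merely reads "Column B" into x.

```python
import pandas as pd

# Hypothetical data; in the question this comes from pd.read_csv("inputs.csv")
df = pd.DataFrame({"Column A": ["alpha", "beta"], "Column B": ["note 1", "note 2"]})

def function(row):
    a = row["Column A"]  # the real input, as before
    x = row["Column B"]  # only read, e.g. for printing
    print(x)
    # Placeholder computations standing in for the real c, d, e
    c, d, e = a.upper(), len(a), a[::-1]
    return pd.Series([c, d, e])

# Returning a Series per row makes apply produce a three-column DataFrame
df[["Column C", "Column D", "Column E"]] = df.apply(function, axis=1)
```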

Anand S Kumar
  • Thanks for your reply, it works now! My only issue is that it reads the top row twice before moving on to the next. After that there are no more repetitions. Do you have any idea why (and whether it has anything to do with `DataFrame.apply()` vs `Series.apply()`)? – P A N Aug 05 '15 at 15:47
  • No, it does not have anything to do with `Series.apply` vs `DataFrame.apply`. Are you sure the dataframe itself does not have the same row twice at the top? Check my example: it's not reading the first row twice (it's just that the first and second rows are identical). – Anand S Kumar Aug 05 '15 at 15:51
  • To cross-check, try printing the `index` of the row to see if it's actually the same row twice. – Anand S Kumar Aug 05 '15 at 15:53
  • I checked the index and it's correct. The problem must lie in how I pass on the argument to the function. (I am not using your literal example). Thanks! – P A N Aug 05 '15 at 15:56
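A sketch of the index check suggested above: inside a function applied with `axis=1`, the row's index label is available as `r.name`, so recording it shows which rows are actually visited. The DataFrame reuses the first rows of the answer's example, and the `visited` list is a hypothetical helper added for illustration.

```python
import pandas as pd

# First rows of the answer's example: rows 0 and 1 hold identical values
df = pd.DataFrame({"X": [0, 0, 1], "Y": [1, 1, 1], "Count": [2, 2, 2]})

visited = []

def func1(r):
    # r.name is the index label of the row currently being processed
    visited.append(r.name)
    print(r.name, r["X"], r["Y"])
    return r

df.apply(func1, axis=1)
print(visited)
```

If the same label shows up twice, the function really was called twice for that row; two different labels with identical values mean the rows are simply duplicates in the data. Older pandas releases documented that `DataFrame.apply` could call the function twice on the first row to choose a code path, so on an old version a repeated first label may come from that rather than from the data.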

With axis=1, the apply method passes each whole row to the function as a single argument: a Series holding that row's values, which can be unpacked like a tuple.

This is, however, a lot slower than applying the function to a single column. I would advise against it if performance is an issue.

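import pandas as pd

def scrape(x):
    # x is one row (a Series) holding the values of Column A and Column B
    a, b = x
    # Magically create c, d, e from a (placeholders here)
    c, d, e = a, a, a
    print(b)
    return pd.Series([c, d, e])

df[["Column C", "Column D", "Column E"]] = df[['Column A', 'Column B']].apply(scrape, axis=1)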
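If performance matters, as the answer warns, one common alternative (a swapped-in technique, not part of the original answer) is to skip `DataFrame.apply` and iterate over the two columns directly with `zip`, which avoids building a Series for every row. A sketch, with hypothetical data and placeholder computations for c, d, e:

```python
import pandas as pd

# Hypothetical data; in the question this comes from pd.read_csv("inputs.csv")
df = pd.DataFrame({"Column A": ["alpha", "beta"], "Column B": ["note 1", "note 2"]})

def scrape(a, b):
    # Placeholder computations standing in for the real c, d, e
    c, d, e = a.upper(), len(a), a[::-1]
    print(b)  # Column B is only read, not used to compute c, d, e
    return c, d, e

# Plain Python iteration over the two columns, one tuple of results per row
rows = [scrape(a, b) for a, b in zip(df["Column A"], df["Column B"])]
df[["Column C", "Column D", "Column E"]] = pd.DataFrame(rows, index=df.index)
```

Passing `index=df.index` keeps the new columns aligned with the original rows even when the DataFrame does not have a default 0..n-1 index.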
firelynx
  • Thanks for your reply! I'm not sure I understand correctly; I have probably been unclear. `a` and `b` are not the same, nor should they be combined. What I would like to do is use `a` as the argument for the function, as in my original code. `b` should only be passed on to `x` as `x = b`. So if I run `def function(a)`, how can I include `b` in that function? – P A N Aug 05 '15 at 07:54
  • @Winterflags Updated my answer – firelynx Aug 05 '15 at 09:50
  • Thanks for your help! I got a strange JSON error with your solution, `is not JSON serializable`. I am sure it would've worked but I wasn't able to solve it atm. – P A N Aug 05 '15 at 11:21
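For the follow-up in the comments (keep `a` as the function's input and only pass `b` in as `x`), one option is to give the function a second parameter and call it through a small lambda over each row. A sketch; the two-parameter `function(a, x)` is a hypothetical change to the question's signature, and the data and c, d, e computations are placeholders:

```python
import pandas as pd

# Hypothetical data; in the question this comes from pd.read_csv("inputs.csv")
df = pd.DataFrame({"Column A": ["alpha", "beta"], "Column B": ["note 1", "note 2"]})

def function(a, x):
    # a comes from Column A as before; x is read from Column B just for printing
    print(x)
    c, d, e = a.upper(), len(a), a[::-1]  # placeholders for the real c, d, e
    return pd.Series([c, d, e])

# The lambda picks the two values out of each row and hands them over separately
df[["Column C", "Column D", "Column E"]] = df.apply(
    lambda row: function(row["Column A"], row["Column B"]), axis=1
)
```

This keeps the body of `function` unaware of rows and column names, so it can still be called directly as `function("alpha", "note 1")`.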