I have a function in PySpark that takes two inputs and returns two outputs:
def get_seen_cards(x, y):
    if 1 in x:
        alreadyFailed = 1
    else:
        alreadyFailed = 0
    if y:
        alreadyAuthorized = 1
    else:
        alreadyAuthorized = 0
    return alreadyFailed, alreadyAuthorized
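For clarity, called on plain Python values (assuming `x` is a collection and `y` a boolean), the function behaves like this:

```python
# Same function as above, runnable on plain Python values.
def get_seen_cards(x, y):
    already_failed = 1 if 1 in x else 0       # flag if 1 appears in the collection x
    already_authorized = 1 if y else 0        # flag if y is truthy
    return already_failed, already_authorized

print(get_seen_cards([1, 3], False))  # (1, 0)
print(get_seen_cards([2], True))      # (0, 1)
```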
And I want to apply this function with a udf to treat the whole dataframe like this:
get_seen_cards_udf = udf(lambda x, y : get_seen_cards(x, y), IntegerType())
data.withColumn(["alr_failed", "alr_auth"], get_seen_cards_udf(data["card_uid"], data["failed"]))
Where data["card_uid"] looks like this:
[Row(card_uid='card_1'),
Row(card_uid='card_2'),
Row(card_uid='card_3'),
Row(card_uid='card_4'),
Row(card_uid='card_5')]
and data["failed"] looks like this:
[Row(failed=False),
Row(failed=False),
Row(failed=False),
Row(failed=True),
Row(failed=False)]
But this obviously doesn't work, because withColumn can only add one column at a time.
I need to add two columns to my dataframe at the same time: the first holds the first value returned by the function and will be stored in "alr_failed", and the other column holds the second returned value and will be stored in "alr_auth".
The idea is to end up with a dataframe with the following columns after processing:
card_uid, failed, alr_failed, alr_auth
Is this even possible somehow? Or is there a workaround?