I have a PySpark DataFrame with many columns, but a subset of it looks like this:
| datetime | eventid | sessionid | lat | lon | filtertype |
|---|---|---|---|---|---|
| someval | someval | someval | someval | someval | someval |
| someval | someval | someval | someval | someval | someval |
I want to map a function some_func(), which uses only the columns 'lat', 'lon' and 'eventid', to return a Boolean value that gets added to the df as a separate column named 'verified'. Basically I need to retrieve the columns of interest separately inside the function and do my operations on them. I know I can use UDFs with df.withColumn(), but as far as I can tell they map over a single column, so I would first have to pack the columns of interest into one column, which makes the code a bit messy (see the sketch below).
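Roughly, the workaround I have in mind looks like this (a minimal sketch; some_func here is just a hypothetical stand-in for my real check):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

# hypothetical placeholder for my real verification logic
def some_func(lat, lon, eventid):
    return lat is not None and lon is not None  # real check is more involved

# pack the columns of interest into a single struct column so the UDF
# receives one argument, then unpack the fields again inside the lambda
verify_udf = F.udf(
    lambda row: some_func(row["lat"], row["lon"], row["eventid"]),
    BooleanType(),
)

df = df.withColumn("verified", verify_udf(F.struct("lat", "lon", "eventid")))
```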
Is there a way to retrieve the column values separately inside the function and map that function across the entire DataFrame, similar to what Pandas allows with a map/lambda or df.apply()?
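For comparison, this is the kind of row-wise mapping I mean in Pandas (again with the hypothetical some_func, and pdf being a pandas DataFrame):

```python
# Pandas equivalent of what I'm after: apply a row-wise function
# that picks out the columns it needs by name
pdf["verified"] = pdf.apply(
    lambda row: some_func(row["lat"], row["lon"], row["eventid"]), axis=1
)
```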