I want to create a new column in PySpark based on a condition. My DataFrame:
id  create_date          txn_date
1   2019-02-23 23:27:42  2019-08-18 00:00:00
2   2019-08-24 00:10:18  2019-08-24 00:00:00
3   2019-09-16 17:47:56  2018-07-23 00:00:00
4   2019-09-24 01:31:21  2018-05-13 00:00:00
5   2018-12-26 23:28:09  2019-07-15 00:00:00
All the columns are strings. Based on the condition txn_date >= create_date, I want to create a new column is_mem that is 0 where the condition holds and 1 where it does not.
The final DataFrame should look like this:
id  create_date          txn_date             is_mem
1   2019-02-23 23:27:42  2019-08-18 00:00:00  0
2   2019-08-24 00:10:18  2019-08-24 00:00:00  1
3   2019-09-16 17:47:56  2018-07-23 00:00:00  1
4   2019-09-24 01:31:21  2018-05-13 00:00:00  1
5   2018-12-26 23:28:09  2019-07-15 00:00:00  0
How can I do this in PySpark?