I have a DataFrame like the one below:
+--+--+-----------+
| a| b|       date|
+--+--+-----------+
| 1| 2| 01/01/2022|
| 2| 3| 01/01/2021|
| 3| 4| 12/20/2021|
+--+--+-----------+
I have tried the code below, but it keeps returning the 01/01/2022 row even though 30/12/2021 is not greater than 01/01/2022.
df.filter(("30/12/2021" > col("date"))
I have also tried casting both sides to dates, but then it returns 0 records:
df.filter("cast(date as date) >= cast('2017-02-03' as date)")
Below is sample code that reproduces the problem:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

data2 = [(1, 2, "01/01/2022"),
         (2, 3, "01/01/2021"),
         (3, 4, "12/20/2021")]

schema = StructType([
    StructField("a", IntegerType(), True),
    StructField("b", IntegerType(), True),
    StructField("date", StringType(), True),
])

df = spark.createDataFrame(data=data2, schema=schema)

# the filter that misbehaves: it keeps the 01/01/2022 row
df.filter(("30/12/2021" > col("date"))).show()