I have two files, orders_renamed.csv and customers.csv. I am joining them with a full outer join and then dropping the duplicate column (customer_id). I want to replace null values in the "order_id" column with "-1".
I have tried this:
from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType
ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()
customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()
joinCondition1 = ordersDf.customer_id == customersDf.customer_id
joinType1 = "outer"
joinenullreplace = ordersDf.join(customersDf, joinCondition1, joinType1) \
    .drop(ordersDf.customer_id) \
    .select("order_id", "customer_id", "customer_fname") \
    .sort("order_id") \
    .withColumn("order_id", coalesce("order_id", -1))
joinenullreplace.show(50)
In the last line I have used coalesce, but it gives me an error. I have tried multiple approaches, such as treating the coalesce call as a single expression and wrapping it in expr, but that did not work. I have also tried lit, but it did not work. Please suggest a solution.