
I have two files: orders_renamed.csv and customers.csv. I am joining them with a full outer join and then dropping the duplicate column (customer_id). I want to replace null values in the "order_id" column with -1.

I have tried this:

from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType

ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()

customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()

joinCondition1 = ordersDf.customer_id == customersDf.customer_id

joinType1 = "outer"   


joinenullreplace = (ordersDf.join(customersDf, joinCondition1, joinType1)
                    .drop(ordersDf.customer_id)
                    .select("order_id", "customer_id", "customer_fname")
                    .sort("order_id")
                    .withColumn("order_id", coalesce("order_id", -1)))


joinenullreplace.show(50) 

As shown in the last line, I have used coalesce, but it is giving me an error. I have tried multiple ways, like treating coalesce as one expression and applying 'expr', but it did not work. I have also used lit, but it did not work. Please reply with a solution.

  • if the `order_id` column is of string type, you'll need to pass a string column or literal in `coalesce`. if it's a literal, enclose the value in `lit()`. – samkart Jul 31 '23 at 06:32
  • I have used lit but it is also giving me an error: .withColumn("order_id", coalesce("order_id", lit(1))). What import do I have to use for this lit? – Vivek Mishra Jul 31 '23 at 07:40
  • it is a [pyspark function](https://spark.apache.org/docs/3.3.0/api/python/reference/pyspark.sql/functions.html). same import as `col()`. – samkart Jul 31 '23 at 07:47

1 Answer

Every argument to coalesce must be a column or a lit() literal; a bare Python integer raises an error. Import lit and wrap the -1:

from pyspark.sql.functions import coalesce, lit

.withColumn("order_id", coalesce("order_id", lit(-1)))
– toyota Supra