
I have two files: orders_renamed.csv and customers.csv. I am joining them with a full outer join and then dropping the duplicate column (customer_id). I want to replace null values in the "order_id" column with -1.

I have tried this:

from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType

ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()

customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()

joinCondition1 = ordersDf.customer_id == customersDf.customer_id

joinType1 = "outer"   


joinenullreplace = (ordersDf.join(customersDf, joinCondition1, joinType1)
                    .drop(ordersDf.customer_id)
                    .select("order_id", "customer_id", "customer_fname")
                    .sort("order_id")
                    .withColumn("order_id", coalesce("order_id", -1)))


joinenullreplace.show(50) 

As shown in the last line, I have used coalesce, but it is giving me an error. I have tried multiple ways, like treating coalesce as one expression and applying 'expr', but it did not work. I have also used lit, but it did not work. Please reply with a solution.

  • if the `order_id` column is of string type, you'll need to pass a string column or literal in `coalesce`. if it's a literal, enclose the value in `lit()`. – samkart Jul 31 '23 at 06:32
  • I have used lit but it is also giving me an error: .withColumn("order_id", coalesce("order_id", lit(1))). What import do I have to use for this lit? – Vivek Mishra Jul 31 '23 at 07:40
  • it is a [pyspark function](https://spark.apache.org/docs/3.3.0/api/python/reference/pyspark.sql/functions.html). same import as `col()`. – samkart Jul 31 '23 at 07:47

1 Answer

Every argument to coalesce must be a column or a lit() literal; a bare Python integer raises an error. Import lit and wrap the -1:

from pyspark.sql.functions import coalesce, lit

.withColumn("order_id", coalesce("order_id", lit(-1)))
– toyota Supra