Assume I have the following data:
+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|                null|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+
If there is a null value in the values2 column, how can I assign the value from the values column to it? The result should be:
+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|              aaaaaa|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+
I thought of something like the following, but it doesn't work:
df.na.fill({"values2":df['values']}).show()
I found this way to solve it, but there should be something more straightforward:
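from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def change_null_values(a, b):
    # b is None when values2 is null (an empty string is also treated as missing)
    if b:
        return b
    else:
        return a

udf_change_null = udf(change_null_values, StringType())
df.withColumn("values2", udf_change_null("values", "values2")).show()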