I have a CSV file with two columns, name and value. Both are read in as strings.
dummy.csv:
Jordan 20
Mike NA
James 30
Steve NA
Stella 20
David NA
Schema:
root
 |-- name: string (nullable = true)
 |-- value: string (nullable = true)
I'm trying to replace the "NA" values with the average of that column. I can calculate the average, but I can't figure out how to replace the "NA" values with it:
from pyspark.sql.functions import col, mean, round
dummmyCol=['value']
dummydf.select([round(mean(col(c)),2).alias(c) for c in dummmyCol]).show()
+-----+
|value|
+-----+
|23.33|
+-----+
Below is my attempt at replacing the NA values; I know it is flawed. Any help would be greatly appreciated. Thanks!
dummydf.select([when(col(c1)=='NA',dummydf.select(round(mean(col(c1)),2))).alias(c1) for c1 in dummmyCol])
Expected output should be:
Jordan 20
Mike 23.33
James 30
Steve 23.33
Stella 20
David 23.33