I am new to PySpark and working on my first Spark project, where I am running into two issues.
a) I am not able to reference a column using
df["col1"].show()
***TypeError: 'Column' object is not callable***
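If I am reading the docs correctly, show() is defined on DataFrame rather than on Column, so a sketch of what I expected to work (untested beyond the line above) would be:

# show() is a DataFrame method, not a Column method,
# so the column has to go through select() first
df.select("col1").show()
df.select(df["col1"]).show()   # same thing, using bracket notation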
b) I am not able to replace null values in my Spark DataFrame with an aggregated value such as the mean.
Code:
from pyspark import SparkConf, SparkContext
from pyspark.sql.functions import *
from pyspark.sql import Row, HiveContext, SQLContext, Column
from pyspark.sql.types import *

# context setup
sc = SparkContext(conf=SparkConf())
hive_context = HiveContext(sc)

df = hive_context.table("db_new.temp_table")
# intended: fill the nulls in col1 with the mean of col1
df.select("col1").fillna(df.select("col1").mean())
***AttributeError: 'DataFrame' object has no attribute 'mean'***
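As far as I understand the API, fillna() wants a literal value rather than a DataFrame, so the mean has to be collected into a plain Python value first. A rough (untested) sketch of what I am trying to do:

from pyspark.sql.functions import mean

# collect the aggregate as a plain Python value:
# first() returns a Row, [0] pulls the mean out of it
mean_val = df.select(mean(df["col1"])).first()[0]

# the dict form of fillna() limits the fill to col1
df_filled = df.fillna({"col1": mean_val})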
Any help is greatly appreciated!
Update:
I tried the code snippet below, but it returns another error.
df.withColumn("new_Col", when("ColA".isNull,df.select(mean("ColA"))
.first()(0).asInstanceOf[Double])
.otherwise("ColA"))
***AttributeError: 'str' object has no attribute 'isNull'***
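The snippet above appears to be Scala syntax (.asInstanceOf[Double], .first()(0)), which is presumably why Python complains about the bare string. My best guess at a PySpark translation (untested) is:

from pyspark.sql.functions import col, mean, when

# pull the aggregate out as a Python value first
mean_val = df.select(mean(col("ColA"))).first()[0]

# col("ColA") gives a Column, which does have isNull();
# a bare "ColA" inside when()/otherwise() would be a literal string
df = df.withColumn(
    "new_Col",
    when(col("ColA").isNull(), mean_val).otherwise(col("ColA")),
)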