In Scala I can simply duplicate a column in a DF like this:
val df = spark.read.format("csv")
  .option("sep", ",")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("samplingRatio", "1.0")
  .load("/FileStore/tables/diabetesPIMA.dat")
df.show(false)
val df2 = df.withColumn("age2", $"age")
df2.show()
How do I do this simple copy in PySpark using withColumn?
Nothing seems to work, and the solutions from existing posts fail as well when I run them on Databricks. I must be missing something.
Error message:
org.apache.spark.sql.AnalysisException: cannot resolve '`age`' given input columns: [ glucose, pregnancies, insulin, outcome, BMI, age, diabetesPF, skinThickness, bloodPressure];;
What I tried in PySpark (per an answer I had already found), including the import the snippet needs:
from pyspark.sql import functions as F
df = df.withColumn('age2', F.col('age'))
df.show()
which looks very similar to:
df = df.withColumn('col3', F.col('col2'))