Duplicate a column in a PySpark DataFrame

Question

I have a PySpark DataFrame like the sample below. I would like to duplicate one of its columns under a different column name.

Name    Age    Rate
Aira     23     90
Ben      32     98
Cat      27     95

The desired output is:

Name    Age     Rate     Rate2
Aira    23      90       90
Ben     32      98       98
Cat     27      95       95

How can I do it?

You're looking for the `withColumn()` function: `df = df.withColumn("Rate2", "Rate")` should work. Let me try to find a dupe link... — pault, May 17 '18 at 19:46
See [How do I add a new column to a Spark DataFrame (using PySpark)?](https://stackoverflow.com/questions/33681487/how-do-i-add-a-new-column-to-a-spark-dataframe-using-pyspark) and [Adding a new column in Data Frame derived from other columns (Spark)](https://stackoverflow.com/questions/31333437/adding-a-new-column-in-data-frame-derived-from-other-columns-spark) — pault, May 17 '18 at 19:49

score 40 · Accepted Answer · answered May 17 '18 at 19:46

Just use `withColumn`, which returns a new DataFrame with the extra column:

df.withColumn("Rate2", df["Rate"])

or (in SQL, once the DataFrame is registered as a temporary view named `df`)

SELECT *, Rate AS Rate2 FROM df
