I have a PySpark data frame like the sample below. I would like to duplicate one of its columns under a new column name.

Name    Age    Rate
Aira     23     90
Ben      32     98
Cat      27     95

Desired output is :

Name    Age     Rate     Rate2
Aira    23      90       90
Ben     32      98       98
Cat     27      95       95

How can I do it?

User12345
  • You're looking for the `withColumn()` function: `df = df.withColumn("Rate2", df["Rate"])` should work (the second argument must be a `Column`, not a string). Let me try to find a dupe link... – pault May 17 '18 at 19:46
  • See [How do I add a new column to a Spark DataFrame (using PySpark)?](https://stackoverflow.com/questions/33681487/how-do-i-add-a-new-column-to-a-spark-dataframe-using-pyspark) and [Adding a new column in Data Frame derived from other columns (Spark)](https://stackoverflow.com/questions/31333437/adding-a-new-column-in-data-frame-derived-from-other-columns-spark) – pault May 17 '18 at 19:49

1 Answer

Just use `withColumn`, which returns a new DataFrame:

df.withColumn("Rate2", df["Rate"])

or in SQL, once the DataFrame is registered as a temporary view named `df`:

SELECT *, Rate AS Rate2 FROM df