column name as data for other column in the same dataframe

Asked Jun 26 '19 at 22:19

Active Jun 27 '19 at 10:46

Viewed 34 times

I have a DataFrame with three columns:

+-----+---+-------+
| name| id|Subject|
+-----+---+-------+
|  one|  1|Science|
|  two|  2|  Maths|
|three|  3|Science|
| four|  4| random|
+-----+---+-------+

I need to replace every value in the first column with the name of the third column, so that the resulting table looks like this:

+-------+---+-------+
|   name| id|Subject|
+-------+---+-------+
|Subject|  1|Science|
|Subject|  2|  Maths|
|Subject|  3|Science|
|Subject|  4| random|
+-------+---+-------+

Can someone help me achieve this in PySpark?

edited Jun 27 '19 at 10:46

Prathik Kini


asked Jun 26 '19 at 22:19

sri


Import `lit` from `pyspark.sql.functions` and then do `df = df.withColumn("name", lit(df.columns[2]))` – pault Jun 27 '19 at 02:11
Possible duplicate of [How to add a constant column in a Spark DataFrame?](https://stackoverflow.com/questions/32788322/how-to-add-a-constant-column-in-a-spark-dataframe) – pault Jun 27 '19 at 02:11
