
I have a table with columns [col1, col2, col3 .... col9]. How can I merge the data from all of these columns into a single column, col, in Python?

3 Answers

from pyspark.sql.functions import concat

# Build a sample DataFrame with four string columns.
values = [('A','B','C','D'),('E','F','G','H'),('I','J','K','L')]
df = sqlContext.createDataFrame(values,['col1','col2','col3','col4'])
df.show()

+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   A|   B|   C|   D|
|   E|   F|   G|   H|
|   I|   J|   K|   L|
+----+----+----+----+

# Concatenate the listed columns into a single new column.
req_column = ['col1','col2','col3','col4']
df = df.withColumn('concatenated_cols',concat(*req_column))
df.show()

+----+----+----+----+-----------------+
|col1|col2|col3|col4|concatenated_cols|
+----+----+----+----+-----------------+
|   A|   B|   C|   D|             ABCD|
|   E|   F|   G|   H|             EFGH|
|   I|   J|   K|   L|             IJKL|
+----+----+----+----+-----------------+
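
The question mentions nine columns; rather than typing out the list, the same pattern can be driven from df.columns (a minimal sketch, assuming every column in the DataFrame should be merged):

# Concatenate all columns, whatever their names, by unpacking df.columns.
df = df.withColumn('concatenated_cols', concat(*df.columns))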
Prathik Kini

Using Spark SQL:

df.createOrReplaceTempView('df')  # register the DataFrame so SQL can refer to it by name
new_df = sqlContext.sql("SELECT CONCAT(col1, col2, col3, col4) FROM df")
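
The result column will otherwise be named after the expression itself; to match the col name asked for in the question, an alias can be added (a small variation, not in the original answer):

new_df = sqlContext.sql("SELECT CONCAT(col1, col2, col3, col4) AS col FROM df")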

Without Spark SQL, you can use the concat function:

from pyspark.sql.functions import concat, col
new_df = df.withColumn('joined_column', concat(col('col1'), col('col2'), col('col3'), col('col4')))
Strick

In Spark (PySpark), DataFrames are immutable, so existing data cannot be edited in place. What you can do is create a new column. Please check the following link.

How do I add a new column to a Spark DataFrame (using PySpark)?

Using a UDF, you can combine all of those values in a row and have them returned as a single value.

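A minimal sketch of that approach, assuming the goal is a plain string join (the UDF name merge_cols and the empty separator are illustrative, not from the original answer):

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Illustrative UDF: joins every value in the row into one string,
# substituting '' for nulls and casting non-strings with str().
merge_cols = udf(lambda *vals: ''.join('' if v is None else str(v) for v in vals), StringType())

df = df.withColumn('col', merge_cols(*df.columns))

Note that Python UDFs are generally slower than built-in functions such as concat, so they are best reserved for logic the built-ins cannot express.
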
A few cautions: watch out for the following data issues while aggregating:

  1. Null values (see the sketch after this list)
  2. Type mismatches
  3. String encoding issues
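
On the null point specifically: concat() returns null for the entire row if any input is null, whereas concat_ws() skips null values. A common workaround (a sketch, with '' as an arbitrary separator):

from pyspark.sql.functions import concat_ws

# concat_ws drops null inputs instead of nulling the whole result.
df = df.withColumn('concatenated_cols', concat_ws('', *df.columns))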
sam