I have a table with columns [col1, col2, col3 ... col9]. How can I merge all of the columns' data into a single column, col, in Python?
Are you using Pandas or PySpark? – sam Oct 09 '19 at 06:27
I am using PySpark – Jaswanth Garudachedu Oct 09 '19 at 06:43
Possible duplicate of [Concat multiple columns of a dataframe using pyspark](https://stackoverflow.com/questions/54921359/concat-multiple-columns-of-a-dataframe-using-pyspark) – cph_sto Oct 09 '19 at 07:32
3 Answers
from pyspark.sql.functions import concat
values = [('A','B','C','D'),('E','F','G','H'),('I','J','K','L')]
df = sqlContext.createDataFrame(values,['col1','col2','col3','col4'])
df.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| A| B| C| D|
| E| F| G| H|
| I| J| K| L|
+----+----+----+----+
req_column = ['col1','col2','col3','col4']
df = df.withColumn('concatenated_cols',concat(*req_column))
df.show()
+----+----+----+----+-----------------+
|col1|col2|col3|col4|concatenated_cols|
+----+----+----+----+-----------------+
| A| B| C| D| ABCD|
| E| F| G| H| EFGH|
| I| J| K| L| IJKL|
+----+----+----+----+-----------------+
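One caveat not shown above: concat returns NULL for the whole result if any input column is NULL. If that is a concern, here is a minimal sketch using concat_ws (concatenate with separator), which skips NULL inputs, applied to the same req_column list:

from pyspark.sql.functions import concat_ws

# concat_ws(separator, *cols) ignores NULL inputs instead of
# propagating them, so rows with missing values still produce output
df = df.withColumn('concatenated_cols', concat_ws('', *req_column))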

Prathik Kini
Using Spark SQL (the DataFrame must first be registered as a temporary view so it can be referenced in the query):

df.createOrReplaceTempView("df")
new_df = sqlContext.sql("SELECT CONCAT(col1, col2, col3, col4) AS joined_column FROM df")
Using the DataFrame API instead of SQL, you can use the concat function:

from pyspark.sql.functions import concat, col
new_df = df.withColumn('joined_column', concat(col('col1'), col('col2'), col('col3'), col('col4')))
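Note that sqlContext reflects the pre-2.0 API; on Spark 2.0+ the SparkSession entry point (conventionally bound to spark) serves the same purpose:

# Spark 2.0+: run the same query through the SparkSession
new_df = spark.sql("SELECT CONCAT(col1, col2, col3, col4) AS joined_column FROM df")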

Strick
In Spark (PySpark), DataFrames are immutable, so there is no editing of existing data in place; what you can do is create a new column. Please check the following link:
How do I add a new column to a Spark DataFrame (using PySpark)?
Using a UDF, you can combine all of those values in a row and return them as a single value, as sketched after the list below.
A few cautions: please look out for the following data issues while aggregating:
- Null values
- Type mismatches
- String encoding issues
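A minimal sketch of such a UDF, reusing the four-column DataFrame from the first answer (an assumption; adapt the column list to your own table), which guards against the first two issues above by replacing None with an empty string and casting every value to str:

from pyspark.sql.functions import udf, struct
from pyspark.sql.types import StringType

# Combine every value in a row into one string: None becomes '',
# non-string values are cast, so nulls and type mismatches don't fail
def combine_row(row):
    return ''.join('' if v is None else str(v) for v in row)

combine_udf = udf(combine_row, StringType())
df = df.withColumn('col', combine_udf(struct('col1', 'col2', 'col3', 'col4')))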

sam