I created a dataframe

df1 = spark.read.format("csv").option("delimiter", "|").load(file)

Now I want to select columns from that dataframe

df1.select("col1","col2","col3")

This works fine. But can I concatenate two columns in the same statement? Consider col1 as age, col2 as first name, and col3 as last name. I am looking for the output shown below, with the first and last names combined. I know it can be done using Spark SQL; I want to know whether it can be done in the df1.select() statement above. Thanks.

col1  col2col3
23    JohnHarper
20    MarshallMathers
Ramesh Maharjan
sam

2 Answers

You can try something like this:

from pyspark.sql.functions import concat
df1.select("col1", concat("col2", "col3"))
madhu

You can use the concat function:

from pyspark.sql.functions import concat, concat_ws
df1.select("col1", concat("col2", "col3").alias("col2col3")).show(truncate=False)

or use concat_ws with an empty separator:

df1.select("col1", concat_ws("", "col2", "col3").alias("col2col3")).show(truncate=False)

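or you can use a udf function:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

@udf(StringType())
def concatenating(x, y):
    # Return null if either input is null (as concat does); "".join would raise a TypeError on None
    if x is None or y is None:
        return None
    return "".join([x, y])

df1.select("col1", concatenating(col("col2"), col("col3")).alias("col2col3")).show(truncate=False)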
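
The concat and concat_ws options above differ in how they treat missing values: concat returns null if any input is null, while concat_ws skips null inputs and joins whatever remains. The following is a plain-Python sketch of those two behaviours, not Spark code; the helper names model_concat and model_concat_ws are hypothetical and only imitate the semantics so the difference can be seen without a Spark session. The second sample row, with a missing last name, is made up for illustration.

```python
from typing import Optional


def model_concat(*values: Optional[str]) -> Optional[str]:
    """Imitates concat: the result is null (None) if any input is null."""
    if any(v is None for v in values):
        return None
    return "".join(values)


def model_concat_ws(sep: str, *values: Optional[str]) -> str:
    """Imitates concat_ws: null inputs are skipped, the rest are joined with sep."""
    return sep.join(v for v in values if v is not None)


# Hypothetical rows: (firstname, lastname); the second has no last name.
rows = [("John", "Harper"), ("Marshall", None)]
for first, last in rows:
    print(model_concat(first, last), model_concat_ws("", first, last))
# prints:
# JohnHarper JohnHarper
# None Marshall
```

If the name columns may contain nulls and a partial name is still wanted in the output, concat_ws is probably the closer fit; if a missing part should make the whole result null, concat does that.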
Ramesh Maharjan