Can we concatenate two columns when selecting them directly from a dataframe?

Question

I created a dataframe

df1 = spark.read.format("csv").option("delimiter", "|").load(file)

Now I want to select columns from that dataframe

df1.select("col1","col2","col3")

This works fine. But can I concatenate two columns in the same statement? Consider col1 as age, col2 as first name and col3 as last name. I am looking for the output shown below, with first and last name combined. I know it can be done using Spark SQL; I want to know whether it can be done in the df1.select() statement above. Thanks.

col1  col2col3
23    JohnHarper
20    MarshallMathers

score 0 · Answer 1 · answered Aug 28 '18 at 17:07

You can try something like this:

df1.select("col1",concat("col2","col3"))


madhu


Got an error: concat is not defined. Do I need to import anything? – sam Aug 28 '18 at 17:09
Can you post a snapshot of the error here? – madhu Aug 28 '18 at 17:18
In PySpark you would need `from pyspark.sql.functions import *` or, to import only the specific function, `from pyspark.sql.functions import concat`. – Ramesh Maharjan Aug 28 '18 at 17:21

Ramesh Maharjan · Accepted Answer · 2018-08-28T17:20:42.157

You can use the concat function:

# import only the functions used in the examples below
from pyspark.sql.functions import concat, concat_ws
df1.select("col1", concat("col2", "col3").alias("col2col3")).show(truncate=False)

or use concat_ws with an empty separator:

df1.select("col1",concat_ws("", "col2","col3").alias("col2col3")).show(truncate=False)

or you can use a udf:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Python UDF returning a string column; it fails if either value is null
@udf(StringType())
def concatenating(x, y):
    return "".join([x, y])

df1.select("col1", concatenating(col("col2"), col("col3")).alias("col2col3")).show(truncate=False)

edited Aug 28 '18 at 17:20

answered Aug 28 '18 at 17:14


Thanks @Ramesh Maharjan. I totally forgot to import the function. It's a silly mistake. Thanks – sam Aug 30 '18 at 05:24
Happy to hear that it's helpful :) You can upvote both mine and madhu's answer too ;) It's the up button. – Ramesh Maharjan Aug 30 '18 at 05:25
