I have a PySpark dataframe that has a couple of fields, e.g.:

Id Name Surname
1 John Johnson
2 Anna Maria

I want to create a new column that combines the values of other columns into a new string. The desired output is:

Id Name Surname New
1 John Johnson Hey there John Johnson!
2 Anna Maria Hey there Anna Maria!

I'm trying to do (pseudocode):

df = df.withColumn("New", "Hey there " + Name + " " + Surname + "!")

How can this be achieved?

blackbishop
Alcibiades
Wrap the literal values in `lit()` and the column names in `col()`. Concatenation can be done using `concat()`. See the [functions documentation](https://spark.apache.org/docs/3.3.0/api/python/reference/pyspark.sql/functions.html) for more details. – samkart Aug 03 '22 at 16:51

1 Answer

You can use either the `concat` function or `format_string`. With `format_string`:

from pyspark.sql import functions as F

df = df.withColumn(
    "New", 
    F.format_string("Hey there %s %s!", "Name", "Surname")
)

df.show(truncate=False)
# +---+----+-------+-----------------------+
# |Id |Name|Surname|New                    |
# +---+----+-------+-----------------------+
# |1  |John|Johnson|Hey there John Johnson!|
# |2  |Anna|Maria  |Hey there Anna Maria!  |
# +---+----+-------+-----------------------+

If you prefer using concat:

F.concat(F.lit("Hey there "), F.col("Name"), F.lit(" "), F.col("Surname"), F.lit("!"))
blackbishop