Given a DataFrame df, when I run df.select(df['category_id'] + 1000), I get the expected results:

>>> df.select(df['category_id']).limit(3).show()
+-----------+
|category_id|
+-----------+
|          1|
|          2|
|          3|
+-----------+

>>> df.select(df['category_id']+1000).limit(3).show()
+--------------------+
|(category_id + 1000)|
+--------------------+
|                1001|
|                1002|
|                1003|
+--------------------+

However, when I run df.select(df['category_name'] + 'blah'), I get null:

>>> df.select(df['category_name']).limit(3).show()
+-------------------+
|      category_name|
+-------------------+
|           Football|
|             Soccer|
|Baseball & Softball|
+-------------------+

>>> df.select(df['category_name']+'blah').limit(3).show()
+----------------------+
|(category_name + blah)|
+----------------------+
|                  null|
|                  null|
|                  null|
+----------------------+

Why does the first work while the second does not? What am I missing?

Bala
    Possible duplicate of [Concatenate columns in Apache Spark DataFrame](https://stackoverflow.com/questions/31450846/concatenate-columns-in-apache-spark-dataframe) – 10465355 Dec 02 '18 at 23:00

1 Answer

Unlike Python, Spark does not define the + operator as string concatenation (and neither does Spark SQL). Use concat or concat_ws to concatenate strings instead.

import pyspark.sql.functions as f

df.select(f.concat(df.category_name, f.lit('blah')).alias('category_name')).show(truncate=False)
#+-----------------------+
#|category_name          |
#+-----------------------+
#|Footballblah           |
#|Soccerblah             |
#|Baseball & Softballblah|
#+-----------------------+

df.select(f.concat_ws(' ', df.category_name, f.lit('blah')).alias('category_name')).show(truncate=False)
#+------------------------+
#|category_name           |
#+------------------------+
#|Football blah           |
#|Soccer blah             |
#|Baseball & Softball blah|
#+------------------------+
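
As for why the original expression returns null rather than raising an error: with the default, non-ANSI settings, Spark treats + as numeric addition and implicitly casts string operands to a numeric type, and a string that does not parse as a number becomes null. The following plain-Python sketch only models that behaviour and is not Spark code. The helper names to_double and lenient_add are hypothetical, and the model simplifies by casting every operand to a float (Spark keeps integer columns as integers, and its parsing rules differ in detail from Python's float()).

```python
from typing import Optional


def to_double(value: object) -> Optional[float]:
    """Hypothetical model of a lenient cast: unparseable input becomes None."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # Instead of raising, the lenient cast yields null.
        return None


def lenient_add(left: object, right: object) -> Optional[float]:
    """Hypothetical model of `+`: cast both sides, null if either side is null."""
    a, b = to_double(left), to_double(right)
    if a is None or b is None:
        return None
    return a + b


print(lenient_add(1, 1000))             # 1001.0
print(lenient_add("Football", "blah"))  # None
print(lenient_add("1", "2"))            # 3.0
```

The last call shows that + on string columns does "work" when the strings hold numbers, which is why Spark returns null instead of rejecting the expression outright.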
Psidom