# Creating an arbitrary DataFrame
df = spark.createDataFrame([(1,54),(2,7),(3,72),(4,99)], ['ID','Var'])
df.show()
+---+---+
| ID|Var|
+---+---+
|  1| 54|
|  2|  7|
|  3| 72|
|  4| 99|
+---+---+
Once the DataFrame has been created, we use the floor() function to obtain the integral part of a number; for example, floor(15.5) is 15. We take the integral part of Var/10 and add 1 to it, because the group indexing starts from 1, as opposed to 0. Finally, we need to prepend the word group to that value. The concatenation can be done with the concat() function, but keep in mind that the prefix group is not a column, so it has to be wrapped in lit(), which creates a column of a literal value.
# Requisite packages needed
from pyspark.sql.functions import col, floor, lit, concat

# floor(Var/10) + 1 is the 1-based group index; lit('group') turns the prefix into a column
df = df.withColumn('Var', concat(lit('group'), (1 + floor(col('Var')/10))))
df.show()
+---+-------+
| ID| Var|
+---+-------+
|  1| group6|
|  2| group1|
|  3| group8|
|  4|group10|
+---+-------+
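As a side note, the same label can also be built in a single call with the format_string() function, which formats the numeric group index straight into the string and so avoids the explicit lit(). This is just a sketch of an equivalent alternative, assuming it is applied to the original numeric df from the top (before Var was overwritten); the result should be the same column as shown above.
# format_string() substitutes the integral group index into the '%d' placeholder
from pyspark.sql.functions import col, floor, format_string
df = df.withColumn('Var', format_string('group%d', 1 + floor(col('Var')/10)))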