I have the following data frame (called df) with columns item_name and item_level:
item_name    item_level
-----------------------
Item1        1
Item2        2
Item3        2
Item4        3
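For reference, this is roughly how the example frame can be built (just a sketch, assuming a local SparkSession):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recreate the example data shown above
df = spark.createDataFrame(
    [('Item1', 1), ('Item2', 2), ('Item3', 2), ('Item4', 3)],
    ['item_name', 'item_level'])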
I would like to create a new column that indents the items according to their level. To do that, I would like to multiply item_level by the string '---', with the idea that the string gets concatenated with itself as many times as the integer value.
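In plain Python this is exactly what multiplying an integer by a string does, for example:
3 * '---'              # '---------'
'---' * 2 + 'Item2'    # '------Item2'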
My desired result is something like this:
item_name    item_level    new_column
--------------------------------------
Item1        1             ---Item1
Item2        2             ------Item2
Item3        2             ------Item3
Item4        3             ---------Item4
In pyspark, when I write the following command, the created column contains only null values:
from pyspark.sql import functions as F
df = df.withColumn('new_column',F.concat(F.lit(df.item_level*'---'),df.item_name))
The null values seem to come from the multiplication of the integers with the string. The concat function seems to work properly; for instance, the following works:
df = df.withColumn('new_column',F.concat(df.item_name,df.item_name))
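To narrow it down, I also isolated the multiplication into its own column (just a sketch); that column comes back as null for every row:
from pyspark.sql import functions as F

# Only the product, without concat: the result is null in every row
df = df.withColumn('check', df.item_level * F.lit('---'))
df.select('item_level', 'check').show()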
I also tried a few other things. If I use a constant number to multiply the string, the resulting string is displayed as desired:
number = 3
df = df.withColumn('new_column', F.lit(number*'---'))
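If I understand it correctly, this works because number * '---' is evaluated by plain Python before Spark is involved, so F.lit just receives an already-built string:
# Plain Python string repetition, evaluated before Spark sees it
padded = number * '---'    # '---------'
df = df.withColumn('new_column', F.lit(padded))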
Furthermore, adding the '---' string as a column first (with '---' in every row), and then multiplying that column by the item_level column gives null values as well:
df = df.withColumn('padding',F.lit('---'))
df = df.withColumn('test', df.padding * df.item_level)
If I use pandas, however, this last piece of code does what I want. But I need to do this in pyspark.
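For comparison, this is roughly the pandas behaviour I mean (a string Series multiplied by an integer Series repeats each string element-wise):
import pandas as pd

pdf = pd.DataFrame({'item_name': ['Item1', 'Item2', 'Item3', 'Item4'],
                    'item_level': [1, 2, 2, 3]})
pdf['padding'] = '---'
# Element-wise string repetition, then plain string concatenation
pdf['new_column'] = pdf['padding'] * pdf['item_level'] + pdf['item_name']
# new_column: ---Item1, ------Item2, ------Item3, ---------Item4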