
How can I convert this expression from pandas to a PySpark DataFrame?

The goal is to assign the value cur to the date_stamp column for the row identified by vin.

# the data frame is:
tag         2020-06-25
----------------------
3FMTK1RM    0
678jhgt     18

#######################
vin = '3FMTK1RM'  # the first element of tag
cur = 5
date_stamp = '2020-06-25'
df.loc[str(date_stamp), vin] = cur
  • Spark dataframe is an unordered collection of rows. You cannot access/modify elements by index. – mck Feb 05 '21 at 10:36
  • Is there a way of creating a new Dataframe then do the union ? – insses06 06 Feb 05 '21 at 10:39
  • union of what..? – mck Feb 05 '21 at 10:40
  • Is there a way that I can share the code with you ? – insses06 06 Feb 05 '21 at 10:43
  • 1
    You can post it in your question using the edit function – mck Feb 05 '21 at 10:43
  • The code is long a little complex , I tried to simplify it as possible – insses06 06 Feb 05 '21 at 11:42
  • 1
    Hello @insses0606. Your question is not clear and not all spark users know how to use pandas. Please make it more understandable by adding a reproducible example, your input spark data frame and the desired output. You see this post on how make good pyspark examples: https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples – blackbishop Feb 05 '21 at 11:48
  • 1
    you can use Koalas: https://koalas.readthedocs.io/en/latest/index.html – Alex Ott Feb 05 '21 at 12:30
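
A rough sketch of the Koalas route suggested in the last comment, assuming Koalas is installed and that Series.where is available in your Koalas version (vin, cur, date_stamp and df come from the question):

import databricks.koalas as ks  # assumption: Koalas is installed

vin = '3FMTK1RM'
cur = 5
date_stamp = '2020-06-25'

kdf = df.to_koalas()  # importing Koalas adds .to_koalas() to Spark DataFrames
# keep the existing value where tag != vin, otherwise write cur
kdf[date_stamp] = kdf[date_stamp].where(kdf['tag'] != vin, cur)
df2 = kdf.to_spark()  # convert back to a Spark DataFrame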

1 Answer


You can use when:

import pyspark.sql.functions as F

# overwrite the date column with cur only for the row whose tag equals vin;
# all other rows keep their existing value
df2 = df.withColumn(
    '2020-06-25',
    F.when(F.col('tag') == vin, cur).otherwise(F.col('2020-06-25'))
)
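
For context, here is a minimal end-to-end sketch, assuming a local SparkSession and the small sample data from the question (the column name '2020-06-25' and the variables vin, cur and date_stamp are taken from the question):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data matching the question
df = spark.createDataFrame(
    [('3FMTK1RM', 0), ('678jhgt', 18)],
    ['tag', '2020-06-25'],
)

vin = '3FMTK1RM'
cur = 5
date_stamp = '2020-06-25'

# replace the value in the date column only for the matching tag
df2 = df.withColumn(
    date_stamp,
    F.when(F.col('tag') == vin, cur).otherwise(F.col(date_stamp)),
)
df2.show()
# the 3FMTK1RM row now shows 5 in the 2020-06-25 column; 678jhgt keeps 18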