
How can I convert this expression from pandas to a PySpark DataFrame?

The goal is to assign the value cur to the date_stamp column for the row identified by vin.

# the data frame is:
tag         2020-06-25
----------------------
3FMTK1RM    0
678jhgt     18

#######################
vin = '3FMTK1RM'  # the first element of tag
cur = 5
date_stamp = '2020-06-25'
df.loc[str(date_stamp), vin] = cur
  • Spark dataframe is an unordered collection of rows. You cannot access/modify elements by index. – mck Feb 05 '21 at 10:36
  • Is there a way of creating a new Dataframe then do the union ? – insses06 06 Feb 05 '21 at 10:39
  • union of what..? – mck Feb 05 '21 at 10:40
  • Is there a way that I can share the code with you ? – insses06 06 Feb 05 '21 at 10:43
  • 1
    You can post it in your question using the edit function – mck Feb 05 '21 at 10:43
  • The code is long a little complex , I tried to simplify it as possible – insses06 06 Feb 05 '21 at 11:42
  • 1
    Hello @insses0606. Your question is not clear and not all spark users know how to use pandas. Please make it more understandable by adding a reproducible example, your input spark data frame and the desired output. You see this post on how make good pyspark examples: https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples – blackbishop Feb 05 '21 at 11:48
  • 1
    you can use Koalas: https://koalas.readthedocs.io/en/latest/index.html – Alex Ott Feb 05 '21 at 12:30
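
A rough sketch of the Koalas route suggested in the last comment, assuming Koalas is installed and that Series.where is available in your Koalas version (vin, cur, date_stamp and df come from the question):

import databricks.koalas as ks  # assumption: Koalas is installed

vin = '3FMTK1RM'
cur = 5
date_stamp = '2020-06-25'

kdf = df.to_koalas()  # importing Koalas adds .to_koalas() to Spark DataFrames
# keep the existing value where tag != vin, otherwise write cur
kdf[date_stamp] = kdf[date_stamp].where(kdf['tag'] != vin, cur)
df2 = kdf.to_spark()  # convert back to a Spark DataFrame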

1 Answer


You can use when:

import pyspark.sql.functions as F

# overwrite the date column with cur only for the row whose tag equals vin;
# all other rows keep their existing value
df2 = df.withColumn(
    '2020-06-25',
    F.when(F.col('tag') == vin, cur).otherwise(F.col('2020-06-25'))
)
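
For context, here is a minimal end-to-end sketch, assuming a local SparkSession and the small sample data from the question (the column name '2020-06-25' and the variables vin, cur and date_stamp are taken from the question):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data matching the question
df = spark.createDataFrame(
    [('3FMTK1RM', 0), ('678jhgt', 18)],
    ['tag', '2020-06-25'],
)

vin = '3FMTK1RM'
cur = 5
date_stamp = '2020-06-25'

# replace the value in the date column only for the matching tag
df2 = df.withColumn(
    date_stamp,
    F.when(F.col('tag') == vin, cur).otherwise(F.col(date_stamp)),
)
df2.show()
# the 3FMTK1RM row now shows 5 in the 2020-06-25 column; 678jhgt keeps 18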