I have a DataFrame in PySpark like this:
+--------------------+--------+----------+--------------------+--------------------+
| title| journal| date| author| content|
+--------------------+--------+----------+--------------------+--------------------+
|Kudlow Breaks Wit...|NYT |2019-05-01| By Mark Landler |WASHINGTON — Pres...|
|Scrutiny of Russi...|NYT |2019-05-01|By Charlie Savage...|WASHINGTON — The ...|
|Greek Anarchists ...|NYP |2019-05-01|By Niki Kitsantonis |ATHENS — Greek an...|
+--------------------+--------+----------+--------------------+--------------------+
I'm looking to replace the values in the rows where journal equals "NYT". I know how to proceed with the SQL context:
df.createOrReplaceTempView("tbl_journal")
df = sqlContext.sql("SELECT journal, date FROM tbl_journal where journal like '%NYT%'")
df = df.withColumn('journal', lit('The New York Times'))
But the problem is that this replaces the original DataFrame with only the filtered rows (I just want to replace the values where journal = 'NYT' and keep the other rows as they are).
Another thing: I searched other topics but couldn't find a solution for combining a where and a withColumn statement. I mean, if I do this in PySpark (not with SQL):
df.where(col('journal').like("%NYT%")).withColumn('journal', lit('Oui Test')).show()
It replaces all the values; there is no condition.
Do you know how to replace only the values matching this condition, in the original DataFrame? With Spark or the SQL context. Thanks in advance!