1

I am using pyspark in databricks with a JSON file to clean data. The expression in the eval brackets comes from the JSON file.

One of the issues I am facing is manipulating timestamps/string.

I am trying to find the difference in months between a timestamp column and a single date (which is a string)

See code below.

import pyspark.sql.functions as F
df2 = df2.withColumn('test', eval("months_between( F.to_date(F.col('period_name')), lit('31/03/2019'))"))

It doesn't throw an error but evaluates to null.

Tiger_Stripes
  • 485
  • 5
  • 17
  • [Using `eval` is generally bad practice](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice). I don't see any need for it in this context. – pault Nov 07 '19 at 17:07

2 Answers2

2

Using eval is bad practice. I see no need for it in this case.

You can maintain the flexibility of reading the query from a json file using pyspark.sql.functions.expr instead

df2 = df2.withColumn(
    'test', 
    expr("months_between(to_date(period_name), to_date('31/03/2019', 'dd/MM/yyyy'))")
)
pault
  • 41,343
  • 15
  • 107
  • 149
0

Just change your eval query to:

df2 = df2.withColumn('test', eval("months_between(F.to_date(F.col('period_name')), F.to_date(F.lit('31/03/2019'), 'dd/MM/yyyy'))"))

This should convert the literal string column to datetype too.

If your date string is not yyyy-MM-dd, then you will need to specify the format.

pissall
  • 7,109
  • 2
  • 25
  • 45
  • Thanks this does work but weirdly I had to change the order of my date to yyyy-mm-dd otherwise it would still return null – Tiger_Stripes Nov 07 '19 at 15:17
  • @Tiger_Stripes I have updated my answer to include date format – pissall Nov 07 '19 at 15:33
  • Why use `eval` at all? Why not just: `df2.withColumn('test',F.months_between(F.to_date(F.col('period_name')), F.to_date(F.lit('31/03/2019'), 'dd/MM/yyyy'))))` – pault Nov 07 '19 at 17:06
  • @pault He says that the query/code is coming as a string from a json object. – pissall Nov 07 '19 at 17:08
  • but your answer modified the expression inside the `eval` which means that OP can do the same, which means they can avoid using `eval` altogether. – pault Nov 07 '19 at 17:10
  • @pault It’s possible it could be coming from another process. It’s mentioned in the question, so it’s same to assume it could be a constraint – pissall Nov 07 '19 at 18:06