
I have two timestamp variables: `t1` is a `bigint` and `t2` is a `timestamp`.

`pyspark.sql.utils.AnalysisException: u"cannot resolve '(t2 >= 1536796800000L)' due to data type mismatch: differing types in '(t2 >= 1536796800000L)`

How can I compare timestamps that are stored in different formats?

– Markus
  • Possible duplicate of [PySpark: inconsistency in converting timestamp to integer in dataframe](https://stackoverflow.com/questions/46122846/pyspark-inconsistency-in-converting-timestamp-to-integer-in-dataframe). tl;dr: you need to convert one of the columns to the other type. There are functions to do this, but it also depends on your spark version. Please update your question to include a [reproducible example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples) and your spark version. – pault Sep 13 '18 at 14:20
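
A minimal sketch of the conversion pault's comment describes, assuming `t1` holds epoch milliseconds and `t2` is already a timestamp column (the sample data below is hypothetical, not from the question):

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example row: t1 is a bigint of epoch milliseconds,
# t2 is a timestamp.
df = spark.createDataFrame(
    [(1536796800000, "2018-09-14 12:00:00")],
    ["t1", "t2"],
).withColumn("t2", F.col("t2").cast("timestamp"))

# Scale t1 down to seconds and cast it to timestamp so both sides
# of the comparison share a type; comparing them directly raises
# the AnalysisException shown in the question.
df.filter(F.col("t2") >= (F.col("t1") / 1000).cast("timestamp")).show()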

1 Answer


I personally suggest using the arrow module.

import arrow

# An epoch timestamp in seconds; arrow.get() accepts it directly,
# no str() conversion needed.
var = 1536796800
var = arrow.get(var).datetime
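
If the raw value is in milliseconds, as in the question's `1536796800000`, divide by 1000 first, since `arrow.get()` interprets a number as epoch seconds. A small sketch of that case:

import arrow

# Milliseconds since the Unix epoch, as in the question's error message;
# arrow expects seconds, so scale the value down before parsing.
ms = 1536796800000
dt = arrow.get(ms / 1000).datetime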
– eatmeimadanish
  • I get `ValueError: year is out of range`. – Markus Sep 13 '18 at 14:09
  • @Markus, You have a non-standard, albeit common, time representation. That is, you have milliseconds since the unix epoch (as opposed to seconds). Dividing by 1000 would solve that problem. – Dunes Sep 13 '18 at 14:19
  • It solved the problem. The thread pointed by @pault is not exactly what I need. – Markus Sep 13 '18 at 15:45
  • It answers his question. – eatmeimadanish Sep 13 '18 at 17:54
  • @eatmeimadanish I missed the "not" in OP's last comment (read it as: "the thread is exactly what I need"). In any case, your code is fine in python but it won't work for spark dataframes (it definitely won't fix that error message). If OP says it worked, then so be it. – pault Sep 13 '18 at 20:03
  • Why would it not work in pyspark data frames? Since it would seem it is an input issue and not an error in the module – eatmeimadanish Sep 17 '18 at 14:39