1

I am trying to group the columns and take minimum. Then use the minimum value to compute difference with the date. However, when I use minimum date column I get following error:

raise AnalysisException(s.split(': ', 1)[1], stackTrace)
 pyspark.sql.utils.AnalysisException: u"grouping expressions sequence is empty, and 'table.`lbrnm`' is not an aggregate function. Wrap '(datediff(CAST(CAST('2019-02-28 01:00:00' AS TIMESTAMP) AS DATE), CAST(CAST(concat(CASE WHEN (CAST((CAST(`min_date` AS DECIMAL(7,0)) / CAST(CAST(1000000 AS DECIMAL(7,0)) AS DECIMAL(7,0))) AS DOUBLE) = CAST('1' AS DOUBLE)) THEN '20' ELSE '' END, CASE WHEN (CAST((CAST(`min_date` AS DECIMAL(7,0)) / CAST(CAST(1000000 AS DECIMAL(7,0)) AS DECIMAL(7,0))) AS DOUBLE) = CAST('0' AS DOUBLE)) THEN '19' ELSE '' END, substring(substring(CAST(min(table.lspf.`lsdte`) AS STRING), 0, 3), -2, 2),

Here is my code:

j = lspf_ret.groupBy(col("lsbrnm"),
                     col("lsdlp"),
                     col("lsdlr"))
            .agg(min(col('lsdte')))



    j.select('lsbrnm','lsdlp','lsdlr',col('min(lsdte)').alias('min_date'))
     .select('lsbrnm',
             'lsdlp',
             'lsdlr',
             'min_date', 
             datediff(lit('2019-02-28 01:00:00').cast(TimestampType()),
             concat(when(col("min_date")/1000000=='1','20').otherwise(''),
             when(col("min_date")/1000000=='0','19').otherwise(''),
             right(left(min(lspf.lsdte).cast(StringType()),3),2),
             lit('-'),
             left(right(min(lspf.lsdte).cast(StringType()),4),2), 
             lit('-'),
             right(min(lspf.lsdte).cast(StringType()),2),
             lit(' 00:00:00')  ).cast(TimestampType())))

Here is the aggregated output:

    |lsbrnm|lsdlp|        lsdlr|min(lsdte)|
    +------+-----+-------------+----------+
    |  2266|  EF4| 171001370957|   1190201|
    |  2266|  EF4| 131201027045|   1171130|
    |  2266|  EF4| 140901072492|   1170301|
    |  2266|  EF4| 160901268734|   1180925|
    |  2266|  EF4| 161101289209|   1170929|
    |  2266|  EA4| 18501424940R|   1190220|

Here is the desired output:

    |lsbrnm|lsdlp|        lsdlr|   min_date|difference
    +------+-----+-------------+----------++----------+
    |  2266|  EF4| 171001370957|   1190201|         27|
    |  2266|  EF4| 131201027045|   1171130|         275|
    |  2266|  EF4| 140901072492|   1170301|         1|
    |  2266|  EF4| 160901268734|   1180925|         209|
    |  2266|  EF4| 161101289209|   1170929|         213|
    |  2266|  EA4| 18501424940R|   1190220|         8|

Example of lspf.lsdte:

      lsdte
      +------+
      1190201
      1171130
      1170301
      1180925
      1170929
      1190220

Where am I going wrong?

howie
  • 2,587
  • 3
  • 27
  • 43
user1584253
  • 975
  • 2
  • 18
  • 55

0 Answers0