I'm trying to join two dataframes but the values of the second keep turning into nulls:

    from pyspark.sql.types import IntegerType
    joint = sdf.join(k, "date", how='left').select(sdf.date, sdf.Res, sdf.Ind, k.gen.cast(IntegerType())).orderBy('date')

output: | 1/1/2001 | 4103 | 9223 | null |

dwjohnston
Paul
  • Maybe there is no matching date value in k, and that is why it returns null. Because you are using a left join, you will get null from the right table wherever it has no matching value. – Venkataraman R Jan 16 '19 at 04:22
  • That's what I thought at first, but k.show() shows it's full of data. – Paul Jan 16 '19 at 11:47
  • That is fine, but does k contain the dates of sdf, since date is being used as the key? E.g., does k have a row for 1/1/2001? – Venkataraman R Jan 16 '19 at 12:00
  • Yes, both have the same date range. The only difference is that k originally had a 00:00:00 timestamp, so I reformatted it to match the yyyy/mm/dd format of sdf. – Paul Jan 16 '19 at 14:15
  • Please try to provide a [mcve] with a small [reproducible example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples). My guess is that `k.gen` is failing the conversion to integer, but there's no way to tell without seeing the data. – pault Jan 16 '19 at 15:05
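
The two hypotheses raised in the comments (the join keys do not match exactly, or the `gen` values fail the integer cast) can be told apart with a small model of a left join. The following is a minimal plain-Python sketch, not PySpark and not the asker's data: the row for `1/1/2001` comes from the output shown in the question, while the `k` row, the value `"57"`, the list of date formats, and the helper names `normalize_date`, `to_int_or_none`, and `left_join` are all hypothetical and chosen only for illustration. It also assumes `1/1/2001` is month/day/year.

```python
from datetime import datetime

# Assumed candidate formats; the real data may use others.
DATE_FORMATS = ("%Y/%m/%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S")


def normalize_date(text):
    """Return the date as ISO yyyy-mm-dd, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_int_or_none(value):
    """Roughly mimic a cast that yields None instead of raising on bad input."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return None


def left_join(left_rows, right_rows, key_fn=lambda s: s):
    """Left join on the date column; unmatched left rows get None for gen."""
    lookup = {}
    for date, gen in right_rows:
        key = key_fn(date)
        if key is not None:
            lookup[key] = gen
    joined = []
    for date, res, ind in left_rows:
        key = key_fn(date)
        gen = lookup.get(key) if key is not None else None
        joined.append((date, res, ind, gen))
    return joined


# Row taken from the question's output; the k row is hypothetical.
sdf_rows = [("1/1/2001", 4103, 9223)]
k_rows = [("2001/01/01", "57")]

# Hypothesis 1: keys differ as strings, so the left join finds no match.
raw = left_join(sdf_rows, k_rows)

# Joining on a normalized key makes the same rows match.
fixed = left_join(sdf_rows, k_rows, key_fn=normalize_date)

# Hypothesis 2: the key matches, but the value is not a valid integer.
cast_ok = to_int_or_none("57")
cast_bad = to_int_or_none("n/a")
```

If `raw` has `None` in the last position while `fixed` does not, the key format is the likely cause; if the key matches but `to_int_or_none` returns `None`, the cast is. In PySpark itself, the analogous checks would probably be a `left_anti` join such as `sdf.join(k, "date", "left_anti")` to list the `sdf` dates with no partner in `k`, and a filter on `k` for rows where `gen` is not null but `gen.cast("int")` is null. Whether either applies depends on the actual data, which the question does not show.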

0 Answers