How can I modify the code below to fetch only the last row of the table, specifically the value in the `key` column? The table is huge, and I need the key of the last row to know how much has been loaded so far. I do not care about the other columns.

Line 1:

val df = spark.sqlContext.read.format("datasource").option("project", "character").option("apiKey", "xx").option("type", "tables").option("batchSize", "10000").option("database", "humans").option("table", "healthGamma").option("inferSchema", "true").option("inferSchemaLimit", "1").load()

Line 2:

df.createTempView("tables")

Line 3:

spark.sqlContext.sql("select * from tables").repartition(1).write.option("header","true").parquet("lifes_remaining")
PolarBear10

1 Answer

You can use `orderBy` on a DataFrame like this:

df.orderBy($"value".desc).show(1) 
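To get the key itself rather than a printed row, the sorted result can be narrowed to one column and collected. This is only a minimal sketch: it assumes the column is named `key` and can be cast to a long, and it uses a small in-memory DataFrame in place of the real table. The names `LastKeyBySort` and `lastKey` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object LastKeyBySort {
  // Hypothetical helper: keep only `key`, sort descending, take the top row.
  // Returns None when the DataFrame is empty.
  def lastKey(df: DataFrame): Option[Long] =
    df.select(col("key").cast("long").as("key")) // drop the other columns early
      .orderBy(col("key").desc)
      .limit(1)
      .collect()
      .headOption
      .map(_.getLong(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("last-key-by-sort")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; the real DataFrame would come from the custom datasource.
    val sample = Seq((1L, "a"), (3L, "c"), (2L, "b")).toDF("key", "payload")
    println(lastKey(sample))

    spark.stop()
  }
}
```

Spark still has to look at every key to find the largest one, so this reduces what is returned to the driver, not necessarily what is read from the source.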
Yash Shah
  • adding it after line 2 wouldn't affect result of line 2. You could just do: `df.orderBy($"value".desc).limit(1).write.option("header","true").parquet("lifes_remaining")` – Lior Chaga Dec 05 '18 at 09:50
  • @LiorChaga why do we order by `value` instead of `key` column. There are other values columns that also have `int` values – PolarBear10 Dec 05 '18 at 09:52
  • Just copy-pasted @YashShah's reply. Yes, it should be by the key column – Lior Chaga Dec 05 '18 at 09:53
  • I just gave an example writing “value” you should use “key” – Yash Shah Dec 05 '18 at 09:56
  • @YashShah the list is already ordered in the database; I just need the value of the last row in the `key` column. I do not want to iterate over the table, I want it to just give me the last row's value. Is that possible? – PolarBear10 Dec 05 '18 at 10:03
  • @LiorChaga it seems like my question was not clear, can you kindly read the above – PolarBear10 Dec 05 '18 at 10:03
  • Oh, in that case, use `.option("table", "(select max(key) from healthGamma) as lastHealthGama")` when you read from the DB. This will return a dataframe with single column and single row, holding the last key. – Lior Chaga Dec 05 '18 at 10:09
  • @Matthew here we are just fetching the last row by sorting in descending order and taking the first row. If you are still unsure what's happening, you can look at this answer: https://stackoverflow.com/a/32052881/6355940 – Yash Shah Dec 05 '18 at 10:09
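The subquery form of the `table` option suggested in the comments is something Spark's JDBC source supports; whether the custom `datasource` format in the question accepts it is not known. A source-independent alternative is to ask Spark for the maximum directly. The sketch below assumes that the last row is the one with the largest `key` and that `key` can be cast to a long; `LastKeyByMax` and `maxKey` are hypothetical names, and the in-memory DataFrame stands in for the real table.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

object LastKeyByMax {
  // Hypothetical helper: aggregate the largest `key` without sorting.
  // An aggregate over an empty DataFrame yields one row holding null.
  def maxKey(df: DataFrame): Option[Long] = {
    val row = df.agg(max(col("key").cast("long"))).head()
    if (row.isNullAt(0)) None else Some(row.getLong(0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("last-key-by-max")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; the real DataFrame would come from the custom datasource.
    val sample = Seq((10L, "x"), (42L, "y"), (7L, "z")).toDF("key", "payload")
    println(maxKey(sample))

    spark.stop()
  }
}
```

An aggregate avoids the global sort that `orderBy` implies. How much data is actually read still depends on whether the datasource supports column pruning or aggregate pushdown, which would have to be checked for the connector in use.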