What is the equivalent of pandas.DataFrame.tail in DataBricks

Question

What is the equivalent of pandas.DataFrame.tail in DataBricks? I searched the documentation a bit but didn't find any relevant function.

score 2 · Accepted Answer · answered Jan 14 '19 at 16:01

DataBricks apparently uses pyspark.sql DataFrames, not pandas ones. One way to get the last rows is to add an index column and sort on it in descending order:

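# Add an index column if the DataFrame does not already have one.
# monotonically_increasing_id() yields increasing, unique IDs, but they are
# not consecutive, and it assumes fewer than 1 billion partitions with
# fewer than 8 billion records each.
from pyspark.sql.functions import monotonically_increasing_id
df = df.withColumn("index", monotonically_increasing_id())

# Register the DataFrame as a temporary view so SQL can refer to it as "df"
df.createOrReplaceTempView("df")

# Take the last 5 rows by index (returned in descending index order)
tail = sqlContext.sql("""SELECT * FROM df ORDER BY index DESC LIMIT 5""")
tail.show()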

Note that this is expensive, because Spark has to read the whole DataFrame to return just five rows, and it doesn't play to the strengths of Spark.
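
To make the size-limit remark about monotonically_increasing_id concrete, here is a small pure-Python sketch of the bit layout that the Spark documentation describes for the current implementation: the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The helper name sketch_id and the two constants are made up for illustration and are not part of any Spark API; the real IDs are generated inside Spark.

```python
# Bit widths taken from the documented layout of monotonically_increasing_id():
# partition ID in the upper 31 bits, per-partition record number in the lower 33.
PARTITION_BITS = 31
RECORD_BITS = 33


def sketch_id(partition_id: int, record_number: int) -> int:
    """Hypothetical helper that mirrors the documented layout; not a Spark API."""
    if not 0 <= partition_id < 2 ** PARTITION_BITS:
        raise ValueError("partition ID does not fit in 31 bits")
    if not 0 <= record_number < 2 ** RECORD_BITS:
        raise ValueError("record number does not fit in 33 bits")
    return (partition_id << RECORD_BITS) | record_number


# Two partitions with three rows each: the IDs increase but are not consecutive.
ids = [sketch_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

So the "index" column is good for ordering, but it should not be read as a row number: the values jump at every partition boundary.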
