What is the equivalent of pandas.DataFrame.tail in Databricks? I searched the documentation a bit but didn't find any relevant function.
Databricks is apparently using pyspark.sql dataframes, not pandas.
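# Add an index column if the DataFrame doesn't have one already.
# monotonically_increasing_id() is unique and increasing but NOT consecutive,
# and it assumes < 1 billion partitions and < 8 billion rows per partition.
from pyspark.sql.functions import monotonically_increasing_id
df = df.withColumn("index", monotonically_increasing_id())
# Register the DataFrame as a temp view so SQL can refer to it by name
df.createOrReplaceTempView("df")
# Query with the index; `spark` is the SparkSession predefined in Databricks notebooks
tail = spark.sql("SELECT * FROM df ORDER BY index DESC LIMIT 5")
tail.show()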
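The size limits mentioned in the comment come from how monotonically_increasing_id() builds its values: according to the PySpark docs, the partition ID occupies the upper 31 bits and the record number within the partition occupies the lower 33 bits. The pure-Python sketch below mimics that layout for a hypothetical DataFrame split into three partitions. The `partitions` data and the `fake_monotonic_ids` and `tail` helpers are illustrative assumptions, not Spark code. It shows why the IDs increase without being consecutive, and why `ORDER BY index DESC` returns the tail newest-first.

```python
# Illustrative only: mimics the bit layout documented for
# pyspark.sql.functions.monotonically_increasing_id (not the real implementation).
RECORD_BITS = 33  # lower 33 bits hold the record number within a partition


def fake_monotonic_ids(partitions):
    """Return (id, row) pairs laid out like monotonically_increasing_id.

    `partitions` is a hypothetical list of lists, one inner list per partition.
    """
    out = []
    for partition_id, rows in enumerate(partitions):
        for record_number, row in enumerate(rows):
            out.append(((partition_id << RECORD_BITS) + record_number, row))
    return out


def tail(partitions, n):
    """Emulate ORDER BY index DESC LIMIT n, then restore ascending order like pandas.tail."""
    indexed = fake_monotonic_ids(partitions)
    newest_first = sorted(indexed, key=lambda pair: pair[0], reverse=True)[:n]
    return [row for _, row in reversed(newest_first)]


partitions = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]
ids = [i for i, _ in fake_monotonic_ids(partitions)]
print(ids[:4])             # [0, 1, 2, 8589934592]  (gap at the partition boundary)
print(tail(partitions, 5))  # ['d', 'e', 'f', 'g', 'h']
```

In the real query the five rows come back in descending index order. If you want pandas-style ascending order, you can sort the small result again, for example with `tail.orderBy("index").show()`.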
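Note that this approach is expensive and doesn't play to Spark's strengths. A Spark DataFrame has no inherent row order, so fetching the "last" rows means adding an index and examining every partition.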
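To see where the cost comes from, consider a common way to evaluate `ORDER BY ... LIMIT n`, which is roughly how Spark's `RDD.takeOrdered` works. Each partition keeps a bounded top-n, and those small per-partition results are then merged in one place. The plain-Python sketch below illustrates that two-phase idea with `heapq`. The partition contents and the helper names are hypothetical.

```python
import heapq
from itertools import chain


def top_n_per_partition(partition, n):
    """Keep only the n (index, row) pairs with the largest index in one partition."""
    return heapq.nlargest(n, partition, key=lambda pair: pair[0])


def distributed_tail(partitions, n):
    """Two-phase top-n: local top-n per partition, then a single merge step."""
    local = [top_n_per_partition(p, n) for p in partitions]  # one pass per partition
    merged = heapq.nlargest(n, chain.from_iterable(local), key=lambda pair: pair[0])
    return sorted(merged, key=lambda pair: pair[0])  # ascending, like pandas.tail


partitions = [
    [(0, "a"), (1, "b"), (2, "c")],
    [(10, "d"), (11, "e")],
    [(20, "f"), (21, "g"), (22, "h")],
]
print(distributed_tail(partitions, 3))  # [(20, 'f'), (21, 'g'), (22, 'h')]
```

Even with this optimization, every row is still read once. By contrast, `pandas.DataFrame.tail` simply slices the end of an in-memory frame that already has a fixed order.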
See also:
https://medium.com/@chris_bour/6-differences-between-pandas-and-spark-dataframes-1380cec394d2
pyspark,spark: how to select last row and also how to access pyspark dataframe by index

Charles Landau