
I have several databases in a Hive instance, each containing tables. I'd like to show the tables of one specific database (say, 3_db).

+------------------+--+
|  database_name   |
+------------------+--+
| 1_db             |
| 2_db             |
| 3_db             |
+------------------+--+

If I enter beeline from bash, nothing is complicated there; I just do the following:

show databases;
show tables from 3_db;

When I'm using pyspark via an IPython notebook, my cheap tricks stop working: the second line (show tables from 3_db) gives an error:

sqlContext.sql('show databases').show()
sqlContext.sql('show tables from 3_db').show()

What is wrong here, and why does the same code work in one place and not in the other?

Keithx

3 Answers

sqlContext.sql("show tables in 3_db").show()
David דודו Markovitz

Another possibility is to use the Catalog methods:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.listTables("3_db")

Just be aware that in PySpark this method returns a list (of Table records), while in Scala it returns a Dataset.
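
For instance, a minimal sketch of iterating over that list (assuming the 3_db database from the question exists; each entry is a Table record with fields such as name, tableType and isTemporary):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Print a few fields of each Table record returned by the Catalog.
for table in spark.catalog.listTables("3_db"):
    print(table.name, table.tableType, table.isTemporary)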

aelesbao

There are two possible ways to achieve this, but they differ a lot in terms of efficiency.


Using SQL

This is the most efficient approach:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()
spark_session.sql("show tables in db_name").show()

Using catalog.listTables()

The following is less efficient than the previous approach, since it also loads the tables' metadata:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()
spark_session.catalog.listTables("db_name")

Giorgos Myrianthous