Actually there are many way to achieve this, one is to play with some basic python dataframe, list and loop(Sorry for my poor English)
Let's play....
1st you have to retrieve all table name and with those table name retrieve table description that contain all column name with data type.
we use 2 spark sql query
1: Table_name = spark.sql("SHOW TABLES FROM default")
(default
databricks default database name)
result
+--------+--------------+-----------+ |database| tableName|isTemporary| +--------+--------------+-----------+ | default|convertedtable| false| | default| delta_table| false| | default| display| false| | default| events| false| | default| export_csv| false| | default| flights| false| | default| play_store| false| +--------+--------------+-----------+
2: table_describe = spark.sql("DESCRIBE default.play_store")
result
+--------------+---------+-------+ | col_name|data_type|comment| +--------------+---------+-------+ | App| string| null| | Category| string| null| | Rating| string| null| | Reviews| string| null| | Size| string| null| | Installs| string| null| | Type| string| null| | Price| string| null| |Content Rating| string| null| | Genres| string| null| | Last Updated| string| null| | Current Ver| string| null| | Android Ver| string| null| +--------------+---------+-------+
Now the main part
Table_name = spark.sql("SHOW TABLES FROM default")
Table_name=Table_name.select('tableName').collect()
mvv_array = [(row.tableName) for row in Table_name]
cnt = 0
for idx,i in enumerate(mvv_array):
if cnt == 0:
tmp = spark.sql('DESCRIBE default.'+i).withColumn('table_name',lit(i))
temp_df = tmp
cnt = cnt + 1
else:
tmp = spark.sql('DESCRIBE default.'+i).withColumn('table_name',lit(i))
temp_df=temp_df.unionAll(tmp)
temp_df.show()
TADAAAA.....
+-----------------+---------+-------+--------------+ | col_name|data_type|comment| table_name| +-----------------+---------+-------+--------------+ | Year| int| null|convertedtable| | Month| int| null|convertedtable| | DayofMonth| int| null|convertedtable| | DayOfWeek| int| null|convertedtable| | DepTime| string| null|convertedtable| | CRSDepTime| int| null|convertedtable| | ArrTime| string| null|convertedtable| | CRSArrTime| int| null|convertedtable| | UniqueCarrier| string| null|convertedtable| | FlightNum| int| null|convertedtable| | TailNum| string| null|convertedtable| |ActualElapsedTime| string| null|convertedtable| | CRSElapsedTime| string| null|convertedtable| | AirTime| string| null|convertedtable| | ArrDelay| string| null|convertedtable| | DepDelay| string| null|convertedtable| | Origin| string| null|convertedtable| | Dest| string| null|convertedtable| | Distance| int| null|convertedtable| | TaxiIn| string| null|convertedtable| +-----------------+---------+-------+--------------+ only showing top 20 rows
Take what ever you need from new dataframe.
Hope its fulfill what you want.
If you find it as your solution don't forget to mark as Answer and up vote.