I'm not versed in blob storage, but assuming you're able to read the text file from that storage (see this for help), you can create a dict of schema strings, one per table. FYI, schemas can also be passed as DDL strings like 'column1 string, column2 string, column3 string', so that's the form I'll build in the example below.
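If the storage is reachable from your Spark session, one possible way to pull the file's lines (just a sketch, assuming a wasbs:// path and credentials that are already configured; the path below is made up) is spark.read.text:

# Sketch: read the schema file through Spark instead of the local filesystem.
# The wasbs:// path is hypothetical; storage credentials must already be set up.
lines_df = spark.read.text('wasbs://mycontainer@myaccount.blob.core.windows.net/schemas/blahblah.txt')
lines = [row.value for row in lines_df.collect()]  # one string per line of the file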
For this example, I've stored a .txt file on an accessible drive that contains two rows with the following strings.
"table1=['column1','column2','column3']"
"table2=['column4','column5','column6']"
The idea is to split each line on "=", use the first part as the dict key (the table name), and build a schema string from the second part (which parses as a Python list).
import ast

schema_dict = {}
with open('./drive/MyDrive/blahblah.txt', 'r') as col_txt:
    for linerow in col_txt:
        # left of '=' is the table name; right of '=' is a list literal of column names
        stuff = linerow.split('=')
        # ast.literal_eval is a safer stand-in for eval() when parsing a list literal
        schema_dict[stuff[0].strip()] = ', '.join([k + ' string' for k in ast.literal_eval(stuff[1])])
schema_dict
# {'table1': 'column1 string, column2 string, column3 string',
# 'table2': 'column4 string, column5 string, column6 string'}
schema_dict['table2']
# column4 string, column5 string, column6 string
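These DDL strings can be passed directly as the schema argument of a reader (a sketch; the CSV path is made up):

# Sketch: apply the DDL schema string while reading a file (path is hypothetical)
df2 = spark.read.csv('/data/table2.csv', schema=schema_dict['table2'])
df2.printSchema()
# root
#  |-- column4: string (nullable = true)
#  |-- column5: string (nullable = true)
#  |-- column6: string (nullable = true)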
Alternatively, creating a StructType([StructField(), ...]) schema would also be easy.
from pyspark.sql.types import StructType, StructField, StringType
import ast

schema_dict = {}
with open('./drive/MyDrive/blahblah.txt', 'r') as col_txt:
    for linerow in col_txt:
        stuff = linerow.split('=')
        # build one nullable StringType field per column name in the list
        schema_dict[stuff[0].strip()] = StructType([StructField(k, StringType(), True) for k in ast.literal_eval(stuff[1])])
schema_dict
# {'table1': StructType(List(StructField(column1,StringType,true),StructField(column2,StringType,true),StructField(column3,StringType,true))),
# 'table2': StructType(List(StructField(column4,StringType,true),StructField(column5,StringType,true),StructField(column6,StringType,true)))}
schema_dict['table1']
# StructType(List(StructField(column1,StringType,true),StructField(column2,StringType,true),StructField(column3,StringType,true)))
type(schema_dict['table1'])
# pyspark.sql.types.StructType
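Either form plugs into Spark the same way; for instance with createDataFrame (a sketch, the rows are made up):

# Sketch: build a DataFrame against the StructType schema (rows are made up)
df1 = spark.createDataFrame([('a', 'b', 'c')], schema=schema_dict['table1'])
df1.show()
# +-------+-------+-------+
# |column1|column2|column3|
# +-------+-------+-------+
# |      a|      b|      c|
# +-------+-------+-------+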