I have this code:
import glob

list_files = glob.glob("/t/main_folder/*/file_*[0-9].csv")
test = sorted(list_files, key=lambda x: x[-5:])  # sort by the trailing digit
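For example, test ends up as a sorted list of full paths, something like this (the folder and file names here are just illustrative):

test == ['/t/main_folder/folder1/file_1.csv',
         '/t/main_folder/folder2/file_2.csv',
         '/t/main_folder/folder3/file_3.csv']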
This code finds the files I need to work with; it found 5 CSV files in different folders. As the next step, I use the code below to process every found file. I need to do a full outer join for each file in turn: first for main_folder/folder1/file1.csv, then for main_folder/folder2/file2.csv, and so on until the last found file, one by one. That is why I need a loop.
# problem: .load(test) reads ALL of the found files into one DataFrame at once
df_deltas = spark.read.format("csv").schema(schema).option("header", "true")\
    .option("delimiter", ";").load(test)
df_mirror = spark.read.format("csv").schema(schema).option("header", "true")\
    .option("delimiter", ",").load("/t/org_file.csv").cache()
df_deltas.createOrReplaceTempView("deltas")
df_mirror.createOrReplaceTempView("mirror")
df_mir2 = spark.sql("""select
coalesce(deltas.DATA_ACTUAL_DATE, mirror.DATA_ACTUAL_DATE) as DATA_ACTUAL_DATE,
coalesce(deltas.DATA_ACTUAL_END_DATE, mirror.DATA_ACTUAL_END_DATE) as DATA_ACTUAL_END_DATE,
coalesce(deltas.ACCOUNT_RK, mirror.ACCOUNT_RK) as ACCOUNT_RK,
coalesce(deltas.ACCOUNT_NUMBER, mirror.ACCOUNT_NUMBER) as ACCOUNT_NUMBER,
coalesce(deltas.CHAR_TYPE, mirror.CHAR_TYPE) as CHAR_TYPE,
coalesce(deltas.CURRENCY_RK, mirror.CURRENCY_RK) as CURRENCY_RK,
coalesce(deltas.CURRENCY_CODE, mirror.CURRENCY_CODE) as CURRENCY_CODE,
coalesce(deltas.CLIENT_ID, mirror.CLIENT_ID) as CLIENT_ID,
coalesce(deltas.BRANCH_ID, mirror.BRANCH_ID) as BRANCH_ID,
coalesce(deltas.OPEN_IN_INTERNET, mirror.OPEN_IN_INTERNET) as OPEN_IN_INTERNET
from mirror
full outer join deltas on
deltas.ACCOUNT_RK = mirror.ACCOUNT_RK
""")
# here I fill .load() with the files found by the glob code above
df_deltas = spark.read.format("csv").schema(schema).option("header", "true")\
    .option("delimiter", ";").load(test)
How can I make this run in a loop: first for the first found file, then for the second, and so on?
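Roughly, I imagine something like the untested sketch below. I am assuming here that the result of each join should become the mirror for the next file (if not, the last line of the loop can be dropped); it reuses the df_mirror/"mirror" view already registered above:

for path in test:  # test is the sorted list of found files
    # load only the current delta file instead of the whole list
    df_deltas = spark.read.format("csv").schema(schema).option("header", "true")\
        .option("delimiter", ";").load(path)
    df_deltas.createOrReplaceTempView("deltas")

    df_mir2 = spark.sql("""select
    coalesce(deltas.DATA_ACTUAL_DATE, mirror.DATA_ACTUAL_DATE) as DATA_ACTUAL_DATE,
    coalesce(deltas.DATA_ACTUAL_END_DATE, mirror.DATA_ACTUAL_END_DATE) as DATA_ACTUAL_END_DATE,
    coalesce(deltas.ACCOUNT_RK, mirror.ACCOUNT_RK) as ACCOUNT_RK,
    coalesce(deltas.ACCOUNT_NUMBER, mirror.ACCOUNT_NUMBER) as ACCOUNT_NUMBER,
    coalesce(deltas.CHAR_TYPE, mirror.CHAR_TYPE) as CHAR_TYPE,
    coalesce(deltas.CURRENCY_RK, mirror.CURRENCY_RK) as CURRENCY_RK,
    coalesce(deltas.CURRENCY_CODE, mirror.CURRENCY_CODE) as CURRENCY_CODE,
    coalesce(deltas.CLIENT_ID, mirror.CLIENT_ID) as CLIENT_ID,
    coalesce(deltas.BRANCH_ID, mirror.BRANCH_ID) as BRANCH_ID,
    coalesce(deltas.OPEN_IN_INTERNET, mirror.OPEN_IN_INTERNET) as OPEN_IN_INTERNET
    from mirror
    full outer join deltas on deltas.ACCOUNT_RK = mirror.ACCOUNT_RK
    """)

    # make the join result the mirror for the next iteration
    df_mir2.createOrReplaceTempView("mirror")

Is this the right way to iterate over the found files one by one?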