I have a path in Databricks, '/dbfs/mnt/sgi/report'. It contains parquet files and also folders, and every folder contains parquet files as well. I need to create an Excel sheet whose first column is the path of each parquet file and whose second column is the number of rows in that file. To do that, I collected the paths of all the parquet files with this script:
import os
import pandas as pd

l=[]
m=[]
l_parquet=[]
m_parquet=[]
for dirpath, dirnames, filenames in os.walk("/dbfs/mnt/sgi/report/"):
    for filename in [f for f in filenames if f.endswith(".parquet")]:
        path = os.path.join(dirpath, filename)
        df_parquet=pd.read_parquet(path)
        l_parquet.append(df_parquet.shape[0])
        m_parquet.append(df_parquet.shape[1])
weatther={'path':path, 'count_of_rows':l_parquet, 'count_of_col':m_parquet}
df=pd.DataFrame(weatther)
df['path'].unique()
But the problem is that the resulting DataFrame (df) only stores the path of the file in the last folder, as you can see:
array(['/dbfs/mnt/sgi/report/trigger_demo/SAMPLE_DATA_2.parquet'],
dtype=object)
Can you please tell me how I can get the counts for all of the paths in one DataFrame?
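
For reference, in the script above the dictionary is built after the loop has finished, so path holds only the last file visited and pandas repeats that single string on every row. One possible way around it, as a minimal sketch rather than a definitive solution, is to store the path together with its counts inside the loop. The names collect_parquet_stats, shape_of_parquet, build_report and read_shape are hypothetical helpers made up for this illustration, and writing the Excel file assumes an Excel engine such as openpyxl is installed.

```python
import os


def shape_of_parquet(path):
    """Return (rows, columns) for one parquet file, read with pandas."""
    import pandas as pd  # imported lazily so the directory walk itself needs only the stdlib

    return pd.read_parquet(path).shape


def collect_parquet_stats(root, read_shape=shape_of_parquet):
    """Walk `root` and return one record per .parquet file found under it."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".parquet"):
                continue
            path = os.path.join(dirpath, filename)
            rows, cols = read_shape(path)
            # Keep the path next to its own counts, so each file becomes its own row.
            records.append(
                {"path": path, "count_of_rows": rows, "count_of_col": cols}
            )
    return records


def build_report(root, excel_path=None):
    """Build a DataFrame of the records and optionally write it to Excel."""
    import pandas as pd

    report = pd.DataFrame(
        collect_parquet_stats(root),
        columns=["path", "count_of_rows", "count_of_col"],
    )
    if excel_path is not None:
        # Assumption: an Excel writer engine (for example openpyxl) is available.
        report.to_excel(excel_path, index=False)
    return report
```

Under these assumptions, calling build_report("/dbfs/mnt/sgi/report/") should return a DataFrame with one row per parquet file, and passing a second argument (a file name of your choice) would also save it as an Excel sheet. Reading every file in full just to get its shape can be slow on large data, so the reader is passed in as a parameter and could be swapped for something lighter.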