I am reading 3 blobs from Azure storage, loading them into a dataframe, and then filtering the dataframe.
Below is the code.
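import json
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient

# connect_str holds the storage account connection string (defined elsewhere).
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = ""
path = "/"
dt = ''
pth = os.path.join(path, dt)
container_client = blob_service_client.get_container_client(container_name)
blob_list = container_client.list_blobs(name_starts_with=pth)
for blob in blob_list:
    blob_client = container_client.get_blob_client(blob.name)
    stream = blob_client.download_blob()
    fileReader = json.loads(stream.readall())
    # A new dataframe is built from the current blob only.
    df = pd.DataFrame.from_records(fileReader)
    target_id = '2fr5'
    # The filter and the print run once per blob, inside the loop.
    df2 = df[df['ID'] == target_id]
    if len(df2.index) == 0:
        print("0")
    else:
        print("l")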
After filtering, I expect "0" if the dataframe is empty and "l" otherwise. But when the ID is not present in the dataframe, I get the output below.
0
0
0
When the ID is present in the dataframe, I get the output below.
0
l
0
It's giving me output for each of the 3 blobs separately instead of reading all 3 blobs into a single dataframe. Could someone assist?
Thank you.
Below are the dataframes I get after reading each file from the storage.
df = pd.DataFrame.from_records(fileReader)
         Date       salary         tax    ID
0  2022-09-16  5064.000000  504.000000  6fr5
1  2022-09-16    33.157895    3.157895  7fr5
         Date       salary         tax    ID
0  2022-09-16  5046.000000  504.000000  2fr5
1  2022-09-16    36.157895    3.157895  3fr5
         Date       salary         tax    ID
0  2022-09-16  5064.000000  504.000000  1fr5
1  2022-09-16   367.157895    3.157895  5fr5
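One possible way to get a single result instead of one per blob is to collect a dataframe for each blob in a list, concatenate them after the loop, and only then filter. The sketch below does not touch Azure: `blob_payloads` and `load_all` are hypothetical names, and the JSON strings stand in for what `stream.readall()` would return, assuming each blob holds a JSON array of records with an `ID` column (values copied from the dataframes above).

```python
import json

import pandas as pd

# Hypothetical stand-ins for the content returned by stream.readall(),
# one JSON payload per blob.
blob_payloads = [
    json.dumps([
        {"Date": "2022-09-16", "salary": 5064.0, "tax": 504.0, "ID": "6fr5"},
        {"Date": "2022-09-16", "salary": 33.157895, "tax": 3.157895, "ID": "7fr5"},
    ]),
    json.dumps([
        {"Date": "2022-09-16", "salary": 5046.0, "tax": 504.0, "ID": "2fr5"},
        {"Date": "2022-09-16", "salary": 36.157895, "tax": 3.157895, "ID": "3fr5"},
    ]),
    json.dumps([
        {"Date": "2022-09-16", "salary": 5064.0, "tax": 504.0, "ID": "1fr5"},
        {"Date": "2022-09-16", "salary": 367.157895, "tax": 3.157895, "ID": "5fr5"},
    ]),
]


def load_all(payloads):
    """Parse every JSON payload and stack the results into one dataframe."""
    frames = [pd.DataFrame.from_records(json.loads(p)) for p in payloads]
    # ignore_index=True renumbers the rows 0..n-1 across all blobs.
    return pd.concat(frames, ignore_index=True)


df = load_all(blob_payloads)

# Filter once, after all blobs have been combined.
target_id = "2fr5"
df2 = df[df["ID"] == target_id]
result = "0" if df2.empty else "l"
print(result)
```

Applied to the loop in the question, this would mean appending each per-blob dataframe to a list inside the `for` loop and moving the `pd.concat` call, the filter, and the `print` to after the loop, so the check runs once over all rows.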