Open JSON files in a loop - formatting problem

Question

I need to open a set of files in my S3 bucket.

I want to run the same piece of code on each of them, so I want to open them in a loop.

But I have a problem with formatting. The files are numbered from 1 to 999, and I cannot simply loop through range(1, 1000):

for i in range(1,1000):
    file_to_predict = spark.read.json(f"s3a://mu_bucket/company_v20_dl/part-00{i}.gz")

i will be replaced with 1, 2, etc. I would like it to be replaced with 001, 002, etc., always three characters wide (since the highest number, 999, takes three characters). Do you know how to deal with this case?

[EDIT] I am able to open a single file without unzipping it.

I checked the data and have already run it on a single file; I am able to open it. - Kas, Dec 19 '22 at 14:10

score 1 · Accepted Answer · answered Dec 19 '22 at 14:11

The files have a .gz extension, which is the common extension for gzip. Whatever is in those compressed files, it has to be decompressed before it can be read.

Other than that, use {i:03d} inside the f-string to get a 3-digit number with leading zeros.
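As a minimal sketch of how that format spec fits into the loop from the question: the helper name part_paths and its parameters are made up for illustration, and the bucket path is copied from the question as-is. It only builds the path strings, so it runs without Spark.

```python
def part_paths(start=1, stop=999,
               prefix="s3a://mu_bucket/company_v20_dl/part-00"):
    """Build one path per file number, zero-padded to three digits.

    part_paths is a hypothetical helper; the prefix is taken from the question.
    """
    # {i:03d} means: format i as a decimal integer, at least 3 characters
    # wide, padded on the left with zeros (1 -> "001", 42 -> "042").
    return [f"{prefix}{i:03d}.gz" for i in range(start, stop + 1)]


paths = part_paths()
print(paths[0])   # s3a://mu_bucket/company_v20_dl/part-00001.gz
print(paths[-1])  # s3a://mu_bucket/company_v20_dl/part-00999.gz

# In the original loop this would look like the following
# (assumes an existing SparkSession named spark, as in the question):
# for path in paths:
#     file_to_predict = spark.read.json(path)
```

Note that range(1, 1000) stops at 999, which is why the helper uses stop + 1. According to the edit in the question, spark.read.json already opened a single .gz file directly, so no separate unzip step is shown here.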

Thomas Weller
