
I need to open the files in my S3 bucket (the file listing was attached as an image).

I want to run the same piece of code on each of them, so I want to open them in a loop.

But I have a problem with formatting. The files are numbered from 1 to 999, and I cannot simply loop over range(1, 1000):

for i in range(1,1000):
    file_to_predict = spark.read.json(f"s3a://mu_bucket/company_v20_dl/part-00{i}.gz")

Here i is replaced with 1, 2, etc., but I would like it to be replaced with 001, 002, etc., always three digits wide (the highest number is 999). Do you know how to handle this?

[EDIT] I am able to open a single file without unzipping it (image not shown).

Kas

1 Answer


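The files have a .gz extension, which is the common extension for gzip. In general, gzipped content has to be decompressed before it can be read, although spark.read.json decompresses .gz files transparently, as the edit in the question shows.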

Other than that, use {i:03d} in the f-string to get a 3-digit number with leading zeros.
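As a minimal sketch of the formatting above, the loop from the question could build its paths like this. The bucket path is copied from the question. The helper name part_path is hypothetical, and the Spark call is left as a comment because it assumes an existing spark session.

```python
def part_path(i: int) -> str:
    """Build the S3 path for part number i, zero-padded to three digits.

    part_path is a hypothetical helper; the path prefix is taken from the question.
    """
    # {i:03d} formats i as a decimal integer, at least 3 wide, padded with zeros
    return f"s3a://mu_bucket/company_v20_dl/part-00{i:03d}.gz"


# range(1, 1000) covers 1..999 inclusive
paths = [part_path(i) for i in range(1, 1000)]

for path in paths[:3]:
    print(path)
    # Assumes a SparkSession named `spark` already exists:
    # file_to_predict = spark.read.json(path)
```

The same padding is also available as str(i).zfill(3) if the number has to be formatted outside an f-string.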

Thomas Weller