Python loop over files - best practice

Question

I would like to know what would be better if I loop over a bunch of files to manipulating them.

1) Sent a path and loop inside the function:

def convert_png_to_jpg(path_to_images):
    all_images = os.listdir(path_to_images)
    for image in all_images:
        # open file
        # do something with my file

2) Or do the loop outside the function and call the function every time:

    def convert_png_to_jpg(image):
        # do something with my image

    all_images = os.listdir(path_to_images)
    for image in all_images:
        # open file
        convert_png_to_jpg(image)

For the second case, is it better to open the image inside the function or outside, as in the example?

What is better in terms of clean code?

As you call the function `convert_png_to_jpg`, it seems like this function intends to work on a single file. I would suggest to go with the second approach. It would result in a clean and readable code. — Ahmed Dhanani, Oct 23 '17 at 09:12
the job of the function is to `convert_png_to_jpg` and hence all it has to do is take an image and perform the action of converting the png to jpg, even raise error if the file provided to the function is not png. I think it — Arpit Goyal, Oct 23 '17 at 09:33
@Martijn Pieters Agree with you, may I ask where is the right place to ask such questions? — honeymoon, Oct 23 '17 at 10:04
@snowflake: a discussion forum or chatroom perhaps? Or Quora, they accept opinion-based questions. — Martijn Pieters, Oct 23 '17 at 10:08
@Martijn Pieters In my opinion you can not compare the quality and experience compared to users of stackoverflow. People here have far more and longer experience and you get much more sophisticated answers. Unfortunately stackoverflow doesn't allow or add a function for opinion-based questions (as a side site) but tries to move the questioner to other platforms which does not work because the responders stay here. But that's another discussion...Sorry for asking here... — honeymoon, Oct 23 '17 at 10:19
@snowflake: they are here *because* we filter out off-topic posts, though. We attract higher-quality contributors because of the focus on quality, long-term value and answerability. — Martijn Pieters, Oct 23 '17 at 10:22

score 3 · Accepted Answer · answered Oct 23 '17 at 09:18

My opinion is that it depends on level of encapsulation you want to achieve.

First case

Here you encapsulate work with files and an image conversion in a single function which is good. Now, think about SOLID principle, where is single responsibility here? It would be better to split this function in 2:

get a stream
convert an image stream to a desired format

Which is why I like option 2d better, however, let's go further.

Second case

In the second case you extract a logic of converting an image from a stream (No matter what is the source) which is good. Keep it that way.

Now, you list every file, open in and pass to the conversion function. Does it sound like a 3 separate actions? You can stop here if this code is not going to be reused.

If at any moment you want to reuse image conversion logic you can move it to a separate class or helper for instance.

I see two better/other options here:

Option #1

def convert_png_to_jpg(image):
       # do something with my image

def convert_to_jpg(filepath):
    # open file
    convert_png_to_jpg(image)

all_images = os.listdir(path_to_images)    for image in all_images:
convert_to_jpg(image) for image in all_images

Option #2

Extract converter and use it where it is needed.

class PngToJpgConverter(object):
     def convert(image):
         pass
     def convert_from_file(filepath):
         # open file
         return self.convert(image)

converter = PngToJpgConverter()
all_images = os.listdir(path_to_images)
    for image in all_images:
        converter.convert_from_file(image)

Very interesting, thanks for your help and time. I first will go for my option 2 but in case I have to reuse, I will choose one of your options and refactor again. — honeymoon, Oct 23 '17 at 09:52
Yep, that's exactly what I recommend to do. Start simple and move fast, as soon as you need reusability - refactor and move on. — taras, Oct 23 '17 at 09:57

score 1 · Answer 2 · answered Oct 23 '17 at 09:17

1

I would say concentrate on readability. If your function is simple and only a few lines, put it in the loop. Otherwise a separate function makes the code easier to read.

Others have answered similar questions here.

And here is an explanation of why the loop might be faster in a loop if you are interested in performance at a low level.

answered Oct 23 '17 at 09:17

Arryn

114
8

Thanks for the links, performance is an important point for me! – honeymoon Oct 23 '17 at 09:49

score 0 · Answer 3 · answered Oct 23 '17 at 09:19

The second option seems "better" as observed by MrPyCharm comment.

In any case, you can eventually create a wrapper function that accepts a list and feeds each element to the function to do some batch processing

def batch_convert_png_to_jpg(path_to_images):
    files = os.listdir(path_to_images)
        for image in path_to_images:
            # do the work

Python loop over files - best practice

3 Answers3

First case

Second case

Option #1

Option #2