The code below only finds photos with a .jpg extension. How should I change the pattern to include photos in .png, .jpg, and .jpeg formats?

image_df_train_f = pd.DataFrame({'path': list(Path(input_path).glob('**/*.jp*g'))})
ThePyGuy
zey.s

2 Answers


You can simply concatenate the results of several glob calls, one per extension:

image_df_train_f = pd.DataFrame({'path': list(Path(input_path).glob('**/*.jpg'))+list(Path(input_path).glob('**/*.png'))+list(Path(input_path).glob('**/*.jpeg'))})

This will solve your problem.
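
The same concatenation can also be written as a loop over patterns, which stays readable as the list of extensions grows. This is only a sketch: `find_by_patterns` and `PATTERNS` are hypothetical names that are not part of the original answer, and whether upper-case names such as `photo.JPG` match may depend on the platform.

```python
from itertools import chain
from pathlib import Path

# Hypothetical pattern list; extend it with whatever extensions you need.
PATTERNS = ("**/*.png", "**/*.jpg", "**/*.jpeg")


def find_by_patterns(root, patterns=PATTERNS):
    """Return a sorted list of paths under root matching any of the glob patterns."""
    root = Path(root)
    # set() guards against a file being returned by more than one pattern.
    return sorted(set(chain.from_iterable(root.glob(p) for p in patterns)))
```

Assuming `pd` and `input_path` from the question, the result can be passed straight to the DataFrame: `pd.DataFrame({'path': find_by_patterns(input_path)})`.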

Or, alternatively, walk the directory tree with os.walk and filter by extension:

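import os

shpfiles = []
for dirpath, subdirs, files in os.walk(input_path):
    for x in files:
        # Keep files whose last dot-separated part is a wanted extension
        # (use ["png", "jpg", "jpeg"] for the images in the question).
        if x.split('.')[-1] in ["shp", "pdf", "txt"]:
            shpfiles.append(os.path.join(dirpath, x))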
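
A variant of the same `os.walk` loop, sketched here with `os.path.splitext` instead of `split('.')`. The names `walk_by_extension` and `WANTED` are hypothetical, and lower-casing the extension is an added assumption so that `A.JPG` is also picked up. Unlike `x.split('.')[-1]`, `splitext` returns an empty extension for a file literally named `jpg`, so such a file is not matched by accident.

```python
import os

# Assumed extension set; note the leading dots, as returned by os.path.splitext.
WANTED = {".png", ".jpg", ".jpeg"}


def walk_by_extension(root, wanted=WANTED):
    """Collect paths under root whose lower-cased extension is in wanted."""
    matches = []
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in wanted:
                matches.append(os.path.join(dirpath, name))
    return matches
```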
Tamil Selvan
  • It doesn't work, but I fixed it with this code: image_df_train_f = pd.DataFrame({'path': list(Path(input_path).glob("*.[jp][pn]g"))}) – zey.s Aug 03 '21 at 18:38
  • That's good. Alternatively, you can use the os.walk method, since at some point you might have extensions like jpg, pdf, docx, etc. – Tamil Selvan Aug 03 '21 at 18:44
  • @TamilSelvan What are the benefits of `os.walk` over `Path.glob` in this use case? You can list files with multiple extensions with `Path.glob` as OP has already done. – Demian Wolf Aug 03 '21 at 18:47
  • Check the latest comment that I posted; it can handle many extensions in a single list, so you don't have to change the regex every time. – Tamil Selvan Aug 03 '21 at 18:49
  • @TamilSelvan your second solution (with `os.walk`) is unreasonably long and cumbersome. Also it's better to use [`os.path.splitext`](https://docs.python.org/3/library/os.path.html#os.path.splitext) for getting an extension of a file. – Demian Wolf Aug 03 '21 at 18:56
  • @DemianWolf The reason is to make it easy to add more extensions. For example, if you want to do the same for a list of 100 extensions, the regex won't work there. That's why I am going this way. – Tamil Selvan Aug 03 '21 at 19:00
  • @DemianWolf Yes, you are right, we can do that in a much more optimized way. I faced a lot of issues with glob on Mac, so I started to use os.walk, since it does not have issues across different OSes (e.g. Windows needing r"", etc.). – Tamil Selvan Aug 03 '21 at 19:11
  • @zeynepsert you shouldn't use glob this way (`image_df_train_f = pd.DataFrame({'path': list(Path(input_path).glob("*.[jp][pn]g"))})`), since it will list not only jpg and png files but also those with extensions like jng. – Demian Wolf Aug 03 '21 at 19:11
  • @TamilSelvan `r""` is not what `glob` needs to work on Windows. `r""` is often used for backslashes (`\`), which are used in Windows paths instead of `/` on *nix. But `glob` (Unix style pathname pattern expansion) does **not** need these. You should just use "/" instead of "\", as is usually done for cross-platform compatibility. – Demian Wolf Aug 03 '21 at 19:36
  • @TamilSelvan which issues have you faced with `glob` on Mac? – Demian Wolf Aug 03 '21 at 19:36

To list files with different extensions, you can use glob like this:

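from pathlib import Path
import pandas as pd

EXTENSIONS = (".png", ".jpg", ".jpeg")
# Walk input_path recursively and keep only entries with a wanted suffix.
path = [p.resolve()
        for p in Path(input_path).glob("**/*")
        if p.suffix in EXTENSIONS]
image_df_train_f = pd.DataFrame({'path': path})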
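
Note that `p.suffix in EXTENSIONS` is a case-sensitive comparison, so a file named `photo.JPG` would be skipped. A minimal sketch of a case-insensitive variant follows; `list_images` is a hypothetical helper name, and the `is_file()` check is an added assumption to skip directories whose names happen to end in an image suffix.

```python
from pathlib import Path

EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(root, extensions=EXTENSIONS):
    """Recursively list files under root whose lower-cased suffix is in extensions."""
    # rglob("*") is the same as glob("**/*").
    return [p.resolve()
            for p in Path(root).rglob("*")
            if p.is_file() and p.suffix.lower() in extensions]
```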

Be aware that the solution you provided in the comments, while concise, may list unnecessary files, so you should not use it.

image_df_train_f = pd.DataFrame({'path': list(Path(input_path).glob("*.[jp][pn]g"))})

While it does list files with .jpg and .png (without .jpeg, by the way) extensions, it also includes .jng and .ppg files it locates in the directory.
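
To see which names the bracket pattern accepts without touching the file system, here is a small check using `fnmatch.fnmatchcase` from the standard library. It is used here as a stand-in on the assumption that `Path.glob` applies the same shell-style bracket rules; the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "*.[jp][pn]g"
# Made-up file names for illustration only.
names = ["a.jpg", "b.png", "c.jpeg", "d.jng", "e.ppg", "f.gif"]

# fnmatchcase compares case-sensitively on every platform.
matched = [n for n in names if fnmatchcase(n, PATTERN)]
print(matched)  # ['a.jpg', 'b.png', 'd.jng', 'e.ppg']
```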

Demian Wolf