I'm trying to use regular expressions in Python to find all the image files (of types jpg, png and bmp) in my current folder and insert the word "resized" between the filename and the extension.

Input

  • Batman - The Grey Ghost.png
  • Mom and Dad - Young.jpg

Expected Output

  • Batman - The Grey Ghost_resized.png
  • Mom and Dad - Young_resized.jpg

Query

But my output is not as expected. Somehow the 2nd letter of the extension is getting replaced. I have tried tutorials online, but didn't see one which answers my query. Any help would be appreciated.

Code:

import glob
import re
files=glob.glob('*.[jp][pn]g')+glob.glob('*.bmp')

for x in files:
    new_file = re.sub(r'([a-z|0-9]).([jpb|pnm|ggp])$',r'\1_resized.\2',x)
    print(new_file,' : ',x)

Code Output

Ma image scan - Copy.j_resized.g  :  Ma image scan - Copy.jpg
Ma image scan.j_resized.g  :  Ma image scan.jpg
Mom and Dad - Young.j_resized.g  :  Mom and Dad - Young.jpg
PPF - SBI - 4.j_resized.g  :  PPF - SBI - 4.jpg
when-youre-a-noob-programmer-and-you-think-your-loop-64102565.p_resized.g  :  when-youre-a-noob-programmer-and-you-think-your-loop-64102565.png
Sample.b_resized.p  :  Sample.bmp


1 Answer

Try this:

r'([a-zA-Z0-9_ -]+)\.(bmp|jpg|png)$'

Input:

  • Batman - The Grey Ghost.png

Output:

  • Batman - The Grey Ghost_resized.png

See live demo.
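
As a rough sketch of how the pattern above could be plugged into the substitution from the question: the helper name `add_resized_suffix` and the sample list are only illustrative, and the replacement string is the one from the question's code.

```python
import re

# Pattern from the answer: the file stem goes into group 1, the extension
# into group 2. The dot is escaped so it only matches a literal ".".
PATTERN = re.compile(r'([a-zA-Z0-9_ -]+)\.(bmp|jpg|png)$')


def add_resized_suffix(filename):
    """Return filename with '_resized' inserted before the extension.

    Hypothetical helper for illustration; names that do not match the
    pattern are returned unchanged.
    """
    return PATTERN.sub(r'\1_resized.\2', filename)


# Sample names taken from the question.
for name in ['Batman - The Grey Ghost.png', 'Mom and Dad - Young.jpg', 'Sample.bmp']:
    print(add_resized_suffix(name), ' : ', name)
```

Note that the character class only covers letters, digits, underscores, spaces and hyphens, and the extensions are lowercase only, so a name such as "photo (1).jpg" or "Sample.PNG" would be left unchanged by this sketch.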

SaSkY
  • Thanks, got it. I also checked the live demo, which clarified all my questions about how it works. – Pratyush Biswas Nov 05 '22 at 21:48
  • Sure. First let's go through your regex pattern, then mine, using "Batman - The Grey Ghost.png" as an example. Your regex was "([a-z|0-9]).([jpb|pnm|ggp])$". The first part of the pattern is "([a-z|0-9])": 1- The parentheses () form the first capturing group. – SaSkY Nov 05 '22 at 22:04
  • 2- [a-z|0-9] means: capture exactly one character, which has to be a lowercase letter from a to z, a literal vertical bar (|), or a digit from 0 to 9. The regex engine starts at the first character of our string "Batman - The Grey Ghost.png", which is the capital letter "B". It does not match, so the engine skips it and continues. The second character is the lowercase letter "a", which "([a-z|0-9])" matches, so "a" is saved in the first capturing group. – SaSkY Nov 05 '22 at 22:05
  • The second part of your regex pattern is ".([jpb|pnm|ggp])$": 1- The dot (.) matches any single character except line terminators, so it matches the third letter "t" of the string. NOTE: To match only a literal dot, escape it with a backslash (\), i.e. write "\." instead of ".". 2- The parentheses () form the second capturing group. – SaSkY Nov 05 '22 at 22:06
  • 3- [jpb|pnm|ggp] means: match exactly one character, which has to be one of "j", "p", "b", "|", "n", "m", "g" (repeating a character inside the brackets makes no difference). So the regex engine captures the fourth character, "m", because it is in that list, and saves it in the second capturing group. – SaSkY Nov 05 '22 at 22:07
  • 4- "$" means the end of the line. We have not reached the end of the line here, so this attempt fails: "$" cannot match at the fifth letter of "Batman - The Grey Ghost.png" (the "a" after the "m"). The engine discards what it had matched so far (i.e., "atm") and tries again from the next starting position in the string. – SaSkY Nov 05 '22 at 22:07
  • Wow, that is much more than I expected. Many thanks for such clear instructions. – Pratyush Biswas Nov 05 '22 at 22:15
  • The regex engine keeps trying until it reaches the last three characters of "Batman - The Grey Ghost.png", which are "png". 1- "([a-z|0-9])" matches the letter "p" of "png", and "p" is saved in the first capturing group. – SaSkY Nov 05 '22 at 22:21
  • 2- ".([jpb|pnm|ggp])$" matches "n", "g" and the end of the line, respectively. In more detail: the dot (.) matches the letter "n", "([jpb|pnm|ggp])" matches the letter "g", which is saved in the second capturing group, and the dollar sign ($) matches the end of the line. – SaSkY Nov 05 '22 at 22:21
  • Then comes the substitution. 1- From your code: r'\1_resized.\2' 2- The matched part is "png". 3- "png" is removed, leaving "Batman - The Grey Ghost." 4- \1 is the value of the first capturing group, the letter "p", so it is appended and the string becomes "Batman - The Grey Ghost.p" 5- "_resized." is appended too, giving "Batman - The Grey Ghost.p_resized." – SaSkY Nov 05 '22 at 22:43
  • 6- \2, the value of the second capturing group (the letter "g"), is appended as well, so the final string is "Batman - The Grey Ghost.p_resized.g" – SaSkY Nov 05 '22 at 22:43
  • @PratyushBiswas My pleasure :) This is your regex pattern, I will explain my regex pattern too. – SaSkY Nov 05 '22 at 22:50
  • I understood your regex pattern via the live demo, so don't worry, and please enjoy your Sunday :-) – Pratyush Biswas Nov 06 '22 at 10:29
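
The walkthrough in the comments above can be checked directly by asking Python what the question's original pattern matches and captures. This is only a small diagnostic sketch; the helper name `diagnose` is made up for illustration.

```python
import re

# Original pattern from the question: the dot is unescaped (matches any
# character) and the bracket expressions are character classes, so each
# one matches a single character rather than a whole extension.
ORIGINAL = re.compile(r'([a-z|0-9]).([jpb|pnm|ggp])$')


def diagnose(filename):
    """Return (matched text, group 1, group 2, substituted result).

    Hypothetical helper; assumes the pattern matches the given filename.
    """
    match = ORIGINAL.search(filename)
    result = ORIGINAL.sub(r'\1_resized.\2', filename)
    return match.group(0), match.group(1), match.group(2), result


matched, first, second, result = diagnose('Batman - The Grey Ghost.png')
print('matched text :', matched)  # only the last three characters
print('group 1      :', first)
print('group 2      :', second)
print('after re.sub :', result)
```

Only the final three characters of the name are matched, which is why the text before them (including the real dot) is kept and the middle letter of the extension disappears.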