I am using a script from https://towardsdatascience.com/a-keras-pipeline-for-image-segmentation-part-1-6515a421157d to split a dataset. I don't understand what this part is doing:

all_frames = os.listdir(FRAME_PATH)
all_masks = os.listdir(MASK_PATH)


all_frames.sort(key=lambda var:[int(x) if x.isdigit() else x 
                                for x in re.findall(r'[^0-9]|[0-9]+', var)])
all_masks.sort(key=lambda var:[int(x) if x.isdigit() else x 
                               for x in re.findall(r'[^0-9]|[0-9]+', var)])

More specifically, I do not understand what everything after var: is doing. My first guess would be a list comprehension, but it does not follow the usual structure:

[ expression for item in list if conditional ] 

Also what is the purpose of this part re.findall(r'[^0-9]|[0-9]+', var) ?

thank you

Lis Lou
  • You can learn more about list comprehension syntax in this post, namely about if/else: https://stackoverflow.com/questions/4260280/if-else-in-a-list-comprehension – Luka Mesaric May 25 '20 at 16:22

1 Answer


The int(x) if x.isdigit() else x is a conditional expression, Python's ternary operator ("this if condition else that"). You're right that it isn't the filtering "if" that can end a list comprehension; it is the expression evaluated for each x produced by the for clause. It says "turn x into an integer if it consists only of digits, otherwise leave it unchanged".

So we could write this all out like:

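import re

def convert_integer(x):
    # Digit-only tokens become ints; everything else stays a string.
    if x.isdigit():
        return int(x)
    else:
        return x

def key_function(var):
    # Build the sort key: a list of single characters and integers.
    return [convert_integer(x)
            for x in re.findall(r'[^0-9]|[0-9]+', var)]

all_frames.sort(key=key_function)
all_masks.sort(key=key_function)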
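
To see why the script bothers with such a key, here is a minimal sketch comparing a plain sort with the keyed sort. The file names and the name natural_key are made up for illustration and do not come from the article; the sketch also assumes all names share the same letter/digit layout, since comparing an int with a str at the same position would raise a TypeError in Python 3.

```python
import re

def natural_key(name):
    # Split into single non-digit characters and runs of digits,
    # then turn the digit runs into ints so they compare numerically.
    return [int(tok) if tok.isdigit() else tok
            for tok in re.findall(r'[^0-9]|[0-9]+', name)]

# Hypothetical file names, not taken from the article.
names = ["frame10.png", "frame2.png", "frame1.png"]

plain = sorted(names)                     # character-by-character string order
natural = sorted(names, key=natural_key)  # digit runs compared as numbers

print(plain)    # ['frame1.png', 'frame10.png', 'frame2.png']
print(natural)  # ['frame1.png', 'frame2.png', 'frame10.png']
```

With the key, frame2 comes before frame10, so frames and masks listed from two directories end up in the same numeric order.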
David Robinson
  • Yes, as I mentioned, what I don't understand is what the return part is doing – Lis Lou May 25 '20 at 16:23
  • @LisLou Do you mean the ternary operator `int(x) if x.isdigit() else x`? I've added a section to my answer describing that as well – David Robinson May 25 '20 at 16:26
  • @DavidRobinson Regarding `x.isdigit()`, given regex will match `'a'`, `'b'` and `'c'` in `"abc"` because of the first part `[^0-9]`, so not everything has to be made up of digits. – Luka Mesaric May 25 '20 at 16:32
  • So `[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)]` is a list comprehension of the form `[ expression for item in list if conditional ]`, where the expression is `int(x) if x.isdigit() else x` and there is no conditional at the end, only a `for` iteration over the elements of the list? – Lis Lou May 25 '20 at 16:43
  • I am not sure I understand `re.findall(r'[^0-9]|[0-9]+', var)`. I tried `var = "blabla0236abs7b8b9b0102b1"` followed by `print(re.findall(r'[^0-9]|[0-9]+', var))`, and the output was `['b', 'l', 'a', 'b', 'l', 'a', '0236', 'a', 'b', 's', '7', 'b', '8', 'b', '9', 'b', '0102', 'b', '1']`. From what I understand, it returns a list of strings with all the matches of either sequences of digits (`[0-9]+`) or the pattern `[^0-9]`. According to the documentation `^` matches the start of a string, so does `[^0-9]` mean a sequence like (any character) + a digit from 0 to 9? – Lis Lou May 25 '20 at 17:16
  • `^` has two different meanings in regular expressions (which I agree is confusing): outside square brackets it anchors the match at the start of the string, but as the first character inside a pair of square brackets it means "not", so `[^0-9]` means "any one character that is not a digit". – David Robinson May 25 '20 at 18:28
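
The two roles of `^` discussed in the comments can be checked with a short sketch. The sample string is made up for illustration; only `re.findall` from the standard library is used.

```python
import re

text = "ab12c3"  # hypothetical sample string

# Inside brackets, a leading ^ negates the class: one non-digit per match.
non_digits = re.findall(r'[^0-9]', text)
# Outside brackets, ^ anchors at the start: a digit run only at position 0.
leading_digits = re.findall(r'^[0-9]+', text)
# The pattern from the question: single non-digits or whole digit runs.
tokens = re.findall(r'[^0-9]|[0-9]+', text)

print(non_digits)      # ['a', 'b', 'c']
print(leading_digits)  # []
print(tokens)          # ['a', 'b', '12', 'c', '3']
```

This matches the output reported above: letters come back one at a time, while each run of digits comes back as a single string.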