listA = ['Leonardo_da_Vinci', 'Napoleon', 'Cao_Cao', 'Elton_John']
listB = ['123_Leonardo_da_Vinci_abc.csv', '456_Cao_Cao_def.csv']

listC = ['Napoleon', 'Elton_John']

I would like to check whether the items in listB contain the values in listA, and return listC (i.e. the values of listA that are missing from listB). For this purpose I would like to use a regex (and no other heuristics), for example `.*Leonardo_da_Vinci.*` to check for Leonardo_da_Vinci.

I want to use regex because the example above is just a minimal mockup; the data I actually use is much bigger. It would also be good to have general code that works on other data in the future.

user7665853
  • Is there a specific reason why you want to use regex? It's overkill for your use case and will make the code more complicated than simple `in` checks in Python – Thomas Mar 06 '23 at 14:00
  • I want to use regex because the example above is just a minimal mockup; the data I actually use is much bigger. It would also be good to have general code that works on other data in the future. – user7665853 Mar 06 '23 at 15:22
  • Alright, that makes more sense. I'd still be careful about using a dynamically generated regex like this. It is just incredibly slow compared to alternative solutions, at least if you have a large amount of data. Otherwise you'll be fine. – Thomas Mar 07 '23 at 14:21
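For comparison, here is a minimal sketch of the non-regex alternative suggested in the comments, using the sample lists from the question. The names `all_files` and `listC_joined` are illustrative only. The second variant assumes no name contains a newline; it has not been benchmarked here, so treat any speed gain on real data as something to measure.

```python
listA = ['Leonardo_da_Vinci', 'Napoleon', 'Cao_Cao', 'Elton_John']
listB = ['123_Leonardo_da_Vinci_abc.csv', '456_Cao_Cao_def.csv']

# Plain substring test: `name in f` is True when name occurs anywhere in f,
# which is what the pattern '.*name.*' checks for single-line strings.
listC = [name for name in listA if not any(name in f for f in listB)]

# Variant (assumes no name contains '\n'): join all file names once, so each
# lookup is one substring search instead of a Python-level loop over listB.
# The '\n' separator prevents a match that spans two file names.
all_files = '\n'.join(listB)
listC_joined = [name for name in listA if name not in all_files]

print(listC)         # ['Napoleon', 'Elton_John']
print(listC_joined)  # ['Napoleon', 'Elton_John']
```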

1 Answer


Something like this:

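import re

def exists_csv_with_name(name: str, source_list: list) -> bool:
    # re.escape() makes characters such as '.' or '(' in a name match literally
    regex = re.compile(fr'.*{re.escape(name)}.*')
    # True if at least one string in source_list contains `name`
    return any(regex.match(source_str) for source_str in source_list)

# Names from listA (defined in the question) with no matching file in listB
listC = [name for name in listA if not exists_csv_with_name(name, listB)]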
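A self-contained sketch of the same idea, runnable on its own with the question's sample data. `missing_names` is a hypothetical helper name. It uses `re.search`, which scans the whole string, so for single-line strings such as file names it gives the same result as `re.match` with the `.*...*` wrapper; `re.escape` is an added safeguard for names containing regex metacharacters.

```python
import re

listA = ['Leonardo_da_Vinci', 'Napoleon', 'Cao_Cao', 'Elton_John']
listB = ['123_Leonardo_da_Vinci_abc.csv', '456_Cao_Cao_def.csv']

def missing_names(names: list, file_names: list) -> list:
    """Return the names that do not occur anywhere inside any file name."""
    missing = []
    for name in names:
        # re.search looks for a match at any position, so no leading or
        # trailing '.*' is needed; re.escape treats the name literally.
        pattern = re.compile(re.escape(name))
        if not any(pattern.search(f) for f in file_names):
            missing.append(name)
    return missing

listC = missing_names(listA, listB)
print(listC)  # ['Napoleon', 'Elton_John']
```

Escaping matters once real data contains characters like `.`: without it, a name such as `a.b` would also match a file named `axb.csv`.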
Jorge Luis
  • If there are alternative approaches, they are more than welcome! Thanks in advance – user7665853 Mar 06 '23 at 15:32
  • @user7665853 What kind of alternative ways do you mean? What in this solution does not fully satisfy you? – Jorge Luis Mar 06 '23 at 15:45
  • I meant it as an invitation for more volunteers to share other nice-to-have solutions. Now, after testing a bit further, it seems I have a problem with values in UTF-8: somehow values like `Frédéric_Chopin` were excluded from the outcome. Is there a way to handle this in your code? (Although I am not entirely sure whether the encoding is the exact problem.) – user7665853 Mar 06 '23 at 17:16
  • If I understood you right, when you have `listA = ['Frédéric_Chopin']` and `listB = ['110_Frédéric_Chopin.csv']`, then `'Frédéric_Chopin'` is not in `listC`. That is exactly the behavior you asked for! – Jorge Luis Mar 06 '23 at 17:52
  • Sorry, I think this is a separate UTF-8 encoding problem on my machine. Something goes wrong when matching file names in UTF-8, and I am just puzzled... I think your code is fine for the original purpose. – user7665853 Mar 07 '23 at 16:28
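The thread does not pin down the cause of the `Frédéric_Chopin` mismatch. One common cause, offered here only as an assumption, is Unicode normalization: the same visible `é` can be stored as one code point (U+00E9) or as `e` followed by a combining accent (U+0301), and some file systems (for example older macOS HFS+ volumes) return names in the decomposed form. The sketch below normalizes both sides to NFC before matching; `nfc` and `missing_names_nfc` are hypothetical helper names.

```python
import re
import unicodedata

def nfc(s: str) -> str:
    # Compose characters into Unicode NFC form, e.g. 'e' + U+0301
    # (combining acute accent) becomes the single code point U+00E9 'é'
    return unicodedata.normalize('NFC', s)

def missing_names_nfc(names: list, file_names: list) -> list:
    # Hypothetical helper: normalize both sides, then match literally
    normalized_files = [nfc(f) for f in file_names]
    missing = []
    for name in names:
        pattern = re.compile(re.escape(nfc(name)))
        if not any(pattern.search(f) for f in normalized_files):
            missing.append(name)  # report the name as originally given
    return missing

# The same visible name, encoded two ways: precomposed vs decomposed 'é'
name = 'Fr\u00e9d\u00e9ric_Chopin'
file_name = '110_Fre\u0301de\u0301ric_Chopin.csv'

print(name in file_name)                                   # False: code points differ
print(missing_names_nfc([name, 'Napoleon'], [file_name]))  # ['Napoleon']
```

If normalizing does not help, the input may instead have been read with the wrong encoding, which normalization cannot fix.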