I have found myself writing code like this fairly often:

import os
import re

nums = {}
allpngs = [ii for ii in os.listdir() if '.png' in ii]
for png in allpngs:
  regex = r'(.*)(\d{4,})\.png'
  prefix = re.search(regex, png).group(1)
  mynum = int(re.search(regex, png).group(2))
  if prefix in nums.keys():
    nums[prefix].append(mynum)
  else:
    nums[prefix] = [mynum]

Essentially, I want to build a dictionary of lists where the keys are not known ahead of time. That forces the if/else at the bottom of the for loop: append when the key already exists, otherwise create a new list. I write this pattern frequently, so I'm wondering whether there's a shorter, more Pythonic way of doing it.

To be clear, what I have works fine, but it would be nice if I could do it in fewer lines and still have it be readable and apparent what's happening.

Thanks.

joejoejoejoe4
  • You should not run the regex twice; match once and reuse the result: `match = re.search(...); prefix = match.group(1); num = match.group(2)` – Joran Beasley Oct 08 '22 at 18:40
  • And you can use `glob` or `pathlib` to avoid running a list comprehension over `os.listdir`. I also recommend `defaultdict` from `collections` if you need more complex default rules (passing just `list` as the factory works fine in this case). Additional side note: `if prefix in nums.keys()` is a roundabout way of writing `if prefix in nums`. Membership can be tested directly against the dict, so there is no need to call `keys()` first. – MatsLindh Oct 08 '22 at 18:43
  • @MatsLindh, for some reason I have found that `glob.glob()` is extremely slow when iterating over thousands of png files located on a network drive, where `os.listdir()` is significantly faster. – joejoejoejoe4 Oct 08 '22 at 18:44
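
The suggestions in the comments above (match once, use `defaultdict`) could be combined roughly as follows. This is only a sketch: the `filenames` list and the `group_numbers` helper are hypothetical stand-ins for the `os.listdir()` call and the loop in the question, and names that don't match the pattern are simply skipped, which the original code does not do.

```python
import re
from collections import defaultdict

# Hypothetical filenames standing in for os.listdir() output.
filenames = ["frame0001.png", "frame0002.png", "shot0010.png", "notes.txt"]

pattern = re.compile(r'(.*)(\d{4,})\.png')


def group_numbers(names):
    """Group trailing numbers by filename prefix (hypothetical helper)."""
    nums = defaultdict(list)  # a missing key starts out as an empty list
    for name in names:
        match = pattern.search(name)  # run the regex once per name
        if match is None:
            continue  # assumption: skip names that don't fit the pattern
        nums[match.group(1)].append(int(match.group(2)))
    return nums


nums = group_numbers(filenames)
```

With `defaultdict(list)` the explicit if/else disappears because looking up a missing key creates the empty list on the fly.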

1 Answer

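You can use `dict.setdefault`, which returns the existing list for `prefix` or, if the key is missing, inserts a new empty list and returns that: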


...

nums.setdefault(prefix, []).append(mynum)

...
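
For context, here is a minimal sketch of the whole loop with `setdefault` in place. It assumes a hypothetical `sample_names` list instead of the `os.listdir()` call so that it runs anywhere, and it matches the regex once per name:

```python
import re

# Hypothetical filenames standing in for os.listdir() output.
sample_names = ["a0001.png", "a0002.png", "b0100.png"]

regex = r'(.*)(\d{4,})\.png'
nums = {}
for png in sample_names:
    match = re.search(regex, png)  # every sample name matches the pattern
    prefix = match.group(1)
    mynum = int(match.group(2))
    # Returns nums[prefix] if present; otherwise stores [] there and returns it.
    nums.setdefault(prefix, []).append(mynum)

print(nums)  # {'a': [1, 2], 'b': [100]}
```

Note that `setdefault` builds the `[]` argument on every call, even when the key already exists; for a small literal like this the cost is negligible.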
Andrej Kesely