Sort List into different lists

Question

I have a list of file names (about 800 of them).

[Example] file_name = 23475048_43241u_43x_pos11_7.npz

I need to sort the file names into separate lists, grouped by their "pos" part. In my example that is pos11 (there are several positions: pos0, pos12, ...).

First I tried to collect all the distinct pos values as keys of a dict:

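import glob
import os
from pathlib import Path

path = glob.glob(os.path.join(my_dir, '*.npz'))

posList = []

for file in path:
  file_name = Path(file).stem.split("_")  # file name without ".npz", split on "_"
  posList.append(file_name[3])            # fourth field is the pos part, e.g. "pos11"

mylist = list(dict.fromkeys(posList))  # unique pos values, in first-seen order
files_dict = {}
for pos in mylist:
  files_dict[pos] = []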

Output:

{'pos0': [], 'pos10': [], 'pos11': [], 'pos12': [], 'pos1': [], 'pos2': [], 'pos3': [], 'pos4': [], 'pos5': [], 'pos6': [], 'pos7': [], 'pos8': [], 'pos9': []}

Now I want to fill these lists, but I'm stuck. I want to iterate over the list of file names again and add each one to the right list.

You have a dict, so this should solve it for you: https://stackoverflow.com/questions/9001509/how-can-i-sort-a-dictionary-by-key — Sefan, May 04 '22 at 08:50
How do you want them sorted? by size, alphabetical, ASCII code, etc.? — 2br-2b, May 04 '22 at 08:56
@2br-2b by the pos-numbers. Each file_name has a pos-number part in it. See Example. In my example it is _pos11_ its at the end of the file name. — wimsenOG, May 04 '22 at 08:58
An example with expected input and outputs would go a long way to make this question easier to understand — ARandomDeveloper, May 04 '22 at 09:03

ARandomDeveloper · Accepted Answer · 2022-05-04T10:13:37.190

2

I'm not sure what your code is doing, but the program below takes a list of file names and builds a dictionary of sorted lists keyed by the pos part, which is what I think you are after. (If not, please edit your question to elaborate.)

files = ['1_2_3_pos1_2.npz', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict[pos] = files_dict.get(pos, []) + [file]

for k in files_dict.keys():
    files_dict[k].sort()

print(files_dict)

Edit: As @Stef suggested, you can make it more efficient by using setdefault:

files = ['1_2_3_pos1_2.npz', '2_3_1_pos2_2.npz']
files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict.setdefault(pos, []).append(file)

for k in files_dict.keys():
    files_dict[k].sort()

print(files_dict)
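
If the keys themselves should also come out in numeric order (pos2 before pos10), one option is to rebuild the dict with its keys sorted on the integer part. This is only a sketch: it assumes every key has the form `pos<number>`, and the helper name `pos_number` and the sample file names are made up for illustration.

```python
files = ['1_2_3_pos10_2.npz', '2_3_1_pos2_2.npz', '1_2_3_pos2_1.npz']

files_dict = {}
for file in files:
    pos = file.split('_')[3]
    files_dict.setdefault(pos, []).append(file)

def pos_number(key):
    # Hypothetical helper: 'pos10' -> 10. Assumes the key starts with 'pos'.
    return int(key[len('pos'):])

# Dicts preserve insertion order (Python 3.7+), so inserting the keys
# in numeric order is enough to get an ordered result.
ordered = {k: sorted(files_dict[k]) for k in sorted(files_dict, key=pos_number)}

print(ordered)
# {'pos2': ['1_2_3_pos2_1.npz', '2_3_1_pos2_2.npz'], 'pos10': ['1_2_3_pos10_2.npz']}
```

A plain `sorted(files_dict)` would compare the keys as strings and put 'pos10' before 'pos2', which is why the integer key is used here.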

edited May 04 '22 at 10:13

answered May 04 '22 at 09:01

I upvoted, but note that the complexity of `+` is linear in the length of the lists, so using `+` repeatedly is inefficient. Here you want to mutate the list in place to add one element; `.append` is the best choice. You can replace the one line `files_dict[pos] = files_dict.get(pos, []) + [file]` with two lines `files_dict[pos] = files_dict.get(pos, []); files_dict[pos].append(file)` or in just one line, you can do: `files_dict.setdefault(pos, []).append(file)` – Stef May 04 '22 at 09:17
Also consider adding links to the documentation: https://docs.python.org/3/library/stdtypes.html#dict.setdefault and https://docs.python.org/3/library/stdtypes.html#dict.get – Stef May 04 '22 at 09:25

score 2 · Answer 2 · answered May 04 '22 at 09:56

@ARandomDeveloper's answer clearly explains how to populate the dict by iterating through the list only once. I recommend studying their answer until you understand it well.

This is a very common way to populate a dict. You will probably encounter this pattern again.

Because grouping into a dict is so common, the more_itertools module offers a function, map_reduce, for exactly this purpose.

from more_itertools import map_reduce

posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n") # example list from uingtea's answer

d = map_reduce(posList, keyfunc=lambda f: f.split('_')[3])

print(d)
# defaultdict(None, {
#   'pos11': ['23475048_43241u_43x_pos11_7.npz'],
#   'pos1': ['23475048_43241u_43x_pos1_7.npz'],
#   'pos10': ['23475048_43241u_43x_pos10_7.npz'],
#   'pos8': ['23475048_43241u_43x_pos8_7.npz'],
#   'pos22': ['23475048_43241u_43x_pos22_7.npz'],
#   'pos2': ['23475048_43241u_43x_pos2_7.npz']
# })

Internally, map_reduce uses almost exactly the same code as suggested in @ARandomDeveloper's answer, except with a defaultdict.
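
For readers who would rather not install more_itertools, the same grouping can be sketched with the standard library alone. This is an illustrative stand-in, not the library's actual source; the function name `group_by_key` is hypothetical.

```python
from collections import defaultdict

def group_by_key(iterable, keyfunc):
    # Illustrative stand-in for the grouping step; not the more_itertools source.
    groups = defaultdict(list)
    for item in iterable:
        # A missing key gets a fresh empty list automatically.
        groups[keyfunc(item)].append(item)
    return groups

names = [
    '23475048_43241u_43x_pos11_7.npz',
    '23475048_43241u_43x_pos1_7.npz',
    '23475048_43241u_43x_pos11_8.npz',
]

d = group_by_key(names, keyfunc=lambda f: f.split('_')[3])
print(dict(d))
```

With a defaultdict there is no need for `get` or `setdefault`, because the empty list is created on first access.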

score 0 · Answer 3 · answered May 04 '22 at 09:05

You need to extract the digits after pos with the regex (\d+)_\d\.npz and use them as an integer sort key for sorted():

import re

posList = '''23475048_43241u_43x_pos11_7.npz
23475048_43241u_43x_pos1_7.npz
23475048_43241u_43x_pos10_7.npz
23475048_43241u_43x_pos8_7.npz
23475048_43241u_43x_pos22_7.npz
23475048_43241u_43x_pos2_7.npz'''.split("\n")


posList = sorted(posList, key=lambda x: int(re.search(r"(\d+)_\d\.npz", x)[1]))
print(posList)

Result:

['23475048_43241u_43x_pos1_7.npz',
  '23475048_43241u_43x_pos2_7.npz',
  '23475048_43241u_43x_pos8_7.npz',
  '23475048_43241u_43x_pos10_7.npz',
  '23475048_43241u_43x_pos11_7.npz',
  '23475048_43241u_43x_pos22_7.npz'
]
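
Once the names are sorted numerically, they can also be split into one list per pos with itertools.groupby. This is a sketch under the assumption that every name contains `_pos<number>_`; the helper `pos_of` and the sample names are invented for the example.

```python
import re
from itertools import groupby

def pos_of(name):
    # Hypothetical helper: pulls the integer out of '_pos<number>_'.
    return int(re.search(r"_pos(\d+)_", name)[1])

names = [
    'a_b_c_pos10_1.npz',
    'a_b_c_pos2_7.npz',
    'a_b_c_pos10_0.npz',
    'a_b_c_pos2_3.npz',
]

# groupby only merges adjacent items, so sort by the grouping key first
# (the name itself breaks ties inside each pos).
names.sort(key=lambda n: (pos_of(n), n))

grouped = {f"pos{k}": list(g) for k, g in groupby(names, key=pos_of)}
print(grouped)
# {'pos2': ['a_b_c_pos2_3.npz', 'a_b_c_pos2_7.npz'], 'pos10': ['a_b_c_pos10_0.npz', 'a_b_c_pos10_1.npz']}
```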
