1

I have a list of filenames that each contain a date and an id, for example:

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']

and I want to group them by id, for example:

nlist = [['20191101_01.csv','20191102_01.csv','20191103_01.csv'],['20191101_02.csv','20191102_02.csv','20191103_02.csv']......]

Is there a simple and clean way to do it?

qwer1234
  • This should probably answer your question https://stackoverflow.com/questions/949098/python-split-a-list-based-on-a-condition – Abhyudai Nov 18 '19 at 03:14
  • Welcome to SO. Please take the [tour] and take the time to read [ask] and the other links found on that page. This isn't a discussion forum or tutorial service. You should invest some time working your way through [the Tutorial](https://docs.python.org/3/tutorial/index.html), practicing the examples. It will give you an introduction to the tools Python has to offer for solving your problem. – wwii Nov 18 '19 at 03:24

4 Answers

1

I would suggest using a dict. You can then do the grouping in O(n) time:

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
parsed_dict = {}
for el in olist:
    # Group by the part after the underscore, e.g. '01.csv'
    key = el.split('_')[1]
    if key not in parsed_dict:
        parsed_dict[key] = [el]
    else:
        parsed_dict[key].append(el)

print(parsed_dict)

Edit: updated according to wwii's comment:

from collections import defaultdict

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
parsed_dict = defaultdict(list)
for el in olist:
    # A missing key starts out as an empty list, so no if/else is needed
    key = el.split('_')[1]
    parsed_dict[key].append(el)

print(parsed_dict)
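
If the nested list from the question is needed rather than a dict, the grouped values can be collected afterwards. This is only a minimal sketch on a shortened list: `group_by_id` is a hypothetical helper name, and returning the sublists in sorted id order is an assumption about what is wanted.

```python
from collections import defaultdict


def group_by_id(filenames):
    """Group names like '20191101_01.csv' by the part after the underscore."""
    groups = defaultdict(list)
    for name in filenames:
        # '20191101_01.csv' -> '01.csv'; every file shares the extension,
        # so this works as a grouping key without stripping it.
        groups[name.split('_')[1]].append(name)
    # Assumption: the sublists should come out in sorted id order.
    return [groups[key] for key in sorted(groups)]


olist = ['20191101_01.csv', '20191101_02.csv',
         '20191102_01.csv', '20191102_02.csv']
nlist = group_by_id(olist)
print(nlist)
# [['20191101_01.csv', '20191102_01.csv'], ['20191101_02.csv', '20191102_02.csv']]
```
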
Simas Joneliunas
  • You could use a `collections.defaultdict(list)` to avoid the `if/else`. – wwii Nov 18 '19 at 03:27
  • @wwii or [**`dict.setdefault`**](https://docs.python.org/3/library/stdtypes.html#dict.setdefault), e.g. `parsed_dict.setdefault(key, []).append(el)` – Peter Wood Nov 20 '19 at 00:38
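
For reference, the `dict.setdefault` variant mentioned in the comment above might look like this with a plain dict. It is a sketch on a shortened copy of the question's list, not code from the original answer.

```python
olist = ['20191101_01.csv', '20191101_02.csv',
         '20191102_01.csv', '20191102_02.csv']

parsed_dict = {}
for el in olist:
    key = el.split('_')[1]
    # setdefault returns the list already stored under key, or stores
    # the given empty list first and returns that.
    parsed_dict.setdefault(key, []).append(el)

print(parsed_dict)
# {'01.csv': ['20191101_01.csv', '20191102_01.csv'], '02.csv': ['20191101_02.csv', '20191102_02.csv']}
```
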
0

I'd use collections.defaultdict and a list comprehension, i.e.:

from collections import defaultdict
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
d = defaultdict(list)
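# Key is the id between "_" and ".csv", e.g. "01"; the comprehension is
# used only for its side effect of appending to d, its result is discarded
[d[x.split("_")[1].split(".")[0]].append(x) for x in olist]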
print(dict(d))

{'01': ['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], '02': ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], '03': ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], '04': ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']}


Pedro Lobito
0

You could also use pandas for this:

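import pandas as pd

# olist is the list of filenames from the question
df = pd.DataFrame({'files': olist})

# Group on the part after the underscore, e.g. '01.csv'
df['grouper'] = df['files'].str.split('_', expand=True)[1]
nlist = df.groupby('grouper')['files'].agg(list).tolist()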

Output:

[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]
Derek Eden
0

You could sort the list using the two character id and then group it using itertools.groupby.

from itertools import groupby

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv',
         '20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv',
         '20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']

def file_id(filename):
    # The two-character id sits just before the '.csv' extension
    return filename[-6:-4]

slist = sorted(olist, key=file_id)

result = [list(value) for key, value in groupby(slist, key=file_id)]

print(result)

The output:

[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'],
 ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'],
 ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'],
 ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]
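
The sort matters because `itertools.groupby` only merges consecutive items with equal keys. A small sketch on a shortened list shows the difference; the names `unsorted_groups` and `sorted_groups` are just for illustration.

```python
from itertools import groupby


def file_id(filename):
    # Two-character id just before the '.csv' extension.
    return filename[-6:-4]


names = ['20191101_01.csv', '20191101_02.csv',
         '20191102_01.csv', '20191102_02.csv']

# Without sorting, equal ids are never adjacent here,
# so every file ends up in a group of its own.
unsorted_groups = [list(g) for _, g in groupby(names, key=file_id)]

# sorted() is stable, so files sharing an id keep their original date order.
sorted_groups = [list(g)
                 for _, g in groupby(sorted(names, key=file_id), key=file_id)]

print(len(unsorted_groups))  # 4
print(sorted_groups)
# [['20191101_01.csv', '20191102_01.csv'], ['20191101_02.csv', '20191102_02.csv']]
```
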
Peter Wood