I am working with a list of filenames. There are no duplicates and the list is sorted.
The list can be grouped into subsets. Files with _0001 in the name indicate the start of a new subset. Then _0002 is the 2nd item in the subset, and so on. I would like to transform this flat list into a hierarchical list of lists.
Here is an example of the original, flat list:
['Log_03-22-2016_12-06-18_GMT_0001.log',
'Log_03-22-2016_12-10-41_GMT_0002.log',
'Log_03-22-2016_12-11-56_GMT_0003.log',
'Log_03-22-2016_12-13-12_GMT_0004.log',
'Log_03-22-2016_12-14-27_GMT_0005.log',
'Log_03-22-2016_12-15-43_GMT_0006.log',
'Log_03-22-2016_12-16-58_GMT_0007.log',
'Log_03-23-2016_09-08-57_GMT_0001.log',
'Log_03-23-2016_09-13-24_GMT_0002.log',
'Log_03-23-2016_09-14-26_GMT_0003.log',
'Log_03-23-2016_09-15-27_GMT_0004.log',
'Log_03-23-2016_11-17-57_GMT_0001.log',
'Log_03-23-2016_11-19-21_GMT_0002.log']
I would like to slice this into subsets, using the presence of _0001 to detect the beginning of a new subset, and then return a list of all the subsets. Here is an example output, using the above input:
[['Log_03-22-2016_12-06-18_GMT_0001.log',
'Log_03-22-2016_12-10-41_GMT_0002.log',
'Log_03-22-2016_12-11-56_GMT_0003.log',
'Log_03-22-2016_12-13-12_GMT_0004.log',
'Log_03-22-2016_12-14-27_GMT_0005.log',
'Log_03-22-2016_12-15-43_GMT_0006.log',
'Log_03-22-2016_12-16-58_GMT_0007.log'],
['Log_03-23-2016_09-08-57_GMT_0001.log',
'Log_03-23-2016_09-13-24_GMT_0002.log',
'Log_03-23-2016_09-14-26_GMT_0003.log',
'Log_03-23-2016_09-15-27_GMT_0004.log'],
['Log_03-23-2016_11-17-57_GMT_0001.log',
'Log_03-23-2016_11-19-21_GMT_0002.log']]
Here is the current solution I have. It seems like there ought to be a more elegant and Pythonic way of doing this:
import glob

first_log_indicator = '_0001'
log_files = sorted(glob.glob('Log_*_GMT_*.log'))
# Files that start a new subset
first_logs = [s for s in log_files if first_log_indicator in s]
LofL = []
if len(first_logs) > 1:
    for fl_idx, fl_name in enumerate(first_logs):
        start_slice = log_files.index(fl_name)
        if fl_idx + 1 < len(first_logs):
            # Slice up to, but not including, the next subset's first file
            stop_slice = log_files.index(first_logs[fl_idx+1])
            LofL.append(log_files[start_slice:stop_slice])
        else:
            # Last subset: take everything that remains
            LofL.append(log_files[start_slice:])
else:
    # Zero or one subset: keep the whole list as a single sublist
    LofL.append(log_files)
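For comparison, here is a single-pass sketch of the same grouping that avoids the repeated index lookups. The function name group_by_start_marker and the shortened sample list are made up for illustration, and it assumes that any filename containing the marker starts a new subset. I am not sure it is the idiomatic way to do this.

```python
def group_by_start_marker(names, marker='_0001'):
    """Split a sorted flat list into sublists, opening a new one at each marker."""
    groups = []
    for name in names:
        # Open a new sublist on a marker, or if no sublist exists yet
        # (covers a list whose first file lacks the marker).
        if marker in name or not groups:
            groups.append([])
        groups[-1].append(name)
    return groups


# Shortened sample taken from the example list above
sample = [
    'Log_03-22-2016_12-06-18_GMT_0001.log',
    'Log_03-22-2016_12-10-41_GMT_0002.log',
    'Log_03-23-2016_09-08-57_GMT_0001.log',
    'Log_03-23-2016_11-17-57_GMT_0001.log',
    'Log_03-23-2016_11-19-21_GMT_0002.log',
]
grouped = group_by_start_marker(sample)
```

Unlike the slicing version, this returns an empty list (rather than a list containing one empty list) when there are no files at all.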
I looked into itertools, and while I am admittedly unfamiliar with that module, I didn't see anything that quite did this.
The closest questions I could find on SO all had sublists of fixed length; here, the sublists are of arbitrary length. Others used a "separator" item to delimit the sublists in the original (flat) list, and those separators get thrown out when the list of lists is built. I do not have a separator in that sense, since I do not want to throw away any items from the original list.
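To illustrate the difference, here is a small sketch of what a separator-style split does if the _0001 files are treated as the separators. The short filenames and the helper is_start are hypothetical, and this is only my understanding of how those answers behave: the marker files are dropped, which is exactly what I need to avoid.

```python
from itertools import groupby

# Hypothetical, shortened filenames for illustration only
names = ['a_0001.log', 'a_0002.log', 'a_0003.log', 'b_0001.log', 'b_0002.log']


def is_start(name):
    """True for files that begin a new subset."""
    return '_0001' in name


# Separator-style split: keep runs of non-marker items, discard the markers.
separator_style = [
    list(run) for start, run in groupby(names, key=is_start) if not start
]

# What I actually want: the marker files stay as the first item of each subset.
wanted = [
    ['a_0001.log', 'a_0002.log', 'a_0003.log'],
    ['b_0001.log', 'b_0002.log'],
]
```

Here separator_style comes out as [['a_0002.log', 'a_0003.log'], ['b_0002.log']], so both _0001 files are lost.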
Can anyone please suggest a better approach than what I have above?