I have a collection of binary files named like this:

d010-recomb.bin
d011-recomb.bin
.............
.............
.............
d100-recomb.bin

Using the Python glob module, I can access all the files in a folder and process them further:

import glob
binary = sorted(glob.glob('C:/Users/Desktop/bin/*.bin')) 

I can also use some criteria for the files that I want to access:

For example, the following code gives me access to all the files from d010-recomb.bin to d019-recomb.bin:

binary = sorted(glob.glob('C:/Users/Desktop/bin/d01*.bin'))

But with this kind of pattern I can't select a range of files such as d015 to d025.

Please tell me what I can do to gain access to these files.

glglgl
user2095624

3 Answers

1

You can either filter the list:

import os

def filter_path(path, l, r):
    # Characters 1-3 of the file name hold the number, e.g. 'd015-recomb.bin' -> 15
    i = int(os.path.basename(path)[1:4])
    return l <= i <= r

result = [i for i in binary if filter_path(i, 19, 31)]
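As a self-contained illustration of the same filtering idea, here is a minimal sketch that pulls the number out with a regular expression instead of a fixed slice, so names that do not follow the scheme are skipped rather than raising an error. The helper name `select_range`, the pattern `_NAME_RE`, and the sample paths are assumptions made for this example, not part of the original code.

```python
import os
import re

# Hypothetical pattern: matches names such as "d015-recomb.bin" and captures the digits.
_NAME_RE = re.compile(r"^d(\d{3})-recomb\.bin$")

def select_range(paths, lo, hi):
    """Return the paths whose three-digit number lies in [lo, hi]."""
    selected = []
    for path in paths:
        match = _NAME_RE.match(os.path.basename(path))
        if match and lo <= int(match.group(1)) <= hi:
            selected.append(path)
    return selected

# Stand-in for the sorted glob result (d010 .. d100), built without touching the disk.
paths = ["bin/d%03d-recomb.bin" % n for n in range(10, 101)]
wanted = select_range(paths, 15, 25)
print(wanted[0], wanted[-1], len(wanted))
```

With the real data, `paths` would simply be the `binary` list produced by `glob.glob`.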

If you are 100% confident about the number of elements in the directory, you can slice the sorted list directly:

result = binary[19:30]

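Or, once the list is sorted, you can find the indexes of the first and last file and slice between them:

# list.index() needs the exact string that glob returned, separators included
l = binary.index('C:/Users/Desktop/bin/d015-recomb.bin')
r = binary.index('C:/Users/Desktop/bin/d023-recomb.bin')
result = binary[l:r+1]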
Vyktor
0

Filter the list afterwards; either turn the numeric part of the filename into an int or create a set of the strings that are to be included:

included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set

binary = sorted(f for f in glob.glob('C:/Users/Desktop/bin/*.bin') if f[21:25] in included) 

The code above generates the strings 'd015' through 'd025' as a set for fast membership testing, then tests the first 4 characters of each file name against that set. Because glob() returns the whole path rather than just the file name, we slice off the directory part (the first 21 characters here) for that to work.
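To make the slice arithmetic concrete, the following small sketch shows what the set contains and why the offsets 21 and 25 pick out the prefix. The variable names `prefix` and `sample` are introduced only for this illustration, and the sample path is built by hand rather than returned by `glob`.

```python
# The set of wanted prefixes: 'd015', 'd016', ..., 'd025'
included = {'d{:03d}'.format(i) for i in range(15, 26)}

# The directory part of the pattern, including the trailing slash
prefix = 'C:/Users/Desktop/bin/'

# A hand-built example path standing in for one glob result
sample = prefix + 'd015-recomb.bin'

print(sorted(included)[:3])   # the three smallest entries of the set
print(len(prefix))            # the offset where the file name starts
print(sample[21:25])          # the four characters tested against the set
print(sample[21:25] in included)
```

If the directory changes, the hard-coded 21 no longer lines up, which is why computing the offset from the pattern is the safer choice.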

For variable paths, I'd store the slice offset, for speed, based on the path:

import glob
import os

pattern = 'C:/Users/Desktop/bin/*.bin'
included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set
offset = len(os.path.dirname(pattern)) + 1

binary = sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included) 

Demo of the latter:

$ mkdir test
$ touch test/d014-recomb.bin
$ touch test/d015-recomb.bin
$ touch test/d017-recomb.bin
$ touch test/d018-recomb.bin
$ fg
bin/python2.7
>>> import os, glob
>>> pattern = '/tmp/stackoverflow/test/*.bin'
>>> included = {'d{:03d}'.format(i) for i in range(15, 26)}  # a set
>>> offset = len(os.path.dirname(pattern)) + 1
>>> sorted(f for f in glob.glob(pattern) if f[offset:offset + 4] in included)
['/tmp/stackoverflow/test/d015-recomb.bin', '/tmp/stackoverflow/test/d017-recomb.bin', '/tmp/stackoverflow/test/d018-recomb.bin']
Martijn Pieters
  • The `f[21:25]` is pretty case-specific and won't work for different paths. How about `if os.path.basename(f).split('.')[0] in included`? This should be independent from the path before the filename. – Mike Müller May 27 '13 at 09:23
  • @MikeMüller: it would be; but the slice is faster. You could store the offset in a variable. – Martijn Pieters May 27 '13 at 09:27
  • Hi Martijn, thank you for your answer. I am trying to understand your code. specially the second and last line. when I run the code, it says KeyError: '03d'. Could you please explain a bit as i am not very good at it. The answer from glglgl is working though. But I would like to understand your coding. Thanks again. – user2095624 May 27 '13 at 09:37
  • @user2095624: That was my fault, actually; updated the `.format()` template to work properly. Mea Culpa! – Martijn Pieters May 27 '13 at 09:38
0

You'll probably have to add this restriction manually, as it cannot be accomplished by a single glob pattern.

If you know exactly how the file names are built, you could do:

import os
for i in range(19, 34): # 19 to 33
    filename = "d%03d-recomb.bin" % i
    if os.path.exists(os.path.join('C:/Users/Desktop/bin', filename)):
        print(filename)  # works as a call in both Python 2 and 3
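As a possible alternative, glob patterns do support character ranges such as `[5-9]`, so a range like d015 to d025 can be covered by combining two patterns, one per tens digit. The sketch below assumes the `dNNN-recomb.bin` naming from the question and uses a temporary directory with dummy files so it can run anywhere; the names `folder` and `patterns` are chosen for this example only.

```python
import glob
import os
import tempfile

# Build a throwaway directory holding d010-recomb.bin .. d030-recomb.bin
with tempfile.TemporaryDirectory() as folder:
    for n in range(10, 31):
        open(os.path.join(folder, "d%03d-recomb.bin" % n), "w").close()

    # d015..d019 and d020..d025 each need their own pattern
    patterns = ["d01[5-9]-recomb.bin", "d02[0-5]-recomb.bin"]
    matches = sorted(
        path
        for pattern in patterns
        # glob.escape guards against special characters in the directory name
        for path in glob.glob(os.path.join(glob.escape(folder), pattern))
    )
    names = [os.path.basename(path) for path in matches]

print(names[0], names[-1], len(names))
```

This stays readable for short ranges; for wider or irregular ranges, filtering on the parsed number is likely simpler than maintaining a list of patterns.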
glglgl