
I'm somewhat newish to Python (which is the only programming language I know), and I've got a bunch of spectral data saved as .txt files. Each row is a data point: the first number is the wavelength of light used and, separated by a tab, the second number is the instrument signal/response at that wavelength.

I want to be able to take all the data files I have in a folder and write out a file that's an average of all the signal/response column entries for each wavelength of light (they all contain data for responses from 350-2500nm light). Is there any way to do this? If it weren't for the fact that I need to average together 103 spectra, I'd just do it by hand, but....

EDIT: I realize I worded this terribly. I now realize I can probably just use os to access all the files in a given folder. The thing is that I want to average the signal values for each wavelength, i.e. I want to read all the data from the folder and get an average value for the signal/response at 350nm, 351nm, etc. I'm thinking this is something I could do with a loop once I get all the files read into Python, but I'm not 100% sure. I'm also hesitant because I'm worried that will slow down the program a lot.
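For what it's worth, the loop I have in mind would be something like this rough, untested sketch (the folder name and the output file name are just placeholders):

import os
from collections import defaultdict

folder = "spectra"              # placeholder: the folder holding the .txt files
sums   = defaultdict(float)     # wavelength -> running sum of signal values
counts = defaultdict(int)       # wavelength -> number of files contributing

for name in os.listdir(folder):
    if not name.endswith(".txt"):
        continue
    with open(os.path.join(folder, name)) as f:
        for line in f:
            cols = line.split()
            if len(cols) < 2:
                continue
            wavelength = float(cols[0])
            sums[wavelength] += float(cols[1])
            counts[wavelength] += 1

with open("averaged.txt", "w") as out:   # placeholder output name
    for w in sorted(sums):
        out.write("%g\t%g\n" % (w, sums[w] / counts[w]))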

  • related: http://stackoverflow.com/questions/10617731/walking-along-and-processing-files-in-directory-in-python – NightShadeQueen Jul 20 '15 at 03:47
  • maybe make the question title a bit more specific: **averaging across multiple numerical data files**. Just a suggestion – dermen Jul 20 '15 at 04:08

3 Answers


Something like this (assuming all your .txt files are formatted the same way, and that all of them cover the same range of wavelength values):

import os
import numpy as np

dat_dir = '/my/dat/dir'
fnames  = [os.path.join(dat_dir, x) for x in os.listdir(dat_dir) if x.endswith('.txt')]

data    = [np.loadtxt(f) for f in fnames]
xvals   = data[0][:, 0]            # wavelengths, should be the same in each file
yvals   = [d[:, 1] for d in data]  # measurements

y_mean  = np.mean(yvals, axis=0)   # average across files, one value per wavelength

np.savetxt('spectral_ave.txt', np.column_stack((xvals, y_mean)), fmt='%.4f')  # something like that
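If there's any chance the files don't all share exactly the same wavelength column, a quick sanity check before averaging might look like this (just a sketch, reusing the names above):

# optional sanity check: every file should use the same wavelength grid
for f, d in zip(fnames, data):
    if d.shape != data[0].shape or not np.allclose(d[:, 0], xvals):
        raise ValueError("wavelength grid differs in %s" % f)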
dermen
  • They're all formatted as the wavelength and signal separated by a tab character, with wavelengths from 350-2500nm making up column 1. – Sarah Carroll Jul 20 '15 at 04:11
  • @dermen why bother with numpy, the Q is very simple. Why the complicated answer? She probably doesn't have numpy installed. – Dalen Jul 20 '15 at 04:21
  • @Dalen I see your point, it is nice to do everything with the standard library, but the way I see it, if one is doing numerical work then one should use numerical Python. The `numpy.loadtxt` function was made exactly for these purposes. In fact, most Mac computers come with numpy pre-installed, and it is a simple matter to install numpy anyway, and a good exercise in itself. – dermen Jul 20 '15 at 04:27
  • I do have numpy and scipy as well as a bunch of other basic packages installed. I've used them in my programs to correct the jumps in spectral signal from when the detector switches between the VIS and NIR sensors, as well as to make plots using the corrected data sets... I am, admittedly, a fairly novice programmer. But I'm not so green that if you throw something like numpy at me I'll break open in a panic and start oozing sap or whatever, thank you very much. – Sarah Carroll Jul 20 '15 at 04:29
  • 1
    @dirmen Yes Macs and most Linux distros too. But where did you see that Sarah has a Mac? But you are correct. numpy for numwork. I wouldn't use it because it's only hundred and something files. And I had great problems with loadtxt() before (locale problems). Well, it was my mistake, but still it took all fun from me that day. – Dalen Jul 20 '15 at 04:53
  • @Sarah Well, both solutions will work, with numpy or without it. Now you know about the os and os.path modules. You do have a nice sense of humour. I'll remember not to ooze sap and break open in a panic next time someone tells me that I must do DSP *without* numpy. Imagine that torture! – Dalen Jul 20 '15 at 04:59
  • P.S. You should really change the title of the Q a little! :D – Dalen Jul 20 '15 at 05:00
import os

data_dir = "./"  # Your directory

lengths   = 0
responses = 0
total     = 0

for x in os.listdir(data_dir):
    # Skip anything that doesn't have a *.txt extension.
    if os.path.splitext(x)[1] != ".txt": continue
    fullname = os.path.join(data_dir, x)
    # We don't want directories ending with *.txt to mess up our program
    # (although in your case this is very unlikely).
    if os.path.isdir(fullname): continue
    # Now open and read the file as binary.
    f = open(fullname, "rb")
    content = f.read()
    f.close()
    # Take the first two entries (the first wavelength/response pair in the file):
    content = content.split()
    l = float(content[0])
    r = float(content[1])
    lengths += l; responses += r
    total += 1

print "Avg of lengths:", lengths/total
print "Avg of responses:", responses/total

If you want it to enter subdirectories, put it into a function and make it recurse when os.path.isdir(fullname) is True.
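A rough, untested sketch of that recursive variant (it keeps the same first-pair averaging as the loop above; the global counters are just one simple way to share state between calls):

import os

data_dir  = "./"
lengths   = 0.0
responses = 0.0
total     = 0

def process_dir(d):
    global lengths, responses, total
    for x in os.listdir(d):
        fullname = os.path.join(d, x)
        if os.path.isdir(fullname):
            # recurse into the subdirectory
            process_dir(fullname)
        elif os.path.splitext(x)[1] == ".txt":
            content = open(fullname, "rb").read().split()
            lengths   += float(content[0])   # first wavelength in the file
            responses += float(content[1])   # first response in the file
            total += 1

process_dir(data_dir)
print "Avg of lengths:", lengths/total
print "Avg of responses:", responses/total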

Although I wrote you the code, SO is not for that. Mind that in your next question.

Dalen
  • 1
    I didn't mean to ask anyone to write the code for me, though I certainly appreciate your response. I only wanted to know if it were possible to do with python, and,if so, what commands/modules use (in this case, for example, the os module). I'm sorry if my wording made it seem otherwise. I just hadn't come across any way to do it in the books I have and I was hoping to be pointed in the right direction. – Sarah Carroll Jul 20 '15 at 04:25
  • Sorry Sarah, to me it sounded a little like "I don't know how to, so please do it for me." Next time, don't admit you are "newish" :D Sorry again. In Python everything is possible, just give it the right equipment and it'll fly if you want it to. – Dalen Jul 20 '15 at 04:45
  • "If you want it to enter the subdirectories" ... then consider using `os.walk()`. This is a function in the Python standard library that lets you process subdirectories and their contents recursively. – András Aszódi Apr 05 '18 at 09:58
  • 1
    @LaryxDecidua : Certainly. But my code is easily understood without it. os.walk() takes a bit longer to explain properly. My code would require two lines added using recursion (which os.walk() does anyway). Modifying my code to use os.walk() is easy when you understand the principles. Anyway you would need two loops etc. etc. – Dalen Apr 05 '18 at 11:03
  • @Dalen Yes, you are right. I thought I'd mention `os.walk()` only for the benefit of those who are in the process of learning Python. – András Aszódi Apr 05 '18 at 11:53

If you're on anything but Windows, a common way to do this would be to write a Python program that handles all the files you put on the command line. Then you can run it on results/* to process everything, or just on a single file, or just on a few files.

This would be the more Unixy way to go about things. There are many unix programs that can handle multiple input files (cat, sort, awk, etc.), but most of them leave the directory traversal to the shell.

http://www.diveintopython.net/scripts_and_streams/command_line_arguments.html has some examples of getting at the command line args for your program.

import sys

for arg in sys.argv[1:]:  # argv[0] is the script's name; skip it
    # print arg
    sum_file(arg)  # or put the code inline here, so you don't need global variables to keep state between calls.

print "totals ..."

See also this question: What is "argv", and what does it do?

Peter Cordes