Opening and performing the same operations on multiple txt files

Question

I have about 50 text files. I would like to open each one, perform a few operations on it, and save the output to a new file. For a single file, this code does what I want:

#open file
df=pd.read_csv(r'F:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex.txt', sep='\t', nrows=80, skiprows=2)

#replace value names in 'Basic Stats'
df=df.replace({'Band 80$': 'LT50300281984137PAC00',
'Band 79$': 'LT50300281984185XXX15',
'Band 78$': 'LT50300821984249XXX03',
'Band 77$': 'LT50300281985139PAC12',
'Band 76$': 'LT50300281985171PAC04',
'Band 75$': 'LT50300281986206XXX03',
'Band 74$': 'LT50300281986238XXX03',
'Band 73$': 'LT50300281987241XXX04',
'Band 72$': 'LT50300281987257XXX03',
'Band 71$': 'LT50300281987273XXX05',
'Band 70$': 'LT50300281988212XXX03'}, regex=True)

#take a slice of the data
df['Basic Stats']=df['Basic Stats'].str.slice(13,20)

#sort the data
df=df.sort(columns='Basic Stats', axis=0, ascending=True)

I need to do these exact same operations on all 50 files. Is there a way to do this in pandas? Non-pandas answers would be helpful too.

Edit:

The first 1000 characters of one of the files:

'Filename: F:\\Sheyenne\\Atmospherically Corrected Landsat\\Indices\\Main\\NDVI\\NDVI_stack\nROI: EVF: Layer: Main_allotments.shp (allotment1=A. Annex) [White] 3984 points\n\nBasic Stats\t      Min\t     Max\t    Mean\t   Stdev\t  Num\tEigenvalue\n     Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t    1\t  0.229509\n     Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t    2\t  0.119217\n     Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t    3\t  0.059111\n     Band 4\t 0.313259\t0.678561\t0.525667\t0.048047\t    4\t  0.051338\n     Band 5\t 0.374522\t0.746828\t0.583513\t0.055989\t    5\t  0.027913\n     Band 6\t-0.000000\t0.749325\t0.330068\t0.314351\t    6\t  0.022561\n     Band 7\t-0.000000\t0.819288\t0.600136\t0.170060\t    7\t  0.018126\n     Band 8\t-0.000000\t0.687823\t0.450559\t0.084678\t    8\t  0.012942\n     Band 9\t 0.332637\t0.776398\t0.549870\t0.085212\t    9\t  0.009261\n    Band 10\t 0.386589\t0.848977\t0.635024\t0.087712\t   10\t  0.006628\n    Band 11\t 0.265165\t0.822361\t0.594286\t0.075730\t   11\t  0.004517\n    Band 12\t 0.191882\t0.539559\t0.343836\t0.0'

EDIT:

This code:

d={'Band 80$': 'LT50300281984137PAC00',
'Band 79$': 'LT50300281984185XXX15',
'Band 78$': 'LT50300821984249XXX03',
'Band 77$': 'LT50300281985139PAC12',
'Band 76$': 'LT50300281985171PAC04',
'Band 75$': 'LT50300281986206XXX03',
'Band 74$': 'LT50300281986238XXX03',
'Band 73$': 'LT50300281987241XXX04',
'Band 72$': 'LT50300281987257XXX03',
'Band 71$': 'LT50300281987273XXX05',
'Band 70$': 'LT50300281988212XXX03'}

pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new") 
os.mkdir(new)  # create new dir for new files
os.chdir(new) # change to that directory
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
print 'Done Processing'

returns:

IOError: Initializing from file failed

I am very unfamiliar with either of those. Do you have some links that I can look into? — Stefano Potter, Aug 25 '15 at 21:25
add a snippet of what your actual input files look like, also you want to sort by the first column? — Padraic Cunningham, Aug 25 '15 at 21:25
I added an edit, not sure if that is what you were looking for, and yes I am sorting by first column — Stefano Potter, Aug 25 '15 at 21:28
So basically replacing Band X with the corresponding values? Also are all the files in a dir of their own? — Padraic Cunningham, Aug 25 '15 at 21:31

score 1 · Answer 1 · edited May 23 '17 at 12:22

I'd wrap what you have in a function and make the filename a parameter. Then you can call the function in a loop to process each file. This isn't pandas-specific, but it should work.

If all the files to be processed are in one directory, you can build the list of files like this:

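from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = 'the directory name here'
# keep the directory in each name so read_csv finds the file from any working directory
filenames = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]

def process_file(filename):
    df = pd.read_csv(filename, sep='\t', nrows=80, skiprows=2)
    # Rest of code goes here, indented to match the line above...
    return df

for filename in filenames:
    process_file(filename)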
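
As a rough sketch of how the function body could be filled in with the steps from the question: this assumes a recent pandas (where `sort_values` replaces the older `sort`), and the names `BAND_TO_SCENE`, `process_folder`, and the `new_` output prefix are illustrative choices, not anything fixed by the question. Only two of the band mappings are shown.

```python
from pathlib import Path

import pandas as pd

# Hypothetical subset of the question's mapping; extend it with the remaining bands.
BAND_TO_SCENE = {
    r'Band 80$': 'LT50300281984137PAC00',
    r'Band 79$': 'LT50300281984185XXX15',
}


def process_file(in_path, out_path, mapping=BAND_TO_SCENE):
    """Apply the question's read / replace / slice / sort steps to one file."""
    df = pd.read_csv(in_path, sep='\t', nrows=80, skiprows=2)
    # The nested dict restricts the regex replacement to the 'Basic Stats' column.
    df = df.replace({'Basic Stats': mapping}, regex=True)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13, 20)
    df = df.sort_values(by='Basic Stats')
    df.to_csv(out_path, sep='\t', index=False)
    return df


def process_folder(in_dir, out_dir):
    """Run process_file on every .txt file directly inside in_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in sorted(in_dir.glob('*.txt')):
        out_path = out_dir / ('new_' + in_path.name)
        process_file(in_path, out_path)
        written.append(out_path)
    return written
```

Calling `process_folder(r'F:\Sheyenne\Statistics\NDVI_allotment\Text', r'F:\Sheyenne\Statistics\NDVI_allotment\Text\new')` would then handle every `.txt` file in the folder. Because the glob is not recursive, the output folder can live inside the input folder without its files being picked up on a second run.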

answered Aug 25 '15 at 21:27

Jeremy Sigrist

is there a way to not manually add all 50 file pathways and still use this? – Stefano Potter Aug 25 '15 at 21:29
Do you have a list of the files somewhere, like a text file or something? Or are they all in a specific directory? – Jeremy Sigrist Aug 25 '15 at 21:31
They are all in 1 folder. I can easily create a list of them in notepad++ or something too – Stefano Potter Aug 25 '15 at 21:32
also, when using this with just a couple of the files entered in manually it is not recognizing `df` – Stefano Potter Aug 25 '15 at 21:34
One directory is fine. Do you want to process all the files in the directory or exclude some? – Jeremy Sigrist Aug 25 '15 at 21:35
all files in directory – Stefano Potter Aug 25 '15 at 21:36
What is the specific error message you're getting? Did you just copy all your code in the answer into the function? – Jeremy Sigrist Aug 25 '15 at 21:38
Yea where #Rest of code goes here I just pasted what I have above in. It says local variable df is assigned but never used for first one and undefined name 'df' where I have `df=df.replace` – Stefano Potter Aug 25 '15 at 21:40
Is all the code indented to match the df=pd.read_csv(...) line? – Jeremy Sigrist Aug 25 '15 at 21:41
I edited the answer to include getting a list of files in a directory. – Jeremy Sigrist Aug 25 '15 at 21:42
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87932/discussion-between-jeremy-sigrist-and-stefano-potter). – Jeremy Sigrist Aug 25 '15 at 21:43

Padraic Cunningham · Accepted Answer · 2015-08-25T22:50:51.687

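import os

import pandas as pd

# regex patterns; the trailing $ anchors each pattern to the end of the cell
d = {'Basic Stats': {'Band 80$': 'LT50300281984137PAC00',
                     'Band 79$': 'LT50300281984185XXX15',
                     'Band 78$': 'LT50300821984249XXX03',
                     'Band 77$': 'LT50300281985139PAC12',
                     'Band 76$': 'LT50300281985171PAC04',
                     'Band 75$': 'LT50300281986206XXX03',
                     'Band 74$': 'LT50300281986238XXX03',
                     'Band 73$': 'LT50300281987241XXX04',
                     'Band 72$': 'LT50300281987257XXX03',
                     'Band 71$': 'LT50300281987273XXX05',
                     'Band 70$': 'LT50300281988212XXX03'}}

pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text'  # path to files
new = os.path.join(pth, "new")
os.mkdir(new)  # create new dir for new files
# loop over each file and update
for f in os.listdir(pth):
    src = os.path.join(pth, f)
    if not os.path.isfile(src):
        continue  # os.listdir also returns the "new" directory, which read_csv cannot open
    df = pd.read_csv(src, sep='\t', nrows=80, skiprows=2)
    # regex=True so the patterns match inside the space-padded cells
    df = df.replace(d, regex=True)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13, 20)
    # on pandas 0.17+ this is df.sort_values(by='Basic Stats', inplace=True); df.sort was removed in 0.20
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")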

One part that does not make sense is replacing with the values from the dict and then slicing part of the string away; it would make more sense to put the final values in the dict to start with. Another issue: if the replace matches nothing for a row, slicing 13:20 leaves you with an empty string, so make sure every row is guaranteed a match or you will lose data.
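
A minimal sketch of that idea, assuming the slice is only there to pull the year and day-of-year out of each scene ID: map each band label straight to its final value and raise an error for any row without a mapping, instead of silently producing empty strings. The names `BAND_TO_DATE` and `relabel_bands` are hypothetical, and only two bands are shown.

```python
import pandas as pd

# Hypothetical subset: band label -> the 7-character year/day-of-year code
# that str.slice(13, 20) would otherwise cut out of the replaced cell.
BAND_TO_DATE = {
    'Band 80': '1984137',
    'Band 79': '1984185',
}


def relabel_bands(df, mapping=BAND_TO_DATE, column='Basic Stats'):
    """Replace band labels with their final values and fail loudly on gaps."""
    labels = df[column].str.strip()   # drop the padding around 'Band N'
    relabelled = labels.map(mapping)  # labels missing from mapping become NaN
    missing = labels[relabelled.isna()]
    if not missing.empty:
        raise KeyError('no mapping for: {}'.format(sorted(missing.unique())))
    out = df.copy()
    out[column] = relabelled
    return out.sort_values(by=column)
```

With the full 80-band mapping filled in, `relabel_bands(df)` would stand in for the replace, slice, and sort lines, and a `KeyError` would point at any band label that was left out.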

edited Aug 25 '15 at 22:50

answered Aug 25 '15 at 21:40

Padraic Cunningham

I got this error, `WindowsError: [Error 3] The system cannot find the path specified: 'path/*.*'` – Stefano Potter Aug 25 '15 at 21:51
You need to pass the path to the files; `"path"` is just a placeholder. Try the edit, presuming all the files are in `r'F:\Sheyenne\Statistics\NDVI_allotment\Text'` – Padraic Cunningham Aug 25 '15 at 21:54
You'll have to excuse me, I'm new to manipulating things through the os, but shouldn't `"path"` be the same directory as `pth`? – Stefano Potter Aug 25 '15 at 21:58
ok yea thats what I did, now it returns, `AttributeError: 'NoneType' object has no attribute 'to_csv'` – Stefano Potter Aug 25 '15 at 21:59
no idea how you could be getting a NoneType error unless you are doing `df = df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)`, I used `inplace=True` so you sort the dataframe inplace, that will return None so assigning df to it will make df point to None – Padraic Cunningham Aug 25 '15 at 22:17
If I remove the `df.save` line in the loop this error is returned, `IOError: Initializing from file failed`, and I used the sorting method you provided – Stefano Potter Aug 25 '15 at 22:19
add a `print os.getcwd()` in the loop and show me the output – Padraic Cunningham Aug 25 '15 at 22:22
`F:\Sheyenne\Statistics\NDVI_allotment\Text\new`, and its repeated about 53 times which is the number of txt files I have – Stefano Potter Aug 25 '15 at 22:25
@StefanoPotter, made a mistake, code works correctly now – Padraic Cunningham Aug 25 '15 at 22:38
it still returns `AttributeError: 'NoneType' object has no attribute 'to_csv'` – Stefano Potter Aug 25 '15 at 23:00
You are not using the code I provided, you have `df = df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)`, look at my code and see the difference – Padraic Cunningham Aug 25 '15 at 23:01
Ugh, I could have SWORN I removed that before, guess not. It returns this `IOError: Initializing from file failed` – Stefano Potter Aug 25 '15 at 23:04
Code runs fine for me, must be a Windows-specific problem, maybe because you are reading from the F partition – Padraic Cunningham Aug 25 '15 at 23:13
The files populate actually even though error is returned, and Ill change it so slice is not needed anymore. Thank you! – Stefano Potter Aug 25 '15 at 23:14
No worries. I don't use windoze so not totally sure what the error/warning is related to, I have not come across it before. – Padraic Cunningham Aug 25 '15 at 23:20
