0

I've seen some previous posts that have solutions that work for others, but for some reason has not been working for me.

I'm trying to write a python script to 1) merge three files that have the same format, 2) remove duplicate headers only, 3) sorts the rows by Specimen_ID, and 4) adds 2 new blank lines between every unique Specimen_ID (i.e., every three lines, except first instance would need to be first 4 line due to headers).

I have part of a script that works for the first two and last steps:

import glob

read_files = glob.glob("*.txt")

header_saved = False
linecnt=0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt=linecnt+1
                if (linecnt%3)==0:
                    outfile.write("\n\n")

Any suggestions on sorting the rows? Also, if data is exported out of Excel in tab-delimited txt files, I find that this script will only result in output containing the contents of the first infile, but not the others. If I just copy and paste the data into a new txt file and use these as infiles, I have no problems. Does anyone know why I'm experiencing this issue?

Example input file text (infile 1):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
b   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   beak bent
c   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
d   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
e   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   pronotum deformed
f   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   

Example input file text (infile 2):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
b   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
c   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
d   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
e   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
f   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
Mike F
  • 111
  • 7

1 Answers1

0

your solution should be working perfect, unless there is some unexpected data in files. I just added the code for your 3rd item

read_files = glob.glob("*.txt")

header_saved = False
linecnt=0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt=linecnt+1
                if (linecnt%3)==0:
                    outfile.write("\n\n")

inputfile1.txt

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town
Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

inputfile2.txt

Employee,Account,Currency,Amount,Location
Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town
Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town

output

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town


Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town


Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town
Shijo
  • 9,313
  • 3
  • 19
  • 31
  • I tried your version of the script, but it resulted in the same incorrect output that I've been generating, i.e., an outfile that includes only the contents of the first infile. Also, two new black lines are not inserted between any rows. – Mike F Dec 30 '16 at 15:33
  • I just executed the code with your sample files, It is working fine. can you check the extension of second file on your machine. it may not be *.txt – Shijo Dec 30 '16 at 15:44
  • Hmmm, that's odd. The files do have .txt extensions and contain content. Not sure why it's working correctly for you but not for me. – Mike F Dec 30 '16 at 16:02