Operations with columns from different files

Question

I have many .txt files of this type:

name1.fits 0 0 4088.9 0. 1. 0. -0.909983 0.01386 0.91 0.01386 -0.286976 0.00379 2.979 0.03971 0. 0.
name2.fits 0 0 4088.9 0. 1. 0. -0.84702 0.01239 0.847 0.01239 -0.250671 0.00261 3.174 0.04749 0. 0.
#name3.fits 0 0 4088.9 0. 1. 0. -0.494718 0.01168 0.4947 0.01168 -0.185677 0.0042 2.503 0.04365 0. 0.
#name4.fits 0 1 4088.9 0. 1. 0. -0.751382 0.01342 0.7514 0.01342 -0.202141 0.00267 3.492 0.07224 0. 0.
name4.fits 0 1 4088.961 0.01147 1.000169 0. -0.813628 0.01035 0.8135 0.01035 -0.217434 0.00196 3.515 0.04045 0. 0.

I want to divide the values of one of these columns by the values of a column from another file of the same type. Here is what I have so far:

with open('4026.txt','r') as out1, open('4089.txt', 'r') as out2, \
     open('4116.txt', 'r') as out3, open('4121.txt', 'r') as out4, \
     open('4542.txt', 'r') as out5, open('4553.txt', 'r') as out6:

    for data1 in out1.readlines():
        col1 = data1.strip().split()
        x = col1[9]

    for data2 in out2.readlines():
        col2 = data2.strip().split()
        y = col2[9]

    f = float(y) / float(x)
    print f

However, I'm getting the same values for x. For example, if the first set of data is 4089.txt, and the second (4026.txt) is:

name1.fits 0 0 4026.2 0. 1. 0. -0.617924 0.01749 0.6179 0.01749 -0.19384 0.00383 2.995 0.09205 0. 0.
name2.fits 0 0 4026.2 0. 1. 0. -0.644496 0.01218 0.6445 0.01218 -0.183373 0.00291 3.302 0.05261 0. 0.
#name3.fits 0 0 4026.2 0. 1. 0. -0.507311 0.01557 0.5073 0.01557 -0.176148 0.00472 2.706 0.07341 0. 0.
#name4.fits 0 1 4026.2 0. 1. 0. -0.523856 0.01086 0.5239 0.01086 -0.173477 0.00279 2.837 0.05016 0. 0.
name4.fits 0 1 4026.229 0.0144 1.014936 0. -0.619708 0.00868 0.6106 0.00855 -0.185527 0.00189 3.138 0.04441 0. 0.

and I want to divide the column at index 9 of one file by the same column of the other. Taking only the first element of each column, I should get 0.91/0.6179 = 1.47, but I obtain 0.958241758242.

Adib · Accepted Answer · 2016-05-03T19:28:06.323

What's happening is that x and y are overwritten on every pass through their for loops, so after the loops they hold only the last value from each file, and the single division uses just those two. To get the correct results, perform the division for each row rather than once after the loops.
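To make the overwrite visible, here is a minimal sketch that holds two of the sample rows from each file in memory instead of reading them from disk. The names `rows_4026`, `rows_4089` and `last_ratio` are only illustrative and do not appear in the original code.

```python
# Sample rows copied from the question, held in memory for illustration.
rows_4026 = [
    "name1.fits 0 0 4026.2 0. 1. 0. -0.617924 0.01749 0.6179 0.01749 -0.19384 0.00383 2.995 0.09205 0. 0.",
    "name4.fits 0 1 4026.229 0.0144 1.014936 0. -0.619708 0.00868 0.6106 0.00855 -0.185527 0.00189 3.138 0.04441 0. 0.",
]
rows_4089 = [
    "name1.fits 0 0 4088.9 0. 1. 0. -0.909983 0.01386 0.91 0.01386 -0.286976 0.00379 2.979 0.03971 0. 0.",
    "name4.fits 0 1 4088.961 0.01147 1.000169 0. -0.813628 0.01035 0.8135 0.01035 -0.217434 0.00196 3.515 0.04045 0. 0.",
]

for line in rows_4026:
    x = line.split()[9]  # reassigned on every iteration

for line in rows_4089:
    y = line.split()[9]  # reassigned on every iteration

# Only the last row of each list is left at this point,
# so this is 0.8135 / 0.6106 rather than 0.91 / 0.6179.
last_ratio = float(y) / float(x)
print(x, y, last_ratio)
```

With the real files the same thing happens: whatever sits on the last line of each file is what ends up being divided.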

An easier approach is to place all the values in lists, e.g. x = [0.0149, 0.01218, ...] and y = [...]

Then divide the two lists element by element using numpy (or with a for-loop over the lists). Remember that both lists need to be the same length for this to work.

Sample code:

with open('4026.txt','r') as out1, open('4089.txt', 'r') as out2, \
     open('4116.txt', 'r') as out3, open('4121.txt', 'r') as out4, \
     open('4542.txt', 'r') as out5, open('4553.txt', 'r') as out6:

    # Build two lists of floats from the column at index 9
    x = []
    y = []

    for data1 in out1.readlines():
        col1 = data1.strip().split()
        x.append(float(col1[9]))  # split() returns strings, so convert

    for data2 in out2.readlines():
        col2 = data2.strip().split()
        y.append(float(col2[9]))

    for i in range(0, len(x)):

        # Make sure the denominator is not zero
        if x[i] != 0:
            print(y[i] / x[i])  # 4089.txt value divided by 4026.txt value
        else:
            print("Not possible")
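The numpy route mentioned above could look roughly like the sketch below. It assumes numpy is installed and that the column has already been converted to floats. The values are the index-9 entries of the sample rows shown in the question, typed in by hand rather than read from the files, and `ratios` is an illustrative name.

```python
import numpy as np

# Column values as floats. In the real script these would come from
# float(line.split()[9]) for each line of the two files.
x = np.array([0.6179, 0.6445, 0.5073, 0.5239, 0.6106])  # 4026.txt
y = np.array([0.91, 0.847, 0.4947, 0.7514, 0.8135])     # 4089.txt

if x.shape != y.shape:
    raise ValueError("files do not have the same number of rows")

# Element-wise division. Positions where the denominator is zero are
# skipped by `where` and keep the NaN they were initialised with.
ratios = np.full_like(y, np.nan)
np.divide(y, x, out=ratios, where=(x != 0))
print(ratios)
```

The first entry is 0.91/0.6179, the value the question expects for the first row. Pre-filling the result with NaN is just one way of marking rows that cannot be divided; printing a message per row, as in the loop version, works as well.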

Thanks @Adib, this is very clear. However I should note that in this case I would need `print float(y[i]) / float(x[i])`. It would be great if you could point out the way to do this with numpy. — EternalGenin, May 02 '16 at 19:47
@JVR The simplest way to handle floats is to multiply one number by 1.0. There are other ways to handle it: http://stackoverflow.com/questions/1267869/how-can-i-force-division-to-be-floating-point-in-python . — Adib, May 03 '16 at 19:29

score 0 · Answer 2 · answered May 02 '16 at 19:12

You could do it like this:

with open('4026.txt','r') as out1, open('4089.txt', 'r') as out2:
    x_col9 = [data1.strip().split()[9] for data1 in out1.readlines()]
    y_col9 = [data2.strip().split()[9] for data2 in out2.readlines()]

    if len(x_col9) != len(y_col9):
        print('Error: files do not have same number of rows')
    else:
        f = [(float(y) / float(x)) for x, y in zip(x_col9, y_col9)]
        print(f)

It may be better to process the files as shown below because it doesn't require reading the entire contents of all of them into memory first, and instead processes each one a line at a time:

    x_col9 = [data1.strip().split()[9] for data1 in out1]
    y_col9 = [data2.strip().split()[9] for data2 in out2]
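Going one step further, the two files can be read in lockstep so that neither column is stored as a list at all. This is only a sketch, assuming Python 3 (where `zip` is lazy) and rows that line up one-to-one between the files; `column_ratios` is a hypothetical helper name, not something from the code above.

```python
def column_ratios(path_x, path_y, index=9):
    """Yield y/x for the column at `index`, one pair of lines at a time."""
    with open(path_x) as fx, open(path_y) as fy:
        # zip stops at the shorter file, so a difference in the
        # number of rows is silently ignored here.
        for line_x, line_y in zip(fx, fy):
            x = float(line_x.split()[index])
            y = float(line_y.split()[index])
            # Use NaN to mark rows whose denominator is zero.
            yield y / x if x != 0 else float("nan")
```

It would be used as `for ratio in column_ratios('4026.txt', '4089.txt'): print(ratio)`. The trade-off compared with the list version is that the explicit check for equal row counts is lost.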

Nice way to do it too. Thanks. — EternalGenin, May 06 '16 at 19:36
