I am trying to get the first row of file.txt (tab-separated strings) and create a new file with one column made of the elements of that row. I managed to get the first row of the file with

f = open("file.txt", "r")
row1 = f.readline()

I tried open("new_file.txt", "w") after transposing with x.T, but it didn't work. After I create the file I also need to split it into 10 smaller files.

This is an example of the input file:

rs123  rs15  rs1567  rs43  rs567  rs3564
    1     2       3     4      5       6
    7     8       9    10     11      12

and this is what I need:

rs123
rs15
rs1567
rs43
rs567
rs3564
Alice
  • http://stackoverflow.com/questions/11755555/saving-numpy-array-to-txt-file-as-a-single-column –  May 20 '13 at 13:35
  • Please tag your question appropriately, with the language being used, as well as any relevant framework or library. – Jonathon Reinhart May 20 '13 at 13:36
  • is it properly formatted now? Does your file look exactly like this? – elyase May 20 '13 at 15:16
  • @user2390900, Just so that you know, atomh33ls's solution is ~10 times slower than mine on my simple test file and gets linearly worse with file size. One reason is that it has to traverse the list twice, once for reading and a second time for writing. It also uses a lot more memory because it needlessly loads the whole file, which will lead to a MemoryError if filesize > RAM. – elyase May 21 '13 at 10:44
  • I didn't realize that you could choose only one answer... new to this website... sorry and thanks again for your help – Alice May 21 '13 at 10:51
  • @elyase - can you show details of your speed/resource comparison? – Lee May 21 '13 at 12:03
  • @atomh33ls, in IPython notebook just add the `%%timeit` magic at the top of the cell. I get `100 loops, best of 3: 1.82 ms per loop` in your version and `1000 loops, best of 3: 193 µs per loop` in mine with a test file similar to the one the OP shows, with fewer than 100 lines. The resource part comes from the documentation of genfromtxt. – elyase May 21 '13 at 12:14
  • @atomh33ls, are you getting something different? – elyase May 21 '13 at 12:20
  • @elyase - Your method performs a lot better for me as well: Yours: `100 loops best of 3 = 1.34 ms per loop`; Mine: `100 loops best of 3 = 0.574 ms per loop` – Lee May 24 '13 at 15:09

2 Answers

with open('inFile.txt', 'r') as inFile, open('outfile.txt', 'w') as outFile:
    outFile.writelines(line + '\n' for line in inFile.readline().split('\t'))

To split the file in smaller parts I would use unix split, for example:

split -l $lines_per_file outfile.txt

To find $lines_per_file, divide the total number of lines (from wc -l outfile.txt) by 10.
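If you'd rather stay in Python than shell out to unix split, the same splitting can be sketched as below. The function name split_file and the part_*.txt naming scheme are my own choices, not from the answer:

```python
# Sketch: split a file (one element per line) into n roughly equal
# smaller files named part_0.txt, part_1.txt, ... in pure Python.
import math

def split_file(path, n_parts=10, prefix="part_"):
    """Split `path` into n_parts files of (nearly) equal line counts."""
    with open(path) as f:
        lines = f.readlines()
    per_file = math.ceil(len(lines) / n_parts)  # lines per output file
    names = []
    for i in range(n_parts):
        chunk = lines[i * per_file:(i + 1) * per_file]
        if not chunk:  # fewer lines than parts: stop early
            break
        name = f"{prefix}{i}.txt"
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names
```

Note this reads the whole file into memory first, which is fine for a single column of IDs but not for huge inputs.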

elyase

You could use numpy's genfromtxt and savetxt routines:

If you want to save strings (as per the amended question):

import numpy as np

# dtype=str reads every field as text, so the header row of IDs survives
with open('new_file.txt', 'w') as f:
    for el in np.genfromtxt('file.txt', dtype=str)[0]:
        f.write(str(el) + '\n')

If the data is numerical:

import numpy as np

x = np.genfromtxt('file.txt')[0]
np.savetxt('new_file.txt', x)

You could even combine these into one line:

np.savetxt('myfile2.dat',np.genfromtxt('myfile.dat')[0])
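For the string case in the question, the one-liner also works if you tell savetxt to write plain strings with fmt='%s'. A minimal sketch (the file contents below just recreate the example input from the question):

```python
import numpy as np

# Recreate the example input: a tab-separated header row of rs IDs,
# followed by numeric rows.
with open('file.txt', 'w') as f:
    f.write("rs123\trs15\trs1567\trs43\trs567\trs3564\n")
    f.write("1\t2\t3\t4\t5\t6\n")
    f.write("7\t8\t9\t10\t11\t12\n")

first_row = np.genfromtxt('file.txt', dtype=str)[0]  # first row as strings
np.savetxt('new_file.txt', first_row, fmt='%s')      # one element per line
```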
Lee
  • Thank you, this would work if I had numbers, but the output is a column of nan because I have strings. I tried to set dtype="S10" or dtype="object" but it doesn't work. – Alice May 20 '13 at 15:22
  • This won't work as numpy arrays must have homogenous data, only strings or only floats. – elyase May 20 '13 at 15:41
  • Updated to reflect the changed question. – Lee May 20 '13 at 16:33