I am trying to get the first row of file.txt (tab-separated strings) and create a new file with one column made of the elements of that row. I managed to get the first row of the file with

f = open("file.txt", "r")
row1 = f.readline()

I tried open("new_file.txt", "w") after transposing with x.T, but it didn't work. After I create the file I also need to split it into 10 smaller files.

This is an example of the input file:

rs123  rs15  rs1567  rs43  rs567  rs3564
    1     2       3     4      5       6
    7     8       9    10     11      12

and this is what I need:

rs123
rs15
rs1567
rs43
rs567
rs3564
Alice
  • http://stackoverflow.com/questions/11755555/saving-numpy-array-to-txt-file-as-a-single-column –  May 20 '13 at 13:35
  • Please tag your question appropriately, with the language being used, as well as any relevant framework or library. – Jonathon Reinhart May 20 '13 at 13:36
  • is it properly formatted now? Does your file look exactly like this? – elyase May 20 '13 at 15:16
  • @user2390900, Just so that you know, atomh33ls's solution is ~10 times slower than mine on my simple test file and gets linearly worse with file size. One reason is that it has to traverse the list twice, once for reading and a second time for writing. It also uses a lot more memory because it needlessly loads the whole file, which will lead to a MemoryError if filesize > RAM. – elyase May 21 '13 at 10:44
  • I didn't realize that you could choose only one answer... new to this website... sorry and thanks again for your help – Alice May 21 '13 at 10:51
  • @elyase - can you show details of your speed/resource comparison? – Lee May 21 '13 at 12:03
  • @atomh33ls, in IPython notebook just add the `%%timeit` magic at the top of the cell. I get `100 loops, best of 3: 1.82 ms per loop` in your version and `1000 loops, best of 3: 193 µs per loop` in mine with a test file similar to the one the OP shows, with fewer than 100 lines. The resource part comes from the documentation of genfromtxt. – elyase May 21 '13 at 12:14
  • @atomh33ls, are you getting something different? – elyase May 21 '13 at 12:20
  • @elyase - Your method performs a lot better for me as well: Yours: `100 loops best of 3 = 1.34 ms per loop`; Mine: `100 loops best of 3 = 0.574 ms per loop` – Lee May 24 '13 at 15:09

2 Answers

with open('inFile.txt', 'r') as inFile, open('outfile.txt', 'w') as outFile:
    outFile.writelines(line + '\n' for line in inFile.readline().split('\t'))

To split the file in smaller parts I would use unix split, for example:

split -l $lines_per_file outfile.txt

To find $lines_per_file, divide the total number of lines (from wc -l outfile.txt) by 10.
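If you'd rather stay in Python than shell out to unix split, the same splitting can be sketched as below. The function name split_file and the part_*.txt naming scheme are my own choices, not from the answer:

```python
# Sketch: split a file (one element per line) into n roughly equal
# smaller files named part_0.txt, part_1.txt, ... in pure Python.
import math

def split_file(path, n_parts=10, prefix="part_"):
    """Split `path` into n_parts files of (nearly) equal line counts."""
    with open(path) as f:
        lines = f.readlines()
    per_file = math.ceil(len(lines) / n_parts)  # lines per output file
    names = []
    for i in range(n_parts):
        chunk = lines[i * per_file:(i + 1) * per_file]
        if not chunk:  # fewer lines than parts: stop early
            break
        name = f"{prefix}{i}.txt"
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names
```

Note this reads the whole file into memory first, which is fine for a single column of IDs but not for huge inputs.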

elyase

You could use numpy's genfromtxt and savetxt routines:

If you want to save strings (as per the amended question):

import numpy as np

# dtype=str reads every field as text, so the header row of IDs survives
with open('new_file.txt', 'w') as f:
    for el in np.genfromtxt('file.txt', dtype=str)[0]:
        f.write(str(el) + '\n')

If the data is numerical:

import numpy as np

x = np.genfromtxt('file.txt')[0]
np.savetxt('new_file.txt', x)

You could even combine these into one line:

np.savetxt('myfile2.dat',np.genfromtxt('myfile.dat')[0])
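For the string case in the question, the one-liner also works if you tell savetxt to write plain strings with fmt='%s'. A minimal sketch (the file contents below just recreate the example input from the question):

```python
import numpy as np

# Recreate the example input: a tab-separated header row of rs IDs,
# followed by numeric rows.
with open('file.txt', 'w') as f:
    f.write("rs123\trs15\trs1567\trs43\trs567\trs3564\n")
    f.write("1\t2\t3\t4\t5\t6\n")
    f.write("7\t8\t9\t10\t11\t12\n")

first_row = np.genfromtxt('file.txt', dtype=str)[0]  # first row as strings
np.savetxt('new_file.txt', first_row, fmt='%s')      # one element per line
```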
Lee
  • Thank you, this would work if I had numbers, but the output is a column of nan because I have strings. I tried to set dtype="S10" or dtype="object" but it doesn't work. – Alice May 20 '13 at 15:22
  • This won't work as numpy arrays must have homogenous data, only strings or only floats. – elyase May 20 '13 at 15:41
  • Updated to reflect the changed question. – Lee May 20 '13 at 16:33