
I have a loop that reads through a file until the end is reached. On each pass through the loop, I extract a 1D numpy array. I want to append this array to another numpy array as a new row. That is, I might read in something of the form

x = [1,2,3]

and I want to append it to something of the form

z = [[0,0,0],
     [1,1,1]]

I know I can simply do z = numpy.append(z, [x], axis=0) and achieve my desired result of

z = [[0,0,0],
     [1,1,1],
     [1,2,3]]
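
For reference, a minimal runnable version of that append call:

import numpy as np

z = np.array([[0, 0, 0],
              [1, 1, 1]])
x = np.array([1, 2, 3])

# append x as a new row; wrapping x in a list makes both arguments 2D
z = np.append(z, [x], axis=0)
# z is now [[0 0 0], [1 1 1], [1 2 3]]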

My issue comes from the fact that on the first run through the loop, I don't have anything to append to yet, because the first array read in is the first row of the 2D array. I don't want to write an if statement to handle the first case because that is ugly. If I were working with lists, I could simply do z = [] before the loop and, every time I read in an array, do z.append(x) to achieve my desired result. However, I can find no way of doing a similar procedure in numpy. I can create an empty numpy array, but then I can't append to it in the way I want. Can anyone help? Am I making any sense?

EDIT:

After some more research, I found another workaround that technically does what I want, although I think I will go with the solution given by @Roger Fan, given that numpy appending is very slow. I'm posting it here just so it's out there.

I can still define z = [] before the loop, then append my arrays with `np.append(z, x)`. This will ultimately give me something like

z = [0,0,0,1,1,1,1,2,3]

Then, because all the arrays I read in are of the same size, after the loop I can simply reshape with `np.resize(z, (n, m))` and get what I'm after.
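
A minimal sketch of that workaround, with a hypothetical read_row() standing in for my actual file-reading code and assuming three-element rows:

import numpy as np

def read_row(i):
    # stand-in for the real file-reading code
    return [[0, 0, 0], [1, 1, 1], [1, 2, 3]][i]

z = []
for i in range(3):
    x = read_row(i)
    z = np.append(z, x)  # everything is flattened into one 1D array

# all rows have the same length, so reshape back to 2D afterwards
z = np.resize(z, (3, 3))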

– zephyr

2 Answers


Don't do it. Read the whole file into one array, using for example numpy.genfromtxt().

With this one array, you can then loop over the rows, loop over the columns, and perform other operations using slices.
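
For example, a short sketch of that approach, assuming a plain whitespace-delimited file called data.txt:

import numpy as np

# read the whole file into one 2D array in a single call
data = np.genfromtxt('data.txt')

# loop over rows...
for row in data:
    print(row)

# ...or take columns and slices directly
first_column = data[:, 0]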

Alternatively, you can create a regular list, append a lot of arrays to that list, and in the end generate your desired array from the list using either numpy.array(list_of_arrays) or, for more control, numpy.vstack(list_of_arrays).

The idea in this second approach is "delayed array creation": find and organize your data first, and then create the desired array once, already in its final form.
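
A sketch of that second approach, with parse_line() standing in for whatever per-line parsing you already have and data.txt as a placeholder file name:

import numpy as np

def parse_line(line):
    # placeholder: turn one text line into a 1D array
    return np.array(line.split(), dtype=float)

rows = []
with open('data.txt') as f:
    for line in f:
        rows.append(parse_line(line))

# build the 2D array once, after all the rows have been collected
z = np.vstack(rows)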

– heltonbiker
  • I have to loop to the end of the file because it has a somewhat complicated structure, such that numpy doesn't have a generic method capable of reading it. I already have my own method of reading the file and it works fine; I just want to implement what I've asked in my question so it is "nicer". – zephyr Sep 09 '14 at 19:25
  • Then I would suggest using the second approach in my answer. – heltonbiker Sep 09 '14 at 19:26

As @heltonbiker mentioned in his answer, something like np.genfromtxt is going to be the best way to do this if it fits your needs. Otherwise, I suggest reading the answers to this question about appending to numpy arrays. Basically, numpy array appending is extremely slow and should be avoided whenever possible. There are two much better (and faster, by about 20x) solutions:

If you know the length in advance, you can preallocate your array and assign to it.

import numpy as np

length_of_file = 5000
results = np.empty(length_of_file)  # preallocate once, then fill in place
with open('myfile.txt', 'r') as f:
    for i, line in enumerate(f):
        results[i] = processing_func(line)

Otherwise, just keep a list of lists or list of arrays and convert it to a numpy array all at once.

import numpy as np

results = []
with open('myfile.txt', 'r') as f:
    for line in f:
        results.append(processing_func(line))
results = np.array(results)  # convert to an array once, at the end
– Roger Fan