3

I got a question about numpy and it's memory. Is it possible to generate a view or something out of multiple numpy arrays without copying them?

   import numpy as np


    def test_var_args(*inputData):
        dataArray = np.array(inputData)
        print np.may_share_memory(inputData, dataArray) # prints false, b.c. of no shared memory

    test_var_args(np.arange(32),np.arange(32)*2)

I've got a c++ application with images and want to do some python magic. I pass the images in rows to the python script using the c-api and want to combine them without copying them.

I am able to pass the data s.t. c++ and python share the same memory. Now I want to arange the memory to a numpy view/array or something like that.

The images in c++ are not continuously present in the memory (I slice them). The rows that I hand over to python are aranged in a continuous memory block.

The number of images I pass are varying. Maybe I can change that if there exist a preallocation trick.

Lks
  • 126
  • 5
  • you can use a `list` container to store the reference to each original array without creating a copy – Saullo G. P. Castro Aug 26 '15 at 09:32
  • 1
    If I would use a list I am not able to do all the numpy stuff I want to do. And if I convert a list to a numpy array new memory will be allocated. – Lks Aug 26 '15 at 11:26
  • 1
    `numpy.array` has a flag `copy`. It is set to `True` per default, therefore always copying your data. Have you tried using `np.array(inputData, copy=False)`? – Dux Aug 26 '15 at 14:11
  • 2
    This has been asked before. The data buffer of a `numpy` has to be contiguous. So building an array from other arrays requires copies. There is an object dtype, but it is little more than a numpy version of a list. Math across such array is limited. – hpaulj Aug 26 '15 at 23:27
  • `np.ndarray` lets you create an array from an existing buffer. – hpaulj Aug 26 '15 at 23:28
  • Possible duplicate of [Concatenate Numpy arrays without copying](https://stackoverflow.com/questions/7869095/concatenate-numpy-arrays-without-copying) – Tai Dec 19 '17 at 21:06

1 Answers1

0

There's a useful discussion in the answer here: Can memmap pandas series. What about a dataframe?

In short:

  • If you initialize your DataFrame from a single array of matrix, then it may not copy the data.
  • If you initialize from multiple arrays of the same or different types, your data will be copied.

This is the only behavior permitted by the default BlockManager used by Pandas' DataFrame, which organizes the DataFrame's memory internally.

Its possible to monkey patch the BlockManager to change this behavior though, in which case your supplied data will be referenced.

user48956
  • 14,850
  • 19
  • 93
  • 154