
Let's say I have data for pairs of 3 variables, A, B, and C (in my actual application the number of variables is anywhere from 1,000 to 3,000, but could be even higher).

Let's also say that there are pieces of the data that come in arrays.

For example:

Array X:

np.array([[  0.,   2.,   3.],
        [ -2.,   0.,   4.],
        [ -3.,  -4.,   0.]])

Where:

X[0,0] corresponds to data for variables A and A
X[0,1] corresponds to data for variables A and B
X[0,2] corresponds to data for variables A and C
X[1,0] corresponds to data for variables B and A
X[1,1] corresponds to data for variables B and B
X[1,2] corresponds to data for variables B and C
X[2,0] corresponds to data for variables C and A
X[2,1] corresponds to data for variables C and B
X[2,2] corresponds to data for variables C and C

Array Y:

np.array([[2,12],
[-12, 2]])

Y[0,0] corresponds to data for variables A and C
Y[0,1] corresponds to data for variables A and B
Y[1,0] corresponds to data for variables B and A
Y[1,1] corresponds to data for variables C and A

Array Z:

np.array([[ 99,  77],
       [-77, -99]])

Z[0,0] corresponds to data for variables A and C
Z[0,1] corresponds to data for variables B and C
Z[1,0] corresponds to data for variables C and B
Z[1,1] corresponds to data for variables C and A

I want to concatenate the above arrays keeping the variable position fixed as follows:

END_RESULT_ARRAY index 0 corresponds to variable A
END_RESULT_ARRAY index 1 corresponds to variable B
END_RESULT_ARRAY index 2 corresponds to variable C

Basically, there are N variables in the universe, but the set can change every month (new ones can be introduced, and existing ones can drop out and then return or never return). Within the N variables in the universe I compute pairwise permutations, and the position of each variable is fixed, i.e. index 0 corresponds to variable A, index 1 corresponds to variable B, and so on (as described above).

Given the above requirement, END_RESULT_ARRAY should look like the following:

array([[[  0.,   2.,   3.],
        [ -2.,   0.,   4.],
        [ -3.,  -4.,   0.]],

       [[ nan,  12.,   2.],
        [-12.,  nan,  nan],
        [  2.,  nan,  nan]],

       [[ nan,  nan,  99.],
        [ nan,  nan,  77.],
        [-99., -77.,  nan]]])

Keep in mind that the above is an illustration.

In my actual application, I have about 125 arrays and a new one is generated every month. Each monthly array may have a different size and may only have data for a portion of the variables defined in my universe. Also, as new arrays are created each month, there is no way of knowing in advance what their size will be or which variables will have data (or which ones will be missing).

So up until the most recent monthly array, we can determine the max size from the available historical data. Each month we will have to re-check the max size of all the arrays as a new array becomes available. Once we have the max size we can then re-stitch/concatenate all the arrays together, if this is doable in numpy. This will be an ongoing operation done every month.

I want a general mechanism to stitch these arrays together while keeping the index position of each variable fixed, as described above.

I actually want to use h5py arrays, as my data set will grow exponentially in the not too distant future. However, I would like to get this working with numpy as a first step.

letsintegreat
codingknob
  • Can you share some code that we can use to create the array instead of what you've provided. It's very unclear what shapes you are actually working with. – user3483203 Oct 22 '19 at 18:55
  • Try something like `out = np.full_like(a, np.nan); i, j = b.shape; out[:i, :j] = b` – user3483203 Oct 22 '19 at 19:08
  • `out=np.stack((x,np.array([[np.nan,y[0,1],y[0,0]],[y[1,0],np.nan,np.nan],[y[1,1],np.nan,np.nan]]),np.array([[np.nan, np.nan,z[0,0]],[np.nan,np.nan,z[0,1]],[z[1,1],z[1,0],np.nan]])),0)` – Ruzihm Oct 22 '19 at 20:55
  • The issue is that your solution is specific to the example I have provided. In my actual problem I have 125 arrays, incrementing by 1 every month. Each array may have a different size and different positions, and as new arrays are generated each month there is no way of knowing which index positions will have values. What is certain, though, is that I need index 0 = variable A, index 1 = variable B and so on. Given that requirement, the end goal is to stitch all arrays together keeping the index position fixed for the variables. – codingknob Oct 22 '19 at 21:02
  • Are there always 3 variables? Is the first array always 3x3 and the rest of the arrays always 2x2? It's impossible to tell from the question what you are really trying to ask. There's little distinction made between what's simply an example and what's a categorical statement about your problem. – Ruzihm Oct 22 '19 at 21:11
  • All the arrays can be of different size. So up until the most recent monthly array we can determine the max size from the available historical data. Each month we will have to re-check the max size as a new array becomes available. Once we have the max size we can then re-stitch/concatenate all the arrays together. This will be an ongoing operation done every month. – codingknob Oct 22 '19 at 21:17
  • Also there's no explanation for why the indices for two identically shaped arrays mean different things. Does the order of the arrays have some kind of significance? Are the variables ordered in some way? – Ruzihm Oct 22 '19 at 21:17
  • Basically there are N variables that can change every month (new ones can be introduced and existing ones can drop out and then return or never return). Within the N variables I compute pairwise permutations, and the ordering of those pairs is fixed. – codingknob Oct 22 '19 at 21:20
  • I downvoted it because the goal indices for each element of the input arrays have no explanation, just a statement of what they would be under some example, and you made no effort when I asked you to clarify: "Also there's no explanation for why the indices for two identically shaped arrays mean different things. Does the order of the arrays have some kind of significance? Are the variables ordered in some way?" – Ruzihm Oct 22 '19 at 21:46
  • This honestly seems like an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Instead of trying to interpret incomprehensible arrays, it would probably be more straightforward to change the output to be more useful. – Ruzihm Oct 22 '19 at 22:38
  • I know the end result needs to be a H5py/Numpy array but maybe there are intermediate steps that can be done outside of H5py/Numpy and imported into arrays once shaped appropriately. Let me sleep on it. – codingknob Oct 22 '19 at 23:23
  • the problem has been solved by using pandas/numpy transpose and reshape - https://stackoverflow.com/questions/58531295/convert-pandas-series-and-dataframe-objects-to-a-numpy-array – codingknob Oct 24 '19 at 00:12
  • yeah, using consistent string indices makes this a lot more tractable than inconsistent integer indices. good work – Ruzihm Oct 24 '19 at 02:55
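
The string-index idea in the last two comments can be sketched roughly as follows. This is an illustration under assumptions, not necessarily what the linked answer does: it assumes one month's data is available in long form, one value per (row variable, column variable) pair, and `universe` and `y_long` are hypothetical names. The values are those of Array Y from the question.

```python
import numpy as np
import pandas as pd

# Fixed ordering of the universe: A -> index 0, B -> index 1, C -> index 2
universe = ["A", "B", "C"]

# Hypothetical long-form input: one value per (row_variable, col_variable) pair
y_long = pd.Series(
    [2, 12, -12, 2],
    index=pd.MultiIndex.from_tuples(
        [("A", "C"), ("A", "B"), ("B", "A"), ("C", "A")]
    ),
    dtype=float,
)

# Pivot the pairs into a labelled grid, then align both axes to the universe;
# any pair without data becomes NaN
y_grid = y_long.unstack().reindex(index=universe, columns=universe)
y_arr = y_grid.to_numpy()
print(y_arr)
```

Because alignment is by label rather than by integer position, a variable that drops out for a month simply leaves a row and column of NaN.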

2 Answers


This is based on the comment made by @user3483203: pad the smaller array with NaN, then combine the arrays into one.

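import numpy as np

a = np.array([[  0.,   2.,   3.],
              [ -2.,   0.,   4.],
              [ -3.,  -4.,   0.]])

b = np.array([[0, 12], [-12, 0]])

# Start from a NaN-filled array shaped like a, then copy b into its top-left corner
out = np.full_like(a, np.nan)
i, j = b.shape
out[:i, :j] = b

# Combine both 3x3 arrays into a single array of shape (2, 3, 3)
res = np.array([a, out])
print(res)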

This answers the original question which has since been changed:

Let's say I have the following arrays:

np.array([[  0.,   2.,   3.],
        [ -2.,   0.,   4.],
        [ -3.,  -4.,   0.]])


np.array([[0,12],
[-12, 0]])

I want to concatenate the above 2 arrays such that the end result is as follows:

array([[[0, 2, 3], 
[-2, 0, 4],
[-3,-4, 0]],

[[0,12, np.nan],
[-12, 0, np.nan],
[np.nan, np.nan, np.nan]]])

Find the max size in each dimension and how far each array falls short of it, then use np.pad to pad at the end of each dimension, and finally np.stack to stack them together:

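import numpy as np

# np.float was removed from NumPy; the builtin float is equivalent here
a = np.arange(12).reshape(4, 3).astype(float)
b = np.arange(4).reshape(1, 4).astype(float)

arrs = (a, b)
dims = arrs[0].ndim

# Largest extent along each dimension across all arrays
maxshape = tuple(max(x.shape[i] for x in arrs) for i in range(dims))

# Pad each array with NaN at the end of every dimension, up to maxshape.
# A list is used because np.stack needs a sequence, not a generator.
paddedarrs = [
    np.pad(x, tuple((0, maxshape[i] - x.shape[i]) for i in range(dims)),
           'constant', constant_values=np.nan)
    for x in arrs
]

c = np.stack(paddedarrs, 0)

print(a)
print(b, "\n======================")
print(c)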
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]
 [ 9. 10. 11.]]
[[0. 1. 2. 3.]]
======================
[[[ 0.  1.  2. nan]
  [ 3.  4.  5. nan]
  [ 6.  7.  8. nan]
  [ 9. 10. 11. nan]]

 [[ 0.  1.  2.  3.]
  [nan nan nan nan]
  [nan nan nan nan]
  [nan nan nan nan]]]
Ruzihm
  • Looks like the question of how to pad an array made of differently sized arrays is solved, isn't it? As for how to intermingle the indices of those arrays, that would be a different question; consider posting it separately. – Ruzihm Oct 22 '19 at 20:17
  • but is it also solved for appropriately padding arrays with different position requirements as per my updated question? I was having a difficult time expressing my question but my latest updated question should explain this more carefully – codingknob Oct 22 '19 at 20:40
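
On the follow-up about position requirements: padding alone only ever places data in the top-left corner. Below is a minimal sketch of label-based placement. It assumes (this is not stated in the question) that each monthly array arrives together with the (row variable, column variable) pair for every element. The helper `place_by_label` and the `*_pairs` lists are hypothetical names introduced only for illustration; the values are those of arrays X, Y and Z from the question.

```python
import numpy as np

def place_by_label(values, pairs, universe):
    """Scatter values into a NaN-filled len(universe) x len(universe) grid.

    values   : array of any shape
    pairs    : one (row_label, col_label) per element of values, row-major order
    universe : ordered labels; a label's position in this list is its fixed index
    """
    pos = {name: k for k, name in enumerate(universe)}
    out = np.full((len(universe), len(universe)), np.nan)
    rows = [pos[r] for r, _ in pairs]
    cols = [pos[c] for _, c in pairs]
    # Fancy indexing writes each value to its (row, col) slot in one step
    out[rows, cols] = np.asarray(values, dtype=float).ravel()
    return out

universe = ["A", "B", "C"]  # A -> 0, B -> 1, C -> 2

x = np.array([[0., 2., 3.], [-2., 0., 4.], [-3., -4., 0.]])
x_pairs = [(r, c) for r in universe for c in universe]

y = np.array([[2, 12], [-12, 2]])
y_pairs = [("A", "C"), ("A", "B"), ("B", "A"), ("C", "A")]

z = np.array([[99, 77], [-77, -99]])
z_pairs = [("A", "C"), ("B", "C"), ("C", "B"), ("C", "A")]

months = [(x, x_pairs), (y, y_pairs), (z, z_pairs)]
result = np.stack([place_by_label(v, p, universe) for v, p in months])
print(result)
```

With this layout, a new variable would be appended to `universe` and the stack rebuilt each month; existing variables keep their index.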