required_time_stamps contains 5911 time stamps
time_based_mfcc_feature contains 5911 samples each having 20 mfcc features.

If you look at time_based_mfcc_feature, it looks like this:

row1    val2 val3  ... val 20  
row2    val2 val3  ... val 20  
row3    val2 val3  ... val 20
.  
.  
.  
row5911  val2 val3  ... val 20  


print type(required_time_stamps)  

< type 'numpy.ndarray'>

print required_time_stamps.shape  

(5911,)

print type(time_based_mfcc_feature)

< type 'numpy.ndarray'>

print time_based_mfcc_feature.shape  

(5911, 20)

I want to combine these two so that each row of features is prefixed by its time stamp.

In R, I can simply do

time_based_mfcc_feature<-as.data.frame(time_based_mfcc_feature) 
required_time_stamps<-as.data.frame(required_time_stamps)  

new_dataframe <- merge(required_time_stamps,time_based_mfcc_feature)  
View(new_dataframe)

How would I achieve this in Python?

So that the final data would look like this :

time1   row1    val2 val3  ... val 20  
time2   row2    val2 val3  ... val 20  
time3   row3    val2 val3  ... val 20
.  
.  
.  
time5911 row5911  val2 val3  ... val 20    

Here time1 to time5911 are simply the values contained in required_time_stamps.
I tried:

mfcc_features_with_times= np.hstack((required_time_stamps,time_based_mfcc_feature))

But I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-ce462d805743> in <module>()
----> 1 mfcc_features_with_times= np.hstack((required_time_stamps,time_based_mfcc_feature))

/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in hstack(tup)
    289     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    290     if arrs and arrs[0].ndim == 1:
--> 291         return _nx.concatenate(arrs, 0)
    292     else:
    293         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

Then I tried transposing:

t = required_time_stamps.transpose  
mfcc_features_with_times= np.hstack((t,time_based_mfcc_feature))  

But I got the same error again:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-47cddb391d3f> in <module>()
----> 1 mfcc_features_with_times= np.hstack((t,time_based_mfcc_feature))

/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in hstack(tup)
    289     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    290     if arrs and arrs[0].ndim == 1:
--> 291         return _nx.concatenate(arrs, 0)
    292     else:
    293         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions
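
A minimal sketch of why the transpose attempt changes nothing, assuming small stand-in arrays (the names `stamps` and `features` and their sizes are illustrative, not the real data): a 1-D array has only one axis, so transposing it returns the same shape, and `np.hstack` still receives one 1-D and one 2-D input.

```python
import numpy as np

# Small stand-ins for the real arrays (hypothetical sizes and values).
stamps = np.arange(4, dtype=float)   # shape (4,), like required_time_stamps
features = np.ones((4, 3))           # shape (4, 3), like time_based_mfcc_feature

# Transposing a 1-D array is a no-op: there is only one axis to permute.
transposed = stamps.transpose()
print(transposed.shape)

# hstack still sees a 1-D and a 2-D input, so it raises ValueError.
try:
    np.hstack((transposed, features))
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)
```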

I also looked at "Numpy concatenate 2D arrays with 1D array", but I think it addresses something else.

The goal is to feed this data to a Keras neural network, row by row.
I also have 5911 labels corresponding to the 5911 time stamps, which I will concatenate later.

UPDATE: Based on the links in the comments, I tried:

>>> a = np.array([[1,2,3], [2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
>>> b = np.array([[1,2,3,0], [2,3,4,0]])
>>> b
array([[1, 2, 3, 0],
       [2, 3, 4, 0]])
>>> c= np.hstack((a,b))
>>> c
array([[1, 2, 3, 1, 2, 3, 0],
       [2, 3, 4, 2, 3, 4, 0]])

For this example the stacking works, but I have no clue why the same logic does not work for my arrays.

UPDATE: I was able to solve it by following cmaher's suggestion:

mfcc_features_with_times= np.hstack((required_time_stamps[:,None],time_based_mfcc_feature))
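
A small sketch of what that line does, using made-up stand-in arrays (`stamps` and `features` are hypothetical names): indexing with `[:, None]` adds a second axis, turning shape `(n,)` into `(n, 1)`, so both inputs are 2-D and can be joined column-wise. `np.column_stack` should give the same result without the explicit reshape.

```python
import numpy as np

# Illustrative stand-ins, not the real MFCC data.
stamps = np.array([0.0, 0.5, 1.0])                   # shape (3,)
features = np.arange(6, dtype=float).reshape(3, 2)   # shape (3, 2)

# Add a trailing axis: (3,) becomes (3, 1).
column = stamps[:, None]

# Both inputs are now 2-D with the same number of rows.
combined = np.hstack((column, features))
print(combined.shape)

# column_stack treats 1-D inputs as columns, so no reshape is needed.
alternative = np.column_stack((stamps, features))
print(np.array_equal(combined, alternative))
```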

However, this works only if both arrays have the same number of rows. In most cases I end up with array A having shape (8400,) and array B having shape (8399, 21).

How do I truncate/delete the last few rows of A so that A and B have matching shapes like (8399,) and (8399, 21)? Please advise.

UPDATE (error while slicing): Currently, when I do A = A[:B.shape[0],:], where A = new_labels_np_array and B = time_based_mfcc_feature, I get:

     64     if len(new_labels_np_array) > len(time_based_mfcc_feature):
---> 65         new_labels_np_array = new_labels_np_array[:time_based_mfcc_feature.shape[0],:]
     66     elif len(time_based_mfcc_feature)>len(new_labels_np_array):
     67         time_based_mfcc_feature = time_based_mfcc_feature[:,new_labels_np_array.shape[0],:]

IndexError: too many indices for array
kRazzy R

1 Answer

Since you've already found an answer to the first part of your question in the thread numpy-concatenate-2d-arrays-with-1d-array, I'll address the second question:

How do I truncate/delete the last few rows of A so that both A and B have same shapes like (8399,) and (8399, 21) . Please advise.

You can slice a NumPy array like you would slice a list. To trim a 2D array B to the length of A along axis 0:

B = B[:A.shape[0],:]

This trims the end of the array. If you want to trim at the beginning, i.e. throw away the first few rows that don't fit into shape instead of the last:

B = B[-A.shape[0]:,:]

EDIT: Your comment implies that you don't know in advance which of the arrays is longer. In that case:

trim = min(A.shape[0], B.shape[0])
A = A[:trim]
B = B[:trim,:] 

or, to keep the last rows instead:

trim = min(A.shape[0], B.shape[0])
A = A[-trim:]
B = B[-trim:,:]
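
Putting the trim and the stacking together, a minimal sketch might look like the following. The helper name `trim_and_stack` is hypothetical, and the shapes are taken from the question; it assumes `a` is 1-D, `b` is 2-D, and the surplus rows sit at the end.

```python
import numpy as np

def trim_and_stack(a, b):
    """Hypothetical helper: cut 1-D `a` and 2-D `b` to a common length, then join."""
    # Number of rows both arrays can supply.
    n = min(a.shape[0], b.shape[0])
    # a needs one index (it is 1-D); [:, None] then makes it a column.
    return np.hstack((a[:n, None], b[:n, :]))

A = np.arange(8400, dtype=float)   # shape (8400,), like the time stamps
B = np.zeros((8399, 21))           # shape (8399, 21), like the features

out = trim_and_stack(A, B)
print(out.shape)
```

The same call works when B is the longer array, since `min` picks the shorter length in either case.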
ascripter
  • would it make a difference if I store data as numpy array or pandas dataframe ? I'm going to feed it to a cnn. – kRazzy R Feb 09 '18 at 00:56
  • could you kindly help me out here : https://stackoverflow.com/questions/48698167/unable-to-do-np-savetxt-on-save-newly-appended-string-column-to-numpy-array – kRazzy R Feb 09 '18 at 03:34
  • I'm not that experienced with pandas, but afaik pd.DataFrame is just a fancy container for numpy arrays. I'm not really familiar with the indexing syntax however – ascripter Feb 09 '18 at 10:44
  • what is meant by `If you want to align not at the first but at the last row:` if A is (6,) and B is (5,21) I want A to become (5,) i.e deleting the last value. So that A, and B have same shape. similarly, if A is (5,) and B is (6,21) B should be made into (5,21), i.e deleting last row of B – kRazzy R Feb 09 '18 at 17:22
  • basically i'm using your code from here : https://stackoverflow.com/a/48466007/4932791 , I want to combine mfcc features with the corresponding labels. So for stacking the two arrays have to be of the same size. – kRazzy R Feb 09 '18 at 17:28
  • Currently, when I do `A = A[:B.shape[0],:]` where `A = new_labels_np_array` and `B = time_based_mfcc_feature`, I get `IndexError: too many indices for array`. Please see the update in the question. Thank you. – kRazzy R Feb 09 '18 at 17:45
  • I've edited my answer and also exchanged A and B - I think that's the reason for your IndexError. – ascripter Feb 10 '18 at 09:54
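
The IndexError discussed in these comments can be reproduced with a short sketch, assuming stand-in arrays with the shapes mentioned above (`labels` and `features` are illustrative names). A 1-D array accepts only one index, so the trailing `,:` has to be dropped for it; the 2-D array can keep it.

```python
import numpy as np

labels = np.arange(6)          # shape (6,), 1-D like new_labels_np_array
features = np.zeros((5, 21))   # shape (5, 21), 2-D like time_based_mfcc_feature

# Two indices on a 1-D array raise IndexError.
try:
    labels[:features.shape[0], :]
    raised = False
except IndexError:
    raised = True
print(raised)

# One index is enough to trim the 1-D array.
trimmed = labels[:features.shape[0]]
print(trimmed.shape)
```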