
I have a question that is similar (but not the same) to this one: The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (33,) + inhomogeneous part

When I run this code:

#Creating spectrograms
import librosa
import numpy as np
def prepare_audio(audio_path):
  list_matrices = []
  y,sr = librosa.load(audio_path,sr=22050)
  D = np.abs(librosa.stft(y))**2
  S = librosa.feature.melspectrogram(S=D, sr=sr)
  list_matrices.append(S)
  return list_matrices

all_X_data = []
targets = []

#Loading files into spectrogram and putting the result in a matrix 
from pathlib import Path
path = '/Users/xyz/Desktop/Audio_Samplepack'
pathlist = Path(path).glob('**/*.wav')
for path in pathlist:
     path_in_str = str(path)
     audio = prepare_audio(path_in_str)
     all_X_data += audio
     targets += ([0]*len(audio))

#Train_test_split with above data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(np.array(all_X_data), 
                                                    np.array(targets),
                                                    test_size=0.33,
                                                    random_state=42)

When calling np.array(all_X_data) I get the same error as in the question mentioned above, that is: ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (128,182) + inhomogeneous part.

That part is totally correct, as my matrix all_X_data should have 182 lines with 128 columns. But the error suggests that there are elements inside all_X_data that are shorter or longer than 128, which surprises me tbh, as they are all preprocessed by the librosa.feature.melspectrogram function, so I thought they would all be 128 in width by default.

I checked their length individually with:

for element in all_X_data:
    print(len(element))

But the result was 128 for all the 182 elements inside all_X_data: 128 128 128 . . . 128

I thought about an automatic solution to this so I added this code afterwards:

print(len(all_X_data))
for element in all_X_data:
    if len(element) != 128:
        all_X_data.remove(element)    
print(len(all_X_data))

Resulting in: 182 182

Which suggests that the code didn't remove any of the 182 elements and that all of them should be 128 in width. So that actually confuses me as the error claims that there are elements inside the matrix that do not actually match the width of 128 ... Does anyone know what could be the problem here?

I've read questions like this one: Is there a simple way to delete a list element by value?

So don't get me wrong, I am not asking you how to detect or even remove elements of a list - the question here is why can't I find any of those elements that ought to be removed? Where is the inhomogeneous part? I can't find it

Laulito
  • it would be helpful if you posted the **full error message including the stack trace** – juanpa.arrivillaga Aug 10 '23 at 16:34
  • BTW, `for element in all_X_data: if len(element) != 128: all_X_data.remove(element)` is bugged. don't modify a list as you iterate over it – juanpa.arrivillaga Aug 10 '23 at 16:36
  • The inhomogeneous part is *after* the dimensions you're checking. (Also, it sounds like you might have mixed up which was the 128 dimension and which was the 182.) – user2357112 Aug 10 '23 at 16:37
  • @user2357112 yes you are right, I'm sorry for the mixup! – Laulito Aug 10 '23 at 17:07
  • @juanpa.arrivillaga The full error message is: ... Python Audio/GenreKlassifizierer.py" 182 182 Traceback (most recent call last): File "...Project Python Audio/GenreKlassifizierer.py", line 97, in X_train, X_test, y_train, y_test = train_test_split(np.array(all_X_data), ^^^^^^^^^^^^^^^^^^^^ ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (182, 128) + inhomogeneous part. – Laulito Aug 10 '23 at 17:11
  • @Laulito **don't post that as a comment**. Post it in the question itself as formatted text – juanpa.arrivillaga Aug 10 '23 at 17:16
  • Anyway, what does `import collections; print(collections.Counter(x.shape for x in all_X_data))` give you? You are only ever checking the first dimension with `len` – juanpa.arrivillaga Aug 10 '23 at 17:18
  • @juanpa.arrivillaga Well that explains a lot ... now I get: Counter({(128, 80): 32, (128, 160): 27, (128, 148): 16, (128, 74): 8, (128, 61): 7, (128, 122): 7, (128, 126): 6, (128, 63): 5, (128, 130): 5, (128, 129): 5, (128, 134): 5, (128, 65): 5, (128,... and so on}) So seems like that is bad ... have you got an idea on how to format the matrix in a good way? So that every row has the same amount of columns always? What also would be interesting to know is how to delete or rearrange elements inside of the matrix? For example to fill up with zeros if columns missing or to delete rows? – Laulito Aug 10 '23 at 17:34
  • This inhomogeneous error is the latest way of handling 'ragged arrays'. Originally numpy automatically made an `object` dtype (except for certain mixes of shapes). Then it gave a 'ragged' warning. And now it gives this error unless you specify `object` dtype. Anyways, you have to somehow find the differing shape(s). – hpaulj Aug 10 '23 at 17:36
  • Yes you are right, I need to close this question. I have to find a solution to generalise the "matrix-filling-process". I need to work on it by myself more profoundly. I'm sorry for the mixup folks. – Laulito Aug 10 '23 at 17:43
  • I think with hstack the function train_test_split would not be initialised properly but I'm not sure ... Resize could be interesting, as it offers the possibility to cut all the spectrograms after a certain amount of time, maintaining most of the information ... In general I want to prepare audio samples for a MLP or a CNN, it could be anything really for example samples that have the same musical scale or samples that are being played by the same instrument – Laulito Aug 10 '23 at 17:55
  • I suspect you need to study the `librosa` docs to better understand how to prepare audio samples and the spectrograms for ml processing. – hpaulj Aug 10 '23 at 19:58
  • the matrix is getting filled with spectrograms, which are like Fourier transformations of the time signal, over time. 128 is the exact height of every image (spectrogram), because every sample gets captured in the same frequency range (20Hz - 20kHz) and the librosa spectrogram function "norms" all these images to a height of 128, but the width of these images differs a lot, as all the samples have different durations. Might be an idea to cut after a few seconds, or to norm the length and fill with zeros if shorter, or fill with the reversed sequence of the sampled audio file tensors – Laulito Aug 10 '23 at 20:12
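
To make the point from the comments concrete, here is a minimal sketch of why the `len` check passes while `np.array` still fails. It uses made-up zero arrays in place of the real spectrograms, so the shapes and the `target_shape` value are assumptions for illustration only. `len` looks at the first axis alone, whereas `.shape` shows every axis; the filter at the end builds a new list instead of calling `remove` while iterating.

```python
import collections

import numpy as np

# Hypothetical stand-ins for mel spectrograms: 128 mel bands each,
# but a different number of time frames per clip.
all_X_data = [np.zeros((128, 80)), np.zeros((128, 160)), np.zeros((128, 80))]

# len() only reports the size of the first axis, so every element looks the same.
first_axis_lengths = {len(x) for x in all_X_data}

# .shape exposes every axis, which is where the mismatch becomes visible.
shape_counts = collections.Counter(x.shape for x in all_X_data)

# Filter by building a new list rather than removing items during iteration.
target_shape = (128, 80)  # assumed target, chosen only for this example
kept = [x for x in all_X_data if x.shape == target_shape]

print(first_axis_lengths)
print(shape_counts)
print(len(kept))
```

Dropping every clip that does not match one shape would discard most of the data in a case like the one above, so this is more of a diagnostic than a fix.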

0 Answers
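
One possible way to get a homogeneous array, following the idea from the last comment of cutting long clips and filling short ones with zeros, is sketched below. This is not a confirmed solution from the thread: `fix_width` and `N_FRAMES` are hypothetical names, the value 100 is arbitrary, and random arrays stand in for the real spectrograms. As far as I know, librosa also provides `librosa.util.fix_length` for the same purpose, but plain NumPy is used here to keep the example self-contained.

```python
import numpy as np


def fix_width(spec, n_frames):
    """Crop or zero-pad a 2-D array along its last axis to exactly n_frames columns."""
    spec = spec[:, :n_frames]            # crop clips that are too long
    missing = n_frames - spec.shape[1]   # columns still needed (0 if none)
    return np.pad(spec, ((0, 0), (0, missing)), mode="constant")


# Random stand-ins for spectrograms with 128 mel bands and varying widths.
rng = np.random.default_rng(0)
specs = [rng.random((128, width)) for width in (80, 160, 61)]

N_FRAMES = 100  # hypothetical target width; pick it from your own data

# Every element now has shape (128, N_FRAMES), so stacking succeeds.
X = np.stack([fix_width(s, N_FRAMES) for s in specs])
print(X.shape)
```

The resulting `X` has one row per clip and could be passed to `train_test_split` in place of `np.array(all_X_data)`. Whether zero-padding or cropping is appropriate depends on the model and the clips, so treat the choice of width as something to tune.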