
Other questions, such as "How to convert a mel spectrogram to log-scaled mel spectrogram", have asked how to get a log-scaled mel spectrogram in Python. My code below produces such a spectrogram:

import numpy as np
import librosa
import librosa.display
ps = librosa.feature.melspectrogram(y=y, sr=sr)
ps_db = librosa.power_to_db(ps, ref=np.max)
librosa.display.specshow(ps_db, x_axis='s', y_axis='log')

If I plot this, I get the spectrogram I am looking for.

Librosa Display Plot

However, if I don't use librosa's display.specshow and just perform

import matplotlib.pyplot as plt
plt.imshow(ps_db)

I get this

matplotlib plot

My question is, what transformation is display.specshow doing to produce the first plot and how can I recreate this just using ps_db and numpy such that my plt.imshow() call aligns with display.specshow?

user10467738
  • You need to set imshow origin to "lower": https://matplotlib.org/3.1.1/tutorials/intermediate/imshow_extent.html. You also need to set the y axis to log scale: https://stackoverflow.com/a/11543400/5058588 – Bob Baxley Jul 22 '20 at 00:09

1 Answer


As suggested in the comment, you need to change the origin to "lower" and change the colormap. I am guessing the colormap is "magma", which is what I chose here, but it could also be "plasma" or "inferno".

import matplotlib.pyplot as plt

# plt.subplots() returns (figure, axes); plt.figure() returns only a figure
fig, ax = plt.subplots()
ax.imshow(ps_db, origin="lower", cmap="magma")

Regarding the logarithmic scale: as far as I can see, the data you get is already on a logarithmic scale and only the ticks are wrong. If that is not the case, then you need to resample the data with a meshgrid, adapted from here:

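import numpy as np

h, w = ps_db.shape
x = np.linspace(0, 2, w)    # linearly spaced x coordinates, one per column
y = np.logspace(1, 8, h)    # logarithmically spaced y coordinates, one per row
X, Y = np.meshgrid(x, y)    # 2D coordinate grids with the same shape as ps_db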
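
For reference, here is a minimal sketch of how a comparable plot could be drawn with matplotlib and numpy only. As far as I know, specshow draws with pcolormesh rather than imshow, and for y_axis='log' it treats the rows as linearly spaced frequencies from 0 to sr/2 shown on a symlog axis. The values of sr and hop_length, the linthresh setting, and the random array standing in for ps_db are all assumptions for illustration, and the `base`/`linthresh` keywords assume matplotlib 3.3 or newer.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Stand-in for ps_db (assumption): 128 bands x 200 frames of dB values
rng = np.random.default_rng(0)
ps_db = rng.uniform(-80.0, 0.0, size=(128, 200))

sr = 22050        # assumed sampling rate
hop_length = 512  # assumed hop length between frames
n_rows, n_frames = ps_db.shape

# Cell edges: one more edge than there are cells along each axis
times = np.arange(n_frames + 1) * hop_length / sr
freqs = np.linspace(0.0, sr / 2, n_rows + 1)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(times, freqs, ps_db, cmap="magma", shading="flat")

# symlog tolerates the 0 Hz edge; it is linear below linthresh, logarithmic above
ax.set_yscale("symlog", base=2, linthresh=64.0)
ax.set_ylim(freqs[0], freqs[-1])
ax.set_xlabel("Time (s)")
ax.set_ylabel("Hz")
fig.colorbar(mesh, ax=ax, format="%+2.0f dB")
fig.savefig("spectrogram.png")
```

Because the frequency warping happens in the axis scale, the values in ps_db are not modified at all; only the cell edges passed to pcolormesh decide where each row is drawn.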
Dorian
  • Can you explain how resampling would work in this case? I have a similar problem but I can't see how this would not require re-binning the frequencies. My goal is to have a spectrogram with a log-frequency axis and directly convert it to an image without any plotting libraries. – Exitare Oct 10 '21 at 01:49
  • Hi @Exitare. There isn't really any resampling of the data involved. If you want to present your data on a logarithmic scale, the only thing you need to do is take the logarithm of your data. It would then be considered "in logarithmic scale". – Dorian Oct 11 '21 at 13:18
  • In the case of a spectrogram, each row in the 2d spectrogram array represents a frequency bin, each column represents a time bin, and the values in the array are the amplitudes. A transformation like np.log10(spectrogram) will only apply the log to the individual amplitude values. I need to figure out a way to scale the frequency axis. This is very easy to display in a plot with yaxis="log" but I'm having difficulty figuring out how to apply an equivalent transformation to the actual data in the spectrogram array. I was hoping np.meshgrid would be the answer but I'm not too sure it is anymore. – Exitare Oct 11 '21 at 18:47
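
The transformation described in the last comment, warping the frequency axis inside the array itself, can be sketched with numpy alone. This is only one possible approach and it does amount to re-binning: each column is linearly interpolated from linearly spaced row frequencies onto log-spaced target frequencies. The function name `to_log_frequency`, its parameters, and the assumption that row i sits at a frequency linearly spaced between 0 and f_max are all hypothetical; a mel spectrogram would need its actual band center frequencies instead.

```python
import numpy as np

def to_log_frequency(spec, f_min, f_max, n_out=None):
    """Resample the rows of `spec` onto log-spaced frequencies.

    Assumes (hypothetically) that the rows of `spec` are linearly spaced
    in frequency from 0 to f_max. Returns the warped array and the
    target frequencies of its rows.
    """
    n_rows, n_cols = spec.shape
    n_out = n_rows if n_out is None else n_out

    lin_freqs = np.linspace(0.0, f_max, n_rows)    # source row frequencies
    log_freqs = np.geomspace(f_min, f_max, n_out)  # target row frequencies

    out = np.empty((n_out, n_cols), dtype=float)
    for j in range(n_cols):
        # Linear interpolation of one time frame onto the log-spaced grid
        out[:, j] = np.interp(log_freqs, lin_freqs, spec[:, j])
    return out, log_freqs

# Small demonstration on synthetic data (not a real spectrogram)
demo = np.random.default_rng(0).uniform(-80.0, 0.0, size=(64, 10))
warped, row_freqs = to_log_frequency(demo, f_min=50.0, f_max=8000.0, n_out=32)

# Flip vertically so that high frequencies end up at the top of an image
image = warped[::-1]
```

Low frequencies are stretched over several output rows and high frequencies are compressed, so some detail at the top of the spectrum is lost; averaging the source rows that fall into each target band would be an alternative to interpolation.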