python wavelet spectrogram plotting - how to obtain correct visualization?

Question

Trying to explore time-frequency analysis to obtain valuable information about experimental signals.

I need to obtain such a plot, that will visualize the change of frequency spectrum in time - for all signal duration - to extract some valuable information about the current signal.

Trying to use wavelets for this purpose.

Here is a link with example I am using to compute coefficients and visualize the spectrogram.

I receive a spectrogram that doesn't reflect the data I have, it seems I have some misunderstanding of the visualization technique or wavelet calculation.

Tech. question. How to show the initial signal and the wavelet spectrogram on one figure and share x-axis? (to be able explore the data in convenient way)
I can't see the resonances in the first part of a signal, but I can do it relatively easily using the signal in a time domain. Why?
What is the data in the upper left and right corners - rounded region?
Maybe I am choosing the wrong scales or levels (see arrays in the code provided, I have played with the numbers but I don't get how to choose their values?)
How to show frequencies on the left scale - not a periods..?
how to add a colormap to understand the current power coefficients value?
Maybe there is some instrument/approach - to obtain levels and scales values automatically? Or some general recommendation/tutorial to understand how to get smooth visualization of data for time-frequency analysis?

The example to reproduce and example data.

# wavelet application for time-frequency analysis
# example used https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/

import pywt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib widget

### loading data
# initial data for analysis
tr_link = 'https://drive.google.com/uc?export=download&id=1FEQvfqQ8BE1rS8BuTqQeRsTBJx_kiH15'

# input dataframe - transmission data - experimental results with the corresponding theoretical data and uncertainty data
tr_df = pd.read_csv(tr_link, index_col=0)

#data with the actual resonance positions
ladder_df = pd.DataFrame({'E': {0: 40.29368109287083, 1: 55.462104682940094, 2: 66.19723693698184, 3: 81.22602772049886, 4: 101.05019001782576, 5: 140.79384791944875, 6: 157.22007983931803, 7: 171.43325808950274, 8: 217.16542649339505, 9: 221.41056096126604, 10: 243.58428941894317, 11: 271.0699959595112, 12: 285.04407065596706, 13: 300.3253362518625, 14: 309.7129973708237, 15: 314.68310009948743, 16: 323.26453920844216, 17: 342.54164584343937, 18: 356.5478102534998, 19: 382.23962909125567, 20: 397.8070178951269, 21: 421.2843581368074, 22: 433.36971676875646, 23: 440.66505067627634, 24: 466.7307530820385, 25: 477.03060262148057, 26: 482.96167606482015, 27: 497.424475651996, 28: 515.5927570438026, 29: 544.9552781653138, 30: 549.3628110144758, 31: 594.9779029029645, 32: 613.1079845026825, 33: 634.7001884990783, 34: 667.2508925896644, 35: 690.0964618342111, 36: 706.2573081949196, 37: 723.455953863036, 38: 753.9685262219448, 39: 764.1023564395969, 40: 783.3375513683885, 41: 803.4441302741657, 42: 814.9006657436429, 43: 829.3065932509774, 44: 858.2810554651402, 45: 883.026051087727, 46: 913.8980609316312, 47: 933.3841005436072, 48: 936.9012403488496, 49: 960.07254411068, 50: 970.1599371526584, 51: 988.6230483408726, 52: 994.1094565980452}, 'tof': {0: 0.00040406988890069487, 1: 0.0003449018934825362, 2: 0.00031598105788645324, 3: 0.00028557849361103294, 4: 0.00025638211406266864, 5: 0.00021771059568433755, 6: 0.00020620234645780873, 7: 0.00019761038050053123, 8: 0.00017594559752920423, 9: 0.00017428275970206476, 10: 0.00016631594547793704, 11: 0.0001578317819181072, 12: 0.0001539969269073221, 13: 0.00015011363900030704, 14: 0.00014787189376091746, 15: 0.00014672587278245796, 16: 0.00014480971199822796, 17: 0.00014077095253061672, 18: 0.00013804430306554194, 19: 0.00013343808731047422, 20: 0.00013086684344748386, 21: 0.0001272621057195108, 22: 0.00012552178822158457, 23: 0.00012450607469089692, 24: 0.00012107366944339282, 25: 0.00011979555386093621, 26: 0.00011907818543729406, 27: 0.00011738300517409577, 28: 0.00011535543399251944, 29: 0.00011229554167237238, 30: 0.00011185753091262304, 31: 0.00010761419597738966, 32: 0.0001060606849751197, 33: 0.00010429807317815873, 34: 0.00010180440741783211, 35: 0.00010016063185845586, 36: 9.904631835791354e-05, 37: 9.790170126465565e-05, 38: 9.596823414123544e-05, 39: 9.535185526580462e-05, 40: 9.421496658465192e-05, 41: 9.307049217268802e-05, 42: 9.243740952453541e-05, 43: 9.166004036892226e-05, 44: 9.015621997184851e-05, 45: 8.893095729395043e-05, 46: 8.747264517648235e-05, 47: 8.65896588575137e-05, 48: 8.643322569777624e-05, 49: 8.542420745803129e-05, 50: 8.499627822618716e-05, 51: 8.423006517753538e-05, 52: 8.400650521127758e-05}} )

### preprocessing steps to get data in a proper format
# rescaling the tof
tr_df['tof'] = tr_df['tof']/1e6 # to get time in seconds

# calculating the step betweeb the points byt hte TOF
tr_df['d_tof'] = tr_df['tof'] - tr_df['tof'].shift(1)

N = tr_df.shape[0] #number of points in the signal

# time step calculation - dt
min_timebin = abs(tr_df['d_tof'].min())
max_timebin = abs(tr_df['d_tof'].max())
std_timebin = abs(tr_df['d_tof'].std())
timebin_avg = abs(tr_df['d_tof'].mean())

print(min_timebin, timebin_avg, max_timebin, std_timebin)

# let's assume that it's the mean value because we have some discretization & computational errors
timebin = round(abs(tr_df['d_tof'].mean())*1e7, 0)/1e7 # average rounded step by tof between each point

t0 = tr_df['tof'].min()
dt = timebin

print('dt = ', dt)
print('t0 = ', t0)

# time array
time_ar = np.arange(0, N) * dt + t0 
# signal data array
signal_ = tr_df['exp_trans'].to_numpy()

# plotting the signal
fig, ax = plt.subplots(figsize=(8, 3))

ax.plot(tr_df['tof'], tr_df['exp_trans'], label='signal')

#plotting the positions of the peaks
for index, row in ladder_df.iterrows():
    ax.axvline(x = row['tof'], color = 'r', linestyle = 'dashed', linewidth = 1.0, alpha=0.5)

ax.set_xlim([time_ar[0], time_ar[-1]])

ax.set_ylabel('Signal Amplitude', fontsize=10)
ax.set_title('Signal', fontsize=10)
ax.set_xlabel('Time', fontsize=10)
ax.legend()
ax.grid()
plt.show()

# for wavelet plotting
cmap = plt.cm.seismic

# wavelet calculation
scales = np.arange(1, 64) # what are these scales??
[coefficients, frequencies] = pywt.cwt(signal_, scales, 'cmor', dt)

pwr = (abs(coefficients)) ** 2

periods = 1. / frequencies

levels = [0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32] # what are these levels?
contourlevels = np.log2(levels)

fig, ax = plt.subplots(figsize=(8, 5))

im = ax.contourf(time_ar, np.log2(periods), np.log2(pwr), contourlevels,  extend='both', cmap=cmap)

plt.show()

P.S.The actual task is to identify the resonances in the noised signal. The hypothesis is that real and fake resonances (or random noise) differ in a spectrum or in some parameters that can help to identify resonances in the noisy data or to use that information to automate the identification process. (I have a lot of experimental signals which are evaluated - with the known positions of real resonances so the next step is to automate identification of resonances in experimental data using ML-techniques).

is [this](https://stackoverflow.com/a/71927102/1793968) what you mean? — alle_meije, Jan 17 '23 at 10:14
yes, it's very similar to what I wanted to do. But unfortunately, I didn't succeed with wavelet-based analysis of that data. — twistfire, Jan 21 '23 at 14:33

Sheldon · Answer 1 · 2022-12-16T03:21:03.963

Use matplotlib's twinx to create a twin axis sharing the figure's x axis.
This is the cone of influence, described here. Data in this area (upper left and right corners) cannot be trusted as they may be affected by edge effects.
The scales define how you "stretch and squeeze" the wavelet used to analyze your input data. You may relate scales to frequency using pywt's scale2freq.
im = plt.contourf(time_ar, scales, np.log2(pwr), contourlevels, extend='both', cmap=cmap)
What you refer to is a colorbar: simply add plt.colorbar() before plt.show().
Check the "choosing the scales for cwt" section here.

EDIT from December 16, 2022

To answer your second question in the comment section, you select the scales based on the frequency of the input signal that you wish to "analyze". Large scales correspond to "dilated" or "stretched" wavelets, sensitive to the low frequency content of the input signal, while small scale are sensitive to the high frequency content of the input signal. Now, this is a qualitative explanation. In order to quantify which frequency your dilated mother wavelet is sensitive to at any particular scale, you may use pywt's scale2frequency. Again, please check the example in the yellow box in the "Choosing the scales for cwt" section, in the link that I attached.

In you example, you want to run pywt.scale2frequency('cmor', np.arange(1, 64).tolist()). This will return the normalized frequencies that the dilated wavelets are sensitive to. In order to get the actual frequencies (in Hz), divide the output by the sampling rate dt: pywt.scale2frequency('cmor', np.arange(1, 64).tolist())/dt.

Hi Sheldon, thank you for your reply. Maybe I have asked that it's unclear what I want to obtain as a result. I need to show two stacked graphs - the initial signal and the resulting wavelet spectrogram with the shared x-axis and a colorbar to provide the difference scale. I have created a separate question to make it more clear. https://stackoverflow.com/questions/74760767/the-problem-with-the-size-of-stacked-plots-in-matplotlib-python. — twistfire, Dec 11 '22 at 12:23
4. why should I use the im = plt.contourf(time_ar, scales, np.log2(pwr), contourlevels, extend='both', cmap=cmap) instead of im = plt.contourf(time_ar, frequencies, np.log2(pwr), contourlevels, extend='both', cmap=cmap)? — twistfire, Dec 11 '22 at 13:04
I don't get the basic principle of how to choose the mother wavelet and what is the basic principle of choosing initial parameters. E.g. I have chosen a Morlet or Mexican Hat but do I need to choose the initial scale and provide all the scales in comparison with the initial? How to calculate which scales I need to provide for that specific signal to obtain a fine visualization? — twistfire, Dec 11 '22 at 13:23

python wavelet spectrogram plotting - how to obtain correct visualization?

1 Answers1