0

In the following 4D array, each column represents an attribute for machine learning model development.

import numpy as np
from scipy.fft import fft, fftfreq, fftshift

A = np.array([
            [[[0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1],
              [1, 3, 2, 1],
              [1, 2, 3, 0]]],
            
            [[[9, 8, 7, 6],
              [5, 4, 3, 2],
              [0, 9, 8, 3],
              [1, 9, 2, 3],
              [1, 0, -1, 2]]],
            
            [[[0, 7, 1, 2],
              [1, 2, 1, 0],
              [0, 2, 0, 7],
              [-1, 3, 0, 1],
              [1, 0, 1, 0]]]
              ])

A.shape
(3, 1, 5, 4)

In this example, A has 3-instances, each instance represented by 4-features (1 for each column of the inner arrays (shaped (1,5,4)). Features are represented in time-domain.

I needed a handy way to calculate the frequency-domain (Fourier transform )for each feature of each instance.

Details

Considering the given example, I do it the following way.

  1. Get the array for all features (1 for each feature like this:

    feat1 = A[..., [0]] # the first feature
    feat2 = A[..., [1]] # the second feature
    feat3 = A[..., [2]] # 3rd feature etc...
    feat4 = A[..., [3]] # 4th feature etc...
    

    So that the first feature in the example data is:

    feat1
    array([[[[ 0],
         [ 3],
         [ 2],
         [ 1],
         [ 1]]],
    
    
       [[[ 9],
         [ 5],
         [ 0],
         [ 1],
         [ 1]]],
    
    
       [[[ 0],
         [ 1],
         [ 0],
         [-1],
         [ 1]]]])
    
  2. Retrieve each instance´s features set, like so:

    # 1st instance
    inst1_1 = feat1[0].ravel()  # instance1 1st feature
    inst1_2 = feat2[0].ravel()  # instance1 2nd feature
    inst1_3 = feat3[0].ravel()  # instance1 3rd feature
    inst1_4 = feat4[0].ravel()  # instance1 4th feature
    
    # 2nd instance 
    inst2_1 = feat1[1].ravel()  # instance2 1st feature
    inst2_2 = feat2[1].ravel()  # instance2 2nd feature
    inst2_3 = feat3[1].ravel()  # instance2 3rd feature
    inst2_4 = feat4[1].ravel()  # instance2 4th feature
    
    # 3rd instance 
    inst3_1 = feat1[2].ravel()  # instance3 1st feature
    inst3_2 = feat2[2].ravel()  # instance3 2nd feature
    inst3_3 = feat3[2].ravel()  # instance3 3rd feature
    inst3_4 = feat4[2].ravel()  # instance3 4th feature
    
  3. Then calculate the Fourier transform (for each feature of each instance).

    inst1_1_fft = fft(inst1_1)
    inst1_2_fft = fft(inst1_2)
    inst1_3_fft = fft(inst1_3)
    inst1_4_fft = fft(inst1_4)
    
    inst2_1_fft = fft(inst2_1)
    inst2_2_fft = fft(inst2_2)
    inst2_3_fft = fft(inst2_3)
    inst2_4_fft = fft(inst2_4)
    
    inst3_1_fft = fft(inst3_1)
    inst3_2_fft = fft(inst3_2)
    inst3_3_fft = fft(inst3_3)
    inst3_4_fft = fft(inst3_4)
    

This is absolutely dummies approach. Besides, I cannot do this on my real dataset where I have over 60K instances.

Questions:

  1. How do I do this at once for the whole array A?

  2. The frequencies seem to be the same, for feature of each instance. Am I doing this correctly? He's how I do:

sampling_rate = A.shape[2] # each instance is fixed-sized A.shape[2] (5 in this e.g)
N = inst1_1_fft.size

inst1_1_fft_freq = fftfreq(N, d = 1 / sampling_rate)

inst1_1_fft_freq
array([ 0.,  1.,  2., -2., -1.])
arilwan
  • 3,374
  • 5
  • 26
  • 62
  • Are you sure this makes sense? You are concatenating different measurements, and taking the FFT of the resulting array. Only 4 FFT bins will relate to the measures, the rest will relate to how the measurements change periodically from one instance to the next. Is the order of the instances meaningful? If not, then taking the FFT across instances is not meaningful either. – Cris Luengo Jan 12 '23 at 01:45
  • Or did I misunderstand and you are taking the FFT over the 5 values only? Something like `np.fft.fft(A, axis=2)`? – Cris Luengo Jan 12 '23 at 01:47

1 Answers1

1

In order to perform a fast Fourier transform along all your instances' features at once, you may do it along axis=2. As an alternative, you may first reshape A, by using np.reshape and np.swapaxes, prior to the fft computation, one possible solution is:

A.shape
>>> (3, 1, 5, 4)

N = 5 # number of time samples for the feature axis
inst_reshape = A.swapaxes(1,3).reshape((-1,N))
inst_reshape.shape
>>> (12, 5) # (number of instances x number of features, time sampling)

which follows the same instance / feature ordering as in your question. You may as well check this by comparing the resulting inst_reshape array with its manually assembled counterpart array:

inst_vstack = np.vstack((inst1_1,inst1_2,inst1_3,inst1_4,\
                         inst2_1,inst2_2,inst2_3,inst2_4,\
                         inst3_1,inst3_2,inst3_3,inst3_4))
np.allclose(inst_vstack,inst_reshape)
>>> True

Having done so, you may now call np.fft.fft once along the column axis axis=1 of the newly reshaped array:

sampling_rate = A.shape[2]
sample_spacing = 1/sampling_rate

freq = np.linspace(0.0, 1.0/(2.0*sample_spacing), N//2)
X = np.fft.fft(inst_reshape, axis=1)
amp = 2/N*np.abs(X[:N//2])

from matplotlib import pyplot as plt
plt.plot(freq, amp)

I would definitely encourage you to take a look at the following question where the general idea behind these kind of array transformations is nicely explained.

Yacola
  • 2,873
  • 1
  • 10
  • 27