I've been searching long time but can't see any implementation about music feature extraction techniques (like spectral centroid, spectral bandwidth etc.) integrated with Apache Spark. I am working with these feature extraction techniques and the process takes a lot of time for music. I want to parallelize and accelerate this process by using Spark. I did some works but couldn't get any speed up. I want to get arithmetic mean and standard deviation of spectral centroid method. This is what I've done so far.
from pyspark import SparkContext
import librosa
import numpy as np
import time
parts=4
print("Parts: ", parts)
sc = SparkContext('local['+str(parts)+']', 'pyspark tutorial')
def spectral(iterator):
l=list(iterator)
cent=librosa.feature.spectral_centroid(np.array(l), hop_length=256)
ort=np.average(cent)
std=np.std(cent)
return (ort, std)
y, sr=librosa.load("classical.00080.au") #This loads the song.
start1=time.time()
normal=librosa.feature.spectral_centroid(np.array(y), hop_length=256) #This is normal technique without spark
end1=time.time()
print("\nOrt: \t", np.average(normal))
print("Std: \t", np.std(normal))
print("Time elapsed: %.5f" % (end1-start1))
#This is where my spark implementation appears.
rdd = sc.parallelize(y)
start2=time.time()
result=rdd.mapPartitions(spectral).collect()
end2=time.time()
result=np.array(result)
total_avg, total_std = 0, 0
for i in range(0, parts*2, 2):
total_avg += result[i]
total_std += result[i+1]
spark_avg = total_avg/parts
spark_std = total_std/parts
print("\nOrt:", spark_avg)
print("Std:", spark_std)
print("Time elapsed: %.5f" % (end2-start2))
The output of the program is below.
Ort: 971.8843380584146
Std: 306.75410601230413
Time elapsed: 0.17665
Ort: 971.3152955225721
Std: 207.6510740703993
Time elapsed: 4.58174
So, even though I parallelized the array y (the array of music signal), I can't speed up the process. It takes longer time. I couldn't understand why. I am newbie with Spark concept. I thought to use GPU for this process but couldn't implement that either. Can anyone help me to understand what I am doing wrong?