I'm trying to use the BaselineRemoval package to remove background fluorescence from some Raman spectra. The package documentation describes the preferred input format as follows: "input_array: A pandas dataframe column provided in input as dataframe['input_df_column']. It can also be a Python list object".

My example:

import numpy as np
import pandas as pd
from BaselineRemoval import BaselineRemoval

df = pd.DataFrame(
    {'Patient': [1, 2, 3, 4, 5, 6],
     'Group': [1, 1, 1, 2, 2, 2],
     'Samples': [list(np.random.randn(3).round(2)) for i in range(6)]
    }
)

input_array = df['Samples']
polynomial_degree = 2

baseObj = BaselineRemoval(input_array)
Modpoly_output = baseObj.ModPoly(polynomial_degree)

However, this gives the error ValueError: setting an array element with a sequence.

Not sure how to proceed.

StatguyUser
Sp_95
  • I checked the value of the first row of the `Samples` column. It appears to be a list object, `[-0.89, 0.09, 1.23]`. Raman spectra can be calculated for an array of values, for example `Temperature`, `Pressure`, or `Wavelength`, **but** separately. What this pandas column holds instead is [Temperature, pressure, wavelength]. Baseline removal can be done on an array object, not on a multi-dimensional matrix. If you store these values in separate arrays and process each array separately, it will give you the desired baseline-removed spectra. – StatguyUser Aug 28 '20 at 13:16
  • Check and fix the dimensions of the array, split it into multiple arrays, and process each one separately. – StatguyUser Aug 28 '20 at 13:23
  • Hmmm. The data frame above is just a dummy example to be taken at face value. The values are all amplitudes recorded by a probe across the Raman scale, which does not output a multi-dimensional matrix. So the column is [amplitude_at_wavenumber1, amplitude_at_wavenumber2, amplitude_at_wavenumber3]. In my original dataset, the Samples column has over 2000 fluorescence amplitudes in a list per patient. Not sure how storing them as separate values will help. – Sp_95 Aug 28 '20 at 13:38
  • I see that the problem you are facing is about how to organize the data. The library can do the baseline removal if the data is in array form. In that case, a simple for loop should do the job, as you rightly identified. Wish you all the best for your project. – StatguyUser Aug 29 '20 at 09:33

1 Answer

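A simple for loop should do it. Each row of the Samples column holds one spectrum as a list, so pass the rows to BaselineRemoval one at a time instead of passing the whole column.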

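import numpy as np
import pandas as pd
from BaselineRemoval import BaselineRemoval

df = pd.DataFrame(
    {'Patient': [1, 2, 3, 4, 5, 6],
     'Group': [1, 1, 1, 2, 2, 2],
     'Samples': [list(np.random.randn(3).round(2)) for i in range(6)]
    }
)

input_array = df['Samples']
polynomial_degree = 2

# Each row is one spectrum (a list of amplitudes); process them one at a time.
for row in input_array:
    print(BaselineRemoval(row).ModPoly(polynomial_degree))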
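
If you want to keep the corrected spectra next to the patient data rather than print them, the same per-row idea can be written with `Series.apply`. The following is only a rough sketch. `polyfit_baseline_removed` is a hypothetical stand-in I made up so the snippet runs without BaselineRemoval installed; it subtracts a single least-squares polynomial fit, which is not the iterative ModPoly algorithm. The `Corrected` column name and the synthetic spectra are also assumptions for illustration.

```python
import numpy as np
import pandas as pd


def polyfit_baseline_removed(spectrum, degree=2):
    """Hypothetical stand-in (NOT ModPoly): subtract one least-squares
    polynomial fit of the given degree from a single 1-D spectrum."""
    y = np.asarray(spectrum, dtype=float)
    x = np.arange(y.size)
    coeffs = np.polyfit(x, y, degree)
    return (y - np.polyval(coeffs, x)).tolist()


# Synthetic dummy spectra: a purely quadratic background per patient.
x = np.arange(50)
df = pd.DataFrame(
    {'Patient': [1, 2, 3],
     'Group': [1, 1, 2],
     'Samples': [list(0.01 * x**2 + offset) for offset in (1.0, 2.0, 3.0)]
    }
)

# Apply the function to one row (one spectrum) at a time and store
# each result as a list in a new column.
df['Corrected'] = df['Samples'].apply(polyfit_baseline_removed)
```

With the real package, the stand-in could presumably be replaced by something like `lambda row: BaselineRemoval(row).ModPoly(2)`, mirroring the loop above.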
Sp_95