
I'm trying to pass every column of a DataFrame through a custom function using apply(lambda x: ...) in Python. The custom function works when I call it on its own, but when I put it inside the apply(lambda x: ...) structure, it only writes NaN values into the target DataFrame.

First, the custom function:

def snr_pd(wavenumber_arr):
    intensity_arr = Zhangfit_output 
    signal_low = 1650
    signal_high = 1750
    noise_low = 1750
    noise_high = 1850

    signal_mask = np.logical_and((wavenumber_arr >= signal_low), (wavenumber_arr < signal_high))
    noise_mask = np.logical_and((wavenumber_arr >= noise_low), (wavenumber_arr < noise_high))

    signal = np.max(intensity_arr[signal_mask])
    noise = np.std(intensity_arr[noise_mask])
    return signal / noise

And this is how the lambda is set up:

sd['s/n'] = df.apply(lambda x: snr_pd(x), axis =0,)

As far as I understand, this takes the columns from df, passes each one to snr_pd(), and stores the results in sd under the column 's/n', but the only value produced is NaN.

I have also tried a couple of structural changes, such as using applymap() instead of apply():

sd['s/n'] = fd.applymap(lambda x: snr_pd(x), na_action = 'ignore')

However, this returns the following error instead:

ValueError: zero-size array to reduction operation maximum which has no identity

I understand this error even less.

Any help would be much appreciated.

  • First, if you already have a defined function, simply pass it as a reference to `.apply()`. Using a `lambda` is only for cases where you need to define a function. Second, does your DataFrame have any `nan` values or non-numeric values? Third, where is `Zhangfit_output` defined? Fourth, it seems as though your function expects an array as input, but using `apply` maps a function to values of each column one-by-one, similar to `map(func, some_list)`. – ddejohn Sep 22 '21 at 15:30
  • There are no NaN values in the DataFrame and all values are numerical. Zhangfit_output comes from the package BaselineRemoval and is defined as: `spec_obj = BaselineRemoval(fd[h]) Zhangfit_output = spec_obj.ZhangFit()`. Can you explain the fourth point to me? – Niall Doherty Sep 22 '21 at 16:10

1 Answer


It looks as though your function snr_pd() expects an entire array as an argument.
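To make that concrete, here is a minimal sketch of what `apply(..., axis=0)` hands to the function, and of one possible way the `ValueError` from the question can arise. The toy DataFrame and its column names are made up for illustration, and the assumption that the columns being passed hold intensities rather than wavenumbers is a guess, not something confirmed by the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one wavenumber column and two intensity columns.
df = pd.DataFrame({
    "Wavenumber": [1600.0, 1700.0, 1800.0],
    "Particle 1": [0.1, 0.9, 0.2],
    "Particle 2": [0.3, 0.7, 0.1],
})

# With axis=0, apply() calls the function once per column and passes
# the whole column as a pandas Series.
received = df.apply(lambda col: type(col).__name__, axis=0)

# If an intensity column is treated as if it were the wavenumber axis,
# no value falls inside the 1650-1750 window, so the mask selects nothing.
intensities = df["Particle 1"].to_numpy()
empty_mask = (intensities >= 1650) & (intensities < 1750)

# np.max has nothing to reduce over an empty selection and raises ValueError.
try:
    np.max(intensities[empty_mask])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(received)
print(error_message)
```

If this is what is happening, the window bounds are being compared against the wrong array, so checking which array actually holds the wavenumbers is probably the first thing to verify.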

Without seeing your data it's hard to say, but you should be able to apply the function directly to the DataFrame using np.apply_along_axis():

np.apply_along_axis(snr_pd, axis=0, arr=df)

Note that this assumes that every column in df is numeric. If not, then simply select the columns of the df on which you'd like to apply the function.

Note also that np.apply_along_axis() will return a numpy array.
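As a rough sketch of how this could fit together, the example below applies a per-column signal-to-noise function with `np.apply_along_axis()` and wraps the resulting array back into a pandas Series. Everything here is an assumption made for illustration: the `snr` helper takes the intensities and a shared wavenumber axis as explicit arguments (unlike `snr_pd()` in the question), the data is synthetic, and the column names `Wavenumber`, `Particle 1` and `Particle 2` are only modelled on the layout described in the comments.

```python
import numpy as np
import pandas as pd


def snr(intensity_arr, wavenumber_arr, signal=(1650, 1750), noise=(1750, 1850)):
    """Peak intensity in the signal window divided by the std in the noise window."""
    signal_mask = (wavenumber_arr >= signal[0]) & (wavenumber_arr < signal[1])
    noise_mask = (wavenumber_arr >= noise[0]) & (wavenumber_arr < noise[1])
    return np.max(intensity_arr[signal_mask]) / np.std(intensity_arr[noise_mask])


# Synthetic stand-in data: a shared wavenumber axis and two intensity columns.
wavenumber = np.arange(1600.0, 1900.0, 10.0)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Wavenumber": wavenumber,
    "Particle 1": rng.random(wavenumber.size),
    "Particle 2": rng.random(wavenumber.size),
})

# Select only the intensity columns; the wavenumber axis is passed separately.
intensity_cols = df.columns.drop("Wavenumber")

# Extra positional arguments after `arr` are forwarded to the function,
# so each call is snr(one_column, wavenumber).
values = np.apply_along_axis(
    snr, 0, df[intensity_cols].to_numpy(), df["Wavenumber"].to_numpy()
)

# The result is a plain NumPy array, so label it with the column names.
snr_series = pd.Series(values, index=intensity_cols, name="s/n")
print(snr_series)
```

Passing the wavenumber axis explicitly avoids relying on a global array inside the function, and the resulting Series is indexed by column name, which matters if it is later assigned into another DataFrame.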

ddejohn
  • Yes, every column represents an array. Do you mean the names of the columns, or that the columns only contain numbers? In this case they do, so this should work! I'll give it a try and report back. Thanks for the help. – Niall Doherty Sep 22 '21 at 16:03
  • The change produces the same error as the second attempt posted above. – Niall Doherty Sep 22 '21 at 16:06
  • You're going to need to provide sample data, I can't help you debug something I can't see. – ddejohn Sep 22 '21 at 16:09
  • This produces the second error. However, after experimenting with the size of `df`, it appears that the original `df` is too large to be processed all at once, hence the second error. The method should work if `df` is broken down into smaller pieces. – Niall Doherty Sep 22 '21 at 16:20
  • `df` is 2647 rows x 2011 columns, and all data is numerical. Columns = ID, Wavenumber, Particle n, Particle n+1, etc. I don't know if I can post the literal data. – Niall Doherty Sep 22 '21 at 16:24
  • https://stackoverflow.com/a/20159305/6298712 – ddejohn Sep 22 '21 at 16:28