I've created a dataframe. I'd like to create a new dataframe depending on the current dataframe's conditions. My Python code is as follows:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,5,6,7,8,9,10],'B':[10,20,30,40,50,60,70,80,90,100]})
df
    A   B
0   1   10
1   2   20
2   3   30
3   4   40
4   5   50
5   6   60
6   7   70
7   8   80
8   9   90
9   10  100

import pywt
import numpy as np

import scipy.signal as signal
import matplotlib.pyplot as plt
from skimage.restoration import denoise_wavelet
wavelet_type='db6'


def new_df(df):
  df0 = pd.DataFrame()
  if (df.iloc[:,0]>=1) & (df.iloc[:,0]<=3):
    df0['B'] = denoise_wavelet(df.loc[(df.iloc[:,0]>=1) & (df.iloc[:,0]<=3),'B'], method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma='True')
  elif (df.iloc[:,0]>=4) & (df.iloc[:,0]<=6):
    df0['B'] = denoise_wavelet(df.loc[(df.iloc[:,0]>=4) & (df.iloc[:,0]<=6),'B'], method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma='True') 
  else:
    df0['B']=df.iloc[:,1]
  return df0

I want a new dataframe in which the values in column B that meet the conditions are denoised, while the remaining values are left alone and kept in the new dataframe. My code gives me this error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Could you please help me?

My desired output should look like this:

    A   B
0   1   15*
1   2   25*
2   3   35*
3   4   45*
4   5   55*
5   6   65*
6   7   70
7   8   80
8   9   90
9   10  100

#* marks the denoised values; the actual numbers may differ when you run it.
#This is just for demonstration.

Maybe my approach is wrong. Could you please help me?

2 Answers

(df.iloc[:,0]>=1) will return a pandas series of boolean values corresponding to which elements in the first column of df are greater than or equal to 1.

In the line

if (df.iloc[:,0]>=1) & (df.iloc[:,0]<=3):

you are hence handing the if statement a whole Series of booleans (the element-wise result of combining the two comparisons) where it needs a single True or False, and pandas refuses to guess which one you mean.

Pandas gives you a hint in the error message as to what might solve the problem: e.g. if you wanted to check whether any element in df.iloc[:,0] was greater than or equal to 1, you could use (df.iloc[:,0]>=1).any(), which returns a single bool that you could then combine with the result of (df.iloc[:,0]<=3).any(). Without more context on the problem or what you're trying to do, it is hard to help you further.
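As a minimal sketch of the point above, the snippet below reuses the question's DataFrame to show that the combined comparison is itself a Series, that using it directly in an if statement raises the ValueError, and that .any() or .all() reduce it to one bool. The variable names (mask, raised, any_in_range, all_in_range) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Each comparison, and the `&` of the two, yields a boolean Series
# with one value per row.
mask = (df.iloc[:, 0] >= 1) & (df.iloc[:, 0] <= 3)
print(mask.tolist())

# `if mask:` asks for one truth value, which a multi-element Series
# cannot provide, so pandas raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reducing the Series first gives the if statement an ordinary bool.
any_in_range = bool(mask.any())  # at least one row has 1 <= A <= 3
all_in_range = bool(mask.all())  # not every row has 1 <= A <= 3
```

Neither reduction selects rows, though; when the goal is to change only the matching rows, the mask itself is usually what you pass to .loc.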

Tom B.
  • I'd like to create a new dataframe in which the corresponding values of column B are denoised. B values are 10,20,30 in the case of 1 – pythonhater May 31 '22 at 18:02
  • I have edited my question please have a look now. – pythonhater May 31 '22 at 19:48
  • Just to add to @tom-b's answer: boolean arithmetic for Series is defined and is performed element-wise. The error is thrown when the if statement tries to evaluate whether the resulting Series is "truthy". See: https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – scespinoza May 31 '22 at 20:18

Note that since you are already filtering the data when passing it to denoise_wavelet, you don't really need the if statements, but you should assign the returned values back to the same selection of rows in the DataFrame. Here is my approach: it first copies the original DataFrame and then replaces the desired rows with the "denoised" data.

import numpy as np
import pandas as pd
import scipy.signal as signal
import matplotlib.pyplot as plt
from skimage.restoration import denoise_wavelet
wavelet_type='db6'

df = pd.DataFrame({'A':[1,2,3,4,5,6,7,8,9,10],'B':[10,20,30,40,50,60,70,80,90,100]})


def new_df(df):
    # Work on a copy so the original DataFrame is left untouched.
    df0 = df.copy()
    # Boolean masks for the rows whose first column falls in each range.
    mask_1_3 = (df.iloc[:,0]>=1) & (df.iloc[:,0]<=3)
    mask_4_6 = (df.iloc[:,0]>=4) & (df.iloc[:,0]<=6)
    df0.loc[mask_1_3,'B'] = denoise_wavelet(df.loc[mask_1_3,'B'].values, method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
    df0.loc[mask_4_6,'B'] = denoise_wavelet(df.loc[mask_4_6,'B'].values, method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
    return df0

new_df(df)

However, I don't really know how denoise_wavelet works, so I can't say whether the denoised values are correct; what I can say is that the values at index 6 to 9 are left unchanged.
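The copy-then-assign pattern can be checked independently of the wavelet step. The sketch below swaps denoise_wavelet for a hypothetical stand-in, stand_in_denoise, which simply pulls each value halfway toward its block mean; that function, the use of Series.between, and the cast of column B to float are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def stand_in_denoise(values):
    # Hypothetical placeholder for denoise_wavelet: takes an array and
    # returns a same-length array, shrunk halfway toward the block mean.
    values = np.asarray(values, dtype=float)
    center = values.mean()
    return center + 0.5 * (values - center)

df = pd.DataFrame({'A': range(1, 11), 'B': range(10, 101, 10)})

out = df.copy()
out['B'] = out['B'].astype(float)  # make room for non-integer results

for low, high in [(1, 3), (4, 6)]:
    mask = df['A'].between(low, high)  # inclusive on both ends by default
    # Assign back to exactly the rows that were passed in.
    out.loc[mask, 'B'] = stand_in_denoise(df.loc[mask, 'B'].to_numpy())

print(out)
```

Rows with A from 7 to 10 never match a mask, so they keep their original values, and df itself is not modified.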

Updated

To apply this to two or more columns:

df = pd.DataFrame({'A':[1,2,3,4,5,6,7,8,9,10],
                   'B1':[10,20,30,40,50,60,70,80,90,100],
                   'B2':[10,20,30,40,50,60,70,80,90,100],
                   'B3':[10,20,30,40,50,60,70,80,90,100]})

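def apply_denoise(col):
    # Copy first so the column handed over by apply() is not modified in place.
    col = col.copy()
    # With 'A' as the index, .loc[1:3] selects labels 1 through 3, end inclusive.
    col.loc[1:3] = denoise_wavelet(col.loc[1:3].values, method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
    col.loc[4:6] = denoise_wavelet(col.loc[4:6].values, method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma=True)
    return col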
    
new_df = df.set_index('A').apply(apply_denoise)
new_df

Note that since you are always conditioning on column 'A', you can convert it to the index and use label-based indexing to implement the condition. Then, using apply, you can call the function apply_denoise on each column, and it will return a new DataFrame with the resulting columns.
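For the 32-column case raised in the comments, the same idea might look like the sketch below. The data is made up, the column names B1 through B32 follow the comment thread, and stand_in_denoise is again a hypothetical placeholder for denoise_wavelet so the example runs without scikit-image.

```python
import numpy as np
import pandas as pd

def stand_in_denoise(values):
    # Hypothetical placeholder for denoise_wavelet: same-length array in,
    # same-length array out (values pulled halfway toward the block mean).
    values = np.asarray(values, dtype=float)
    center = values.mean()
    return center + 0.5 * (values - center)

# Made-up data: one condition column 'A' and 32 value columns B1..B32.
data = {'A': range(1, 11)}
for i in range(1, 33):
    data[f'B{i}'] = np.arange(10, 101, 10, dtype=float)
df = pd.DataFrame(data)

def denoise_blocks(col):
    col = col.copy()  # leave the caller's column untouched
    for low, high in [(1, 3), (4, 6)]:
        # Label-based slice on the 'A' index; both ends are included.
        col.loc[low:high] = stand_in_denoise(col.loc[low:high].to_numpy())
    return col

# apply() calls denoise_blocks once per column and reassembles a DataFrame.
result = df.set_index('A').apply(denoise_blocks)
print(result.shape)
```

Adding more ranges only means extending the list of (low, high) pairs; adding more columns needs no code change at all.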

scespinoza
  • Thanks for your help. What if I have more than 2 columns, say 32? In that case, what would I need to change in the code? Do you have any suggestions? – pythonhater May 31 '22 at 21:20
  • It depends on what you mean by having more columns. Do you need to apply `denoise_wavelet` to each column based on the values of 'A'? Is column 'A' an identifier (or index) for each row, or does it change depending on the column you need to evaluate? It would help if you could give a code example, but the answer probably involves creating a function and applying it to each column using `.apply()` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html). – scespinoza Jun 01 '22 at 01:58
  • Yes, the condition will remain the same, but the number of value columns will grow to B1 through B32. In that case, if I use this code df0.loc[(df.iloc[:,0]>=1) & (df.iloc[:,0]<=3),'B'] = denoise_wavelet(df.loc[(df.iloc[:,0]>=1) & (df.iloc[:,0]<=3),'B'].values, method='BayesShrink', mode='soft', wavelet_levels=3, wavelet='sym8', rescale_sigma='True') for each column, it would be tedious work. I am new to Python. Could you please show me how to do this? – pythonhater Jun 01 '22 at 05:27
  • I updated my answer with an example. I'm not quite sure if it is what you are looking for, though. – scespinoza Jun 01 '22 at 14:05