Pandas error: too many boolean indices

Question

I have a CSV file with words and their tf-idf scores. I am writing a method to normalize the values (to scale them between 0 and 1) using the pandas library in Python. The data is read into a pandas DataFrame. When I run the code, I get the exception "ValueError: too many boolean indices". Could you please tell me what is going wrong? I went through several answers on multiple forums, but could not relate them to my problem.

This is the line where I get the error: dtm_norm=(dtm-min)/(diffMaxMin)

This is the data format:

       index                        0
    0  abbaiah                      0.121030858
    1  abbaiah_reddi                0.121030858
    2  abbaiah_reddi_kaggadasapura  0.121030858

This is the code:

def normalizeValues(inputpath):
    outputpath=inputpath+'normalized\\'

    allFiles =  glob.glob(inputpath+"\\*.csv")
    for file in allFiles:
        fileName=file.split('\\')[-1:][0]
        dtm=pd.read_csv(file)
        min=dtm.min(numeric_only='true')
        max=dtm.max(numeric_only='true')
        diffMaxMin=max-min
        dtm_norm=(dtm-min)/(diffMaxMin)
        writeToCsv(dtm_norm,outputpath+fileName)

I don't know why you get that error, but did you look at this related question: http://stackoverflow.com/questions/12525722/normalize-data-in-pandas? There is also a method in sklearn: http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range - EdChum, Mar 18 '15 at 09:44
Yes, I wrote the code by referring to the question you suggested, but I am using min-max normalization. - pnv, Mar 18 '15 at 09:45
Something I did notice is that you are not filtering your columns based on dtype. Your `min`, `max` and `diffMaxMin` are computed on the numeric columns only, but you then subtract `min` from `dtm`, which is your original df and has not been filtered. Could this be the problem? - EdChum, Mar 18 '15 at 09:49
Might be, I am not sure... I will try it out and let you know. Thanks for the suggestion. - pnv, Mar 18 '15 at 09:50
You could try `dtm_norm=(dtm[min.index]-min)/(diffMaxMin)` so that you select the same columns (`min` is a Series indexed by the numeric column names). - EdChum, Mar 18 '15 at 09:52
Another possibility is that your `dtm` df has all its original rows and somehow your `min` and `max` objects have a different shape. Can you check whether this is the case? - EdChum, Mar 18 '15 at 10:51
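
The column-filtering idea from the comments above can be sketched as follows. This is only an illustration, not a confirmed fix for the reported ValueError: the helper name `min_max_normalize` and the sample tf-idf values are made up for the example, and it assumes a pandas version that provides `DataFrame.select_dtypes`. The text column is left untouched and only the numeric columns are scaled.

```python
import pandas as pd


def min_max_normalize(df):
    """Return a copy of df with every numeric column scaled to [0, 1].

    Hypothetical helper for illustration; non-numeric columns are kept as-is.
    """
    # Work only on the numeric columns so the arithmetic never touches text.
    numeric = df.select_dtypes(include="number")
    col_min = numeric.min()
    col_range = numeric.max() - col_min
    # A constant column has zero range; use 1 instead so it maps to 0, not NaN.
    col_range = col_range.replace(0, 1)
    result = df.copy()
    result[numeric.columns] = (numeric - col_min) / col_range
    return result


# Made-up sample in the same layout as the question's data.
dtm = pd.DataFrame({
    "index": ["abbaiah", "abbaiah_reddi", "abbaiah_reddi_kaggadasapura"],
    "0": [0.1, 0.3, 0.5],
})
dtm_norm = min_max_normalize(dtm)
print(dtm_norm)
```

Note that in the excerpt shown in the question every score is identical (0.121030858), so max minus min is zero for that column. Plain min-max scaling would then divide zero by zero and produce NaN, which is why the sketch guards against a zero range.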
