
I am trying to apply a function to a single column of my dataframe (specifically, normalization).

The dataframe looks like this:

     Euclidian        H         N       Volume
222   0.012288  0.00518  0.011143   85203000.0
99    1.296833 -0.80266  1.018583   17519400.0
98    1.618482 -0.60979  1.499213   16263900.0
211   2.237388  0.38073 -2.204757   38375400.0
175   2.313548  0.35656 -2.285907   66974200.0
102   3.319342  3.01295 -1.392897   33201000.0
7     3.424589 -0.31313  3.410243   97924700.0
64    3.720370 -0.03526  3.720203  116514000.0
125   3.995138  0.27396  3.985733   80526200.0
210   4.999969  0.46453  4.978343   70612100.0

The dataframe is named 'discrepancies', and my code is as such:

max = discrepancies['Volume'].max()
discrepancies['Volume'].apply(lambda x: x/max)
return discrepancies

But the column values do not change. I cannot find anything in the documentation about applying a function to a single column; it only talks about applying to all columns or all rows:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

Thank you
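To reproduce this, the frame above can be rebuilt from the printout. The values are copied from it, and the left-hand column is used as the index:

```python
import pandas as pd

# Values copied from the printout above; the left-hand column is the index
discrepancies = pd.DataFrame(
    {
        "Euclidian": [0.012288, 1.296833, 1.618482, 2.237388, 2.313548,
                      3.319342, 3.424589, 3.720370, 3.995138, 4.999969],
        "H": [0.00518, -0.80266, -0.60979, 0.38073, 0.35656,
              3.01295, -0.31313, -0.03526, 0.27396, 0.46453],
        "N": [0.011143, 1.018583, 1.499213, -2.204757, -2.285907,
              -1.392897, 3.410243, 3.720203, 3.985733, 4.978343],
        "Volume": [85203000.0, 17519400.0, 16263900.0, 38375400.0, 66974200.0,
                   33201000.0, 97924700.0, 116514000.0, 80526200.0, 70612100.0],
    },
    index=[222, 99, 98, 211, 175, 102, 7, 64, 125, 210],
)

print(discrepancies.shape)  # (10, 4)
```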

Vranvs
  • Does this answer your question? [Pandas: How can I use the apply() function for a single column?](https://stackoverflow.com/questions/34962104/pandas-how-can-i-use-the-apply-function-for-a-single-column) – AMC Mar 04 '20 at 16:29
  • As noted in my question, that is what I tried, but it did not change the values – Vranvs Mar 04 '20 at 16:31
  • That’s because you didn’t save the return value anywhere. Check out https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html. – AMC Mar 04 '20 at 16:33
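A minimal reproduction of what AMC describes, using a small hypothetical frame rather than the asker's data. Selecting discrepancies['Volume'] gives a Series, so Series.apply is the relevant method. It returns a new Series and leaves the original column untouched:

```python
import pandas as pd

# Hypothetical miniature version of the 'discrepancies' frame
discrepancies = pd.DataFrame({"Volume": [100.0, 50.0, 25.0]})

# A single selected column is a Series, not a DataFrame
print(type(discrepancies["Volume"]).__name__)  # Series

vol_max = discrepancies["Volume"].max()

# Series.apply builds and returns a NEW Series; nothing is written back
scaled = discrepancies["Volume"].apply(lambda x: x / vol_max)

print(discrepancies["Volume"].tolist())  # [100.0, 50.0, 25.0] (unchanged)
print(scaled.tolist())                   # [1.0, 0.5, 0.25]
```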

3 Answers


If it is just a single column, you don't need apply. Dividing the column directly by its max will do:

discrepancies['Volume'] = discrepancies['Volume'] / discrepancies['Volume'].max()
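Here is a small sketch using three of the Volume values from the question. It shows that the vectorized division gives the same result as the apply version. The vectorized form also avoids a Python-level function call per element:

```python
import pandas as pd

# Three of the Volume values from the question
discrepancies = pd.DataFrame({"Volume": [85203000.0, 17519400.0, 116514000.0]})

vol_max = discrepancies["Volume"].max()

# apply version: the lambda is called once per element
via_apply = discrepancies["Volume"].apply(lambda x: x / vol_max)

# Vectorized version: the whole column is divided by a scalar, then assigned back
discrepancies["Volume"] = discrepancies["Volume"] / vol_max

# Raises AssertionError if the two approaches disagree
pd.testing.assert_series_equal(discrepancies["Volume"], via_apply)
print(discrepancies["Volume"].max())  # 1.0
```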
Toukenize

A single column does not need apply, but you do need to assign the result back (here to a new column, so the original values are kept):

vol_max = discrepancies['Volume'].max()
discrepancies['some col'] = discrepancies['Volume'] / vol_max

For a Series you can also use map, but again the result must be assigned back:

vol_max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].map(lambda x: x / vol_max)
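Here is a quick check with a few hypothetical values. Like apply, map returns a new Series, so the column only changes once that result is assigned back:

```python
import pandas as pd

discrepancies = pd.DataFrame({"Volume": [40.0, 10.0, 20.0]})  # hypothetical values
vol_max = discrepancies["Volume"].max()

# Without assignment, map's result is discarded and the column is unchanged
discrepancies["Volume"].map(lambda x: x / vol_max)
print(discrepancies["Volume"].tolist())  # [40.0, 10.0, 20.0]

# Assigning the returned Series back updates the column
discrepancies["Volume"] = discrepancies["Volume"].map(lambda x: x / vol_max)
print(discrepancies["Volume"].tolist())  # [1.0, 0.25, 0.5]
```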
BENY

The problem with your code is that Series.apply returns the result as a new Series instead of modifying the column in place. (Many pandas methods have an inplace parameter, but apply does not.)

To correct your code, you should do:

vol_max = discrepancies['Volume'].max()
discrepancies['Volume'] = discrepancies['Volume'].apply(lambda x: x / vol_max)
return discrepancies
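The return statement suggests this code sits inside a function. Below is a minimal sketch of such a function. The name normalize_volume is hypothetical. Copying the frame first, so the caller's DataFrame is not modified, is a design choice and not something the question requires. Drop the copy() if you do want the original frame updated.

```python
import pandas as pd

def normalize_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose 'Volume' column is divided by its maximum."""
    out = df.copy()  # work on a copy so the caller's frame is left unchanged
    vol_max = out["Volume"].max()
    # apply returns a new Series, so it must be assigned back to the column
    out["Volume"] = out["Volume"].apply(lambda x: x / vol_max)
    return out

# Two rows from the question, keeping their original index labels
discrepancies = pd.DataFrame({"Volume": [85203000.0, 17519400.0]}, index=[222, 99])
result = normalize_volume(discrepancies)
print(result.loc[222, "Volume"])         # 1.0
print(discrepancies.loc[222, "Volume"])  # 85203000.0
```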

Alternatively, you can use @YOBEN_S's answer.