
I have a simple data frame. I want to compute the percentage of rows whose value in the column "Tested" is 22, relative to the total number of rows.

i.e.:

1. there are 5 rows with the value 22 in the column "Tested"
2. the data frame has 15 rows in total

So the result should be 5/15 ≈ 0.33 (about 33%).

I tried the code below, but it gives zero.

How can I correct it? Thank you.

```python
import pandas as pd

data = {'Unit_Weight': [335,335,119,119,52,452,19,19,19,165,165,165,724,724,16],
        'Tested' : [22,12,14,16,18,20,22,24,26,28,22,22,48,50,22]}

df = pd.DataFrame(data)

num_row = df.shape[0]
suspect_row = df[df["Tested"] == 22].shape[0]
suspect_over_total = suspect_row/num_row

print num_row             # 15
print suspect_row         # 5
print float(suspect_over_total)   # 0.0
```
Mark K
  • Are you still using Python 2.x? Just convert one of the numbers to float: `float(suspect_row)/num_row` – bubble Apr 17 '19 at 03:01
  • @bubble, thank you. Can you make it an answer so that I can accept it? – Mark K Apr 17 '19 at 03:04
  • @MarkK, btw, `len(df[df["Tested"] == 22])` looks more efficient than `df[df["Tested"] == 22].shape[0]`. [Source](https://stackoverflow.com/a/15943975/4949074). – ggrelet Apr 17 '19 at 03:10
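For reference, a minimal sketch (Python 3, reusing the question's sample data) showing that the ways of counting the matching rows mentioned in these comments agree. The name `mask` and the `.sum()` variant are introduced here for illustration; relative speed is not measured.

```python
import pandas as pd

data = {'Unit_Weight': [335, 335, 119, 119, 52, 452, 19, 19, 19, 165, 165, 165, 724, 724, 16],
        'Tested': [22, 12, 14, 16, 18, 20, 22, 24, 26, 28, 22, 22, 48, 50, 22]}
df = pd.DataFrame(data)

# Boolean Series: True where "Tested" equals 22
mask = df["Tested"] == 22

count_shape = df[mask].shape[0]  # row count of the filtered frame, via shape
count_len = len(df[mask])        # same count, via len()
count_sum = int(mask.sum())      # True counts as 1, so summing the mask counts the matches

print(count_shape, count_len, count_sum)  # 5 5 5
```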

1 Answer


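`suspect_over_total = suspect_row/num_row` is an int/int division. In Python 2, `/` between two ints performs floor division, so the true quotient 0.333... is truncated and you get the int 0. (In Python 3, `/` always performs true division and would return 0.333... here.)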
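To illustrate, a minimal sketch in Python 3, where `/` is true division and `//` is floor division (the behavior Python 2's `/` had for two ints). In Python 2, putting `from __future__ import division` at the top of a module switches `/` to the Python 3 behavior. The names `true_ratio` and `floor_ratio` are just for illustration.

```python
suspect_row = 5
num_row = 15

true_ratio = suspect_row / num_row    # true division: 0.333...
floor_ratio = suspect_row // num_row  # floor division: what Python 2's int/int "/" did

print(true_ratio)   # 0.3333333333333333
print(floor_ratio)  # 0
```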

As bubble said, you should convert one of the operands to a float:

```python
suspect_over_total = float(suspect_row)/num_row   # 0.333...
```
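On Python 3, the same calculation needs no `float()` call. A minimal sketch reusing the question's data; the `.mean()` line is an alternative not mentioned in the thread: pandas treats `True` as 1 and `False` as 0, so the mean of the boolean mask is the fraction of matching rows.

```python
import pandas as pd

data = {'Unit_Weight': [335, 335, 119, 119, 52, 452, 19, 19, 19, 165, 165, 165, 724, 724, 16],
        'Tested': [22, 12, 14, 16, 18, 20, 22, 24, 26, 28, 22, 22, 48, 50, 22]}
df = pd.DataFrame(data)

num_row = len(df)                              # total rows: 15
suspect_row = int((df["Tested"] == 22).sum())  # rows where Tested == 22: 5

# In Python 3, / between two ints already returns a float
suspect_over_total = suspect_row / num_row

# Alternative: the mean of a boolean mask is the fraction of True values
suspect_fraction = (df["Tested"] == 22).mean()

print(suspect_over_total)          # 0.3333333333333333
print(round(suspect_fraction, 2))  # 0.33
```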
ggrelet