I am running the code below and get this output:

import pandas as pd
pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0].count()*100/1892
x
id          0.528541
date        0.528541
count       0.528541
idade       0.528541
site        0.528541
baseline    0.528541
fuv1        0.528541
fuv2        0.475687
fuv3        0.528541
fuv4        0.475687
dtype: float64

What I want is only the single result 0.528541 for column fuv1, not the whole list of per-column results above.

How can I do that? Thanks.

MGB.py
  • @jezrael, do you have any idea about this? I just want the **fuv1** result only! – MGB.py Feb 17 '18 at 17:49
  • I understand now; both solutions work: `pf.loc[pf['fuv1'] == 0, 'fuv1'].count()*100/1892` or `(pf['fuv1'] == 0).sum()*100/1892`. The `sum` version should be faster on a larger `DataFrame`. – jezrael Feb 17 '18 at 18:03

3 Answers

In [282]: pf.loc[pf['fuv1'] == 0, 'id'].count()*100/1892
Out[282]: 0.5285412262156448
MaxU - stand with Ukraine
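The root cause is that `DataFrame.count()` returns a Series with one non-NA count per column. A minimal sketch of the two ways to get a single number, using a small hypothetical DataFrame in place of demo.csv (the rows below are made up for illustration): either pick one label from the per-column Series, or select a single column with `.loc` before counting.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for demo.csv: a few made-up rows, with a NaN in fuv2
pf = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "fuv1": [0.0, 1.0, 0.0, 1.0],
    "fuv2": [1.0, 1.0, np.nan, 0.0],
})

mask = pf["fuv1"] == 0

# DataFrame.count() returns a Series: one non-NA count per column
per_column = pf[mask].count()

# Option 1: take the single label you care about from that Series
n_from_series = per_column["fuv1"]

# Option 2: select one column first, so count() returns a scalar
n_from_loc = pf.loc[mask, "fuv1"].count()

print(int(n_from_series), int(n_from_loc))  # 2 2
```

Option 2 avoids counting every column only to discard all but one, which is why it is usually preferable.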

If you want to count the number of 0 values in column fuv1, use sum to count the True values, which are treated as 1s:

print ((pf['fuv1'] == 0).sum())
10

x = (pf['fuv1'] == 0).sum()*100/1892
print (x)
0.528541226216
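A minimal sketch of the boolean-sum idea on a hypothetical toy column. It also shows `.mean() * 100`, which gives the same percentage without hard-coding a denominator, assuming 1892 is the total number of rows in demo.csv (that is not confirmed in the thread). Because `NaN == 0` evaluates to False, rows with missing values still count toward the denominator.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; NaN == 0 evaluates to False
fuv1 = pd.Series([0.0, 1.0, 0.0, np.nan, 1.0])

is_zero = fuv1 == 0            # boolean Series: True, False, True, False, False
n_zero = int(is_zero.sum())    # True counts as 1, False as 0

# Percentage relative to all rows (NaN rows included in the denominator)
pct_hardcoded = n_zero * 100 / len(fuv1)
pct_mean = is_zero.mean() * 100  # mean of booleans = fraction of True values

print(n_zero)                   # 2
print(pct_hardcoded, pct_mean)  # 40.0 40.0
```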

Explanation of why the outputs differ: count excludes NaNs.

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0]
print (x)
    id       date  count  idade site  baseline  fuv1  fuv2  fuv3  fuv4
0    0   4/1/2016     10     13    A         1   0.0   1.0   0.0   1.0
2    2   4/3/2016      9      5    C         1   0.0   NaN   0.0   1.0
3    3   4/4/2016    108     96    D         1   0.0   1.0   0.0   NaN
11  11  4/12/2016      6     13    C         1   0.0   1.0   1.0   0.0
13  13  4/14/2016     12      4    C         1   0.0   1.0   1.0   0.0
40  40  5/11/2016     14      7    C         1   0.0   1.0   1.0   1.0
41  41  5/12/2016      0     26    C         1   0.0   1.0   1.0   1.0
42  42  5/13/2016     10     15    C         1   0.0   1.0   1.0   1.0
60  60  5/31/2016     13      3    D         1   0.0   1.0   1.0   1.0
74  74  6/14/2016     15      7    B         1   0.0   1.0   1.0   1.0

print (x.count())
id          10
date        10
count       10
idade       10
site        10
baseline    10
fuv1        10
fuv2         9
fuv3        10
fuv4         9
dtype: int64
jezrael
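To make the NaN behaviour concrete, here is a small sketch on hypothetical rows that mimic the filtered output above (all fuv1 == 0, with gaps in fuv2 and fuv4). `count()` skips NaN per column, while `len()` counts rows regardless of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with fuv1 == 0 and NaNs in fuv2/fuv4, as in the output above
x = pd.DataFrame({
    "fuv1": [0.0, 0.0, 0.0],
    "fuv2": [1.0, np.nan, 1.0],
    "fuv4": [1.0, 1.0, np.nan],
})

# count() skips NaN, so columns with missing values report fewer rows
non_na = x.count()
print(non_na.tolist())  # [3, 2, 2]

# len() counts rows regardless of NaN: one number for the whole filtered frame
print(len(x))           # 3
```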
import pandas as pd

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")

x = (pf['fuv1'] == 0).sum()*100/1892
y=pf["idade"].mean()

l = "Performance"
k = "LTFU"


def test(l1,k1):
    return pd.DataFrame({'a':[l1, k1], 'b':[x, y]})

df1 = test(l,k)
df1.columns = [''] * len(df1.columns)
df1.index = [''] * len(df1.index)

print(round(df1, 2))

  Performance   0.53
         LTFU  14.13
MGB.py
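The same two-line summary can also be sketched as a labelled Series, which avoids blanking out column and index names. This is one alternative layout, not the poster's method; the toy data below is hypothetical, and `len(pf)` stands in for the hard-coded 1892 on the assumption that it is the row count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for demo.csv
pf = pd.DataFrame({
    "fuv1": [0.0, 1.0, 1.0, np.nan],
    "idade": [13, 5, 96, 6],
})

x = (pf["fuv1"] == 0).sum() * 100 / len(pf)  # % of rows with fuv1 == 0
y = pf["idade"].mean()                        # mean of idade

# A labelled Series instead of a two-column DataFrame with blank headers
summary = pd.Series({"Performance": x, "LTFU": y}).round(2)
print(summary.to_string())
```

`to_string()` prints just the labels and values, without the `dtype` footer that printing a Series normally adds.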