How to select and calculate with value from specific variable in dataframe with pandas

Question

I am running the code below and getting this:

import pandas as pd
pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0].count()*100/1892
x
id          0.528541
date        0.528541
count       0.528541
idade       0.528541
site        0.528541
baseline    0.528541
fuv1        0.528541
fuv2        0.475687
fuv3        0.528541
fuv4        0.475687
dtype: float64

What I want is to get only this result, 0.528541, and ignore all the other results above.

What should I do? Thanks.

@jezrael, what do you know about this? I just want the **fuv1 result** only! – MGB.py, Feb 17 '18 at 17:49
I understand now; both solutions work: `pf.loc[pf['fuv1'] == 0, 'fuv1'].count()*100/1892` or `(pf['fuv1'] == 0).sum()*100/1892`. The `sum` version should be faster on a larger `DataFrame`. – jezrael, Feb 17 '18 at 18:03

score 2 · Answer 1 · answered Feb 17 '18 at 17:48

In [282]: pf.loc[pf['fuv1'] == 0, 'id'].count()*100/1892
Out[282]: 0.5285412262156448
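A minimal self-contained sketch of the `.loc` approach above, assuming a small made-up DataFrame in place of the remote demo.csv (the values and the 4-row total are hypothetical). Selecting a single column inside `.loc` is what turns the result into one number instead of a per-column Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for demo.csv: 4 rows, with some NaN in fuv2
pf = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "fuv1": [0.0, 1.0, 0.0, 1.0],
    "fuv2": [1.0, np.nan, np.nan, 1.0],
})

total = len(pf)  # the question hard-codes 1892 instead

# Rows where fuv1 == 0, column 'id' only, then count its non-null values
pct = pf.loc[pf["fuv1"] == 0, "id"].count() * 100 / total
print(pct)
```

With these rows, two of the four have `fuv1 == 0`, so `pct` is 50.0.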

answered Feb 17 '18 at 17:48

MaxU - stand with Ukraine

Do you know the simplest way to round the output? – MGB.py Feb 17 '18 at 18:40

jezrael · Accepted Answer · 2018-02-17T17:55:29.340

If you want to count the number of 0 values in column fuv1, use sum to count the Trues, which are processed as 1s:

print ((pf['fuv1'] == 0).sum())
10

x = (pf['fuv1'] == 0).sum()*100/1892
print (x)
0.528541226216
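A small sketch of why `sum` works, using hypothetical data rather than demo.csv: the comparison yields a boolean Series, and `True` counts as 1. It also shows `mask.mean()`, which gives the fraction of all rows directly; that matches dividing by the hard-coded `1892` only under the assumption that 1892 is the total number of rows in the file.

```python
import pandas as pd

# Hypothetical data, not the real demo.csv
pf = pd.DataFrame({"fuv1": [0.0, 1.0, 1.0, 0.0, 1.0]})

mask = pf["fuv1"] == 0           # boolean Series: True, False, False, True, False
n_zero = mask.sum()              # True counts as 1, so this is 2
pct_sum = n_zero * 100 / len(pf) # divide by the row count instead of a literal
pct_mean = mask.mean() * 100     # mean of a boolean mask = fraction of True values

print(n_zero, pct_sum, pct_mean)
```

Using `len(pf)` rather than a literal keeps the percentage correct if the file gains or loses rows.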

Explanation of why the outputs differ: count excludes NaNs:

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0]
print (x)
    id       date  count  idade site  baseline  fuv1  fuv2  fuv3  fuv4
0    0   4/1/2016     10     13    A         1   0.0   1.0   0.0   1.0
2    2   4/3/2016      9      5    C         1   0.0   NaN   0.0   1.0
3    3   4/4/2016    108     96    D         1   0.0   1.0   0.0   NaN
11  11  4/12/2016      6     13    C         1   0.0   1.0   1.0   0.0
13  13  4/14/2016     12      4    C         1   0.0   1.0   1.0   0.0
40  40  5/11/2016     14      7    C         1   0.0   1.0   1.0   1.0
41  41  5/12/2016      0     26    C         1   0.0   1.0   1.0   1.0
42  42  5/13/2016     10     15    C         1   0.0   1.0   1.0   1.0
60  60  5/31/2016     13      3    D         1   0.0   1.0   1.0   1.0
74  74  6/14/2016     15      7    B         1   0.0   1.0   1.0   1.0

print (x.count())
id          10
date        10
count       10
idade       10
site        10
baseline    10
fuv1        10
fuv2         9
fuv3        10
fuv4         9
dtype: int64
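The same effect on a tiny hypothetical frame: filtering rows keeps every column, and `DataFrame.count()` reports the number of non-NaN cells per column, which is why the question's code printed a whole Series with a smaller value for fuv2 and fuv4. Choosing one column before counting gives a single number.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; row 1 has a missing fuv2 value
pf = pd.DataFrame({
    "fuv1": [0.0, 0.0, 1.0],
    "fuv2": [1.0, np.nan, 1.0],
})

zeros = pf[pf["fuv1"] == 0]    # rows 0 and 1, all columns kept
per_column = zeros.count()     # Series: non-NaN count for each column
print(per_column)

# Selecting one column first gives a scalar
print(zeros["fuv1"].count())
```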

edited Feb 17 '18 at 17:55

answered Feb 17 '18 at 17:50

jezrael

How do I round to 2 decimal places here: `x = (pf['fuv1'] == 0).sum()*100/1892`? – MGB.py Feb 17 '18 at 18:15
Use `x = round((pf['fuv1'] == 0).sum()*100/1892, 2)`, because you are working with a scalar. – jezrael Feb 17 '18 at 18:16
Did you see the output of your proposed code? Does it give 0.53 as the result? – MGB.py Feb 17 '18 at 18:23
Yes, it returns `0.53`. Do you get something else? – jezrael Feb 17 '18 at 18:23
I get 0.53000000000000003, the same as before I asked you. – MGB.py Feb 17 '18 at 18:23
Yes, I understand. This is a floating-point accuracy problem; check [this](https://stackoverflow.com/q/455612/2901002). – jezrael Feb 17 '18 at 18:25
I found this simple solution over there: `x = round((pf['fuv1'] == 0).sum()*100/1892, 2)` followed by `format(x, '.2f')`, which returns the string '0.53' instead of the number 0.53. But I am not satisfied; I need a simple solution, like round applied to a vector. – MGB.py Feb 17 '18 at 18:34
Why does this trick work? `print(round((pf['fuv1'] == 0).sum()*100/1892, 2))` I was trying it myself. – MGB.py Feb 17 '18 at 18:48
I need to work with a vector, so `print(round((pf['fuv1'] == 0).sum()*100/1892, 2))` will not help me. I want to include the vector in the function you fixed before. – MGB.py Feb 17 '18 at 18:50
@MGB.py - do you want [`round`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.round.html)? – jezrael Feb 17 '18 at 18:51
@MGB.py - If you want to use `format`: `df['col'].apply(lambda x: format(x, '.2f'))` – jezrael Feb 17 '18 at 18:59
I did not understand. – MGB.py Feb 17 '18 at 19:07
You like the `format` solution, so if you need to apply it to a column, use the solution above. ;) – jezrael Feb 17 '18 at 19:09
Like this? `pf['fuv1'].apply(lambda x: format(x, '.2f'))` It does not work... – MGB.py Feb 17 '18 at 19:13
It does not work because those are not floats. Test it with a new column, like `pf['test'] = pf['count'].div(3456)`, and then `pf['new'] = pf['test'].apply(lambda x: format(x, '.2f'))`. – jezrael Feb 17 '18 at 19:16
`pf.dtypes`: id int64, date object, count int64, idade int64, site object, baseline int64, fuv1 float64, fuv2 float64, fuv3 float64, fuv4 float64, dtype: object – MGB.py Feb 17 '18 at 19:31
I do not understand. – jezrael Feb 17 '18 at 19:32
They are floats. Maybe I need to convert to int before applying the simple round. How do I convert? – MGB.py Feb 17 '18 at 19:37
Hmm, but there are only floats with value `0` in `fuv1`, so why do you need round? Or have the values changed? – jezrael Feb 17 '18 at 19:40
Because I need to calculate the percentage of fuv1 == 0 out of 1892. – MGB.py Feb 17 '18 at 19:46
But why is it an array and not a scalar? – jezrael Feb 17 '18 at 19:48
I am confused by the terminology: array and scalar. Can you test it, since the dataset is accessible? – MGB.py Feb 17 '18 at 19:50
A scalar is one number; a 1D array in pandas is a Series (a column of a DataFrame); a 2D array is a DataFrame. – jezrael Feb 17 '18 at 19:51
If you can transform this `print(round((pf['fuv1'] == 0).sum()*100/1892, 2))` into a vector, I would be very satisfied. – MGB.py Feb 17 '18 at 19:55
I am a bit confused: what is a vector? What is the expected output? A new column? – jezrael Feb 17 '18 at 19:56
I call the following a vector: `x = print(round((pf['fuv1'] == 0).sum()*100/1892, 2))` – MGB.py Feb 17 '18 at 19:58
Here `x` is a vector for me ;-) – MGB.py Feb 17 '18 at 19:59
`x` is ignored: it prints the result 0.53 instead of not printing anything and waiting until I request the value of `x`. – MGB.py Feb 17 '18 at 20:05
I don't understand. `x = 0.53` assigns `0.53` to the variable `x`, but it is not a vector. What Python or pandas structure do you need? There is no data structure called vector. :( – jezrael Feb 17 '18 at 20:12
I am posting an answer below with what I was looking for. – MGB.py Feb 17 '18 at 20:22
You need `x = round((pf['fuv1'] == 0).sum()*100/1892, 2)`, right? – jezrael Feb 17 '18 at 20:23
Does `x = round((pf['fuv1'] == 0).sum()*100/1892, 2)` not work? – jezrael Feb 17 '18 at 20:28
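The comments above mix three separate tools: built-in `round` for a scalar, `Series.round` for a whole column, and `format` for turning numbers into strings. A sketch with hypothetical values (the 10 zeros come from the accepted answer; the Series contents are made up). Printing the rounded float with 17 decimal places shows the same 0.53000000000000003 reported in the comments: 0.53 has no exact binary representation, so the number is still correct, and only its display differs.

```python
import pandas as pd

# Scalar: built-in round returns a float
x = round(10 * 100 / 1892, 2)   # 10 zeros out of 1892, as in the accepted answer
print(x)

# The same float shown with 17 decimal places exposes binary floating-point error
print(format(x, ".17f"))

# String formatting returns text, not a number
print(format(x, ".2f"))

# Series: hypothetical column of percentages, rounded element-wise
s = pd.Series([0.528541226216, 14.1289, 3.14159])
print(s.round(2))

# Series of formatted strings (object dtype), one per element
print(s.apply(lambda v: format(v, ".2f")))
```

Use `round`/`Series.round` when you still need to calculate with the values, and `format` only at the point of display.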

MGB.py · Answer 3 · 2018-02-17T20:48:37.047

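import pandas as pd

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")

# Percentage of rows with fuv1 == 0 (out of 1892) and the mean of idade
x = (pf['fuv1'] == 0).sum()*100/1892
y=pf["idade"].mean()

l = "Performance"
k = "LTFU"


# Build a two-row table: labels in column 'a', values (x, y) in column 'b'
def test(l1,k1):
    return pd.DataFrame({'a':[l1, k1], 'b':[x, y]})

df1 = test(l,k)
# Blank out the column names and index labels so only the values are printed
df1.columns = [''] * len(df1.columns)
df1.index = [''] * len(df1.index)

# Built-in round on a DataFrame rounds the numeric column to 2 decimals
print(round(df1, 2))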

  Performance   0.53
         LTFU  14.13

edited Feb 17 '18 at 20:48

answered Feb 17 '18 at 20:25

MGB.py

@jezrael, I want to remove the 0 and 1 here to be fully satisfied. – MGB.py Feb 17 '18 at 20:27
You need `df1.index = [''] * len(df1.index)`. – jezrael Feb 17 '18 at 20:31
@jezrael, you are almost there: you removed '0' and '1', but you brought back the column names 'a' and 'b'. :-( – MGB.py Feb 17 '18 at 20:35
I think you need both `df1.columns = [''] * len(df1.columns)` and `df1.index = [''] * len(df1.index)`. – jezrael Feb 17 '18 at 20:36
@jezrael, SyntaxError: can't assign to operator – MGB.py Feb 17 '18 at 20:38
But it is a really awful hack. :( – jezrael Feb 17 '18 at 20:40
@jezrael, please post the code as an answer so I can copy and paste it. Thanks. – MGB.py Feb 17 '18 at 20:43
@jezrael, the problem was that I was including 'and' in the syntax. :-( – MGB.py Feb 17 '18 at 20:49
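A possible alternative to blanking out `df1.columns` and `df1.index`: leave the DataFrame intact and control only how it is printed. This sketch uses `DataFrame.to_string(index=False, header=False)`, or plain Python format specifications; the `x` and `y` values are hypothetical stand-ins for the ones computed from demo.csv, and the column widths (12 and 6) are arbitrary choices.

```python
import pandas as pd

# Hypothetical stand-ins for x and y from the answer above
x = 10 * 100 / 1892   # percentage of fuv1 == 0
y = 14.13             # stand-in for pf["idade"].mean()

df1 = pd.DataFrame({"a": ["Performance", "LTFU"], "b": [x, y]})

# Option 1: let pandas render the table without the index and the header row
text = df1.round(2).to_string(index=False, header=False)
print(text)

# Option 2: build each line yourself with format specifications
lines = [f"{label:>12} {value:6.2f}" for label, value in zip(df1["a"], df1["b"])]
print("\n".join(lines))
```

Either way `df1` keeps its real column names and index, so it can still be indexed as `df1["b"]` later.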