Need to calculate columns from CSV using pandas

Question

incidentcountlevel1 and examcount were two column names on CSV file. I want to calculate two columns based on these. I have written the script below but it's failing:

import pandas as pd
import numpy as np
import time, os, fnmatch, shutil
df = pd.read_csv(r"/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',skiprows=[1])
df1 = pd.read_csv(r"/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',skiprows=[1])
df3 = pd.read_csv("/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',converters={"incidentcountlevel1":int})
inc_count_lvl_1 = df3.loc[:, ['incidentcountlevel1']]
exam_count=df3.loc[:, ['examcount']]

for exam_count in exam_count: #need to iterate this col to calculate for each row

if exam_count < 1:
        print "IPTE Cannot be calculated"

else:
        if inc_count_lvl_1 > 5:
        ipte1= (inc_count_lvl_1/exam_count)*1000
    else:

        dof = 2*(inc_count_lvl_1+ 1)
        chi_square=chi2.ppf(0.5,dof)
        ipte1=(chi_square/(2*exam_count))×1000

Read [how to ask pandas questions](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Dan, Mar 26 '18 at 10:37
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). getting this error while using if exam_count < 1: — pradeep panda, Mar 26 '18 at 12:05

Sumit Jha · Answer 1 · 2018-03-26T11:05:01.810

2

You can apply lamda function on pandas column. Just created an example using numpy. You can change according to your case

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 50]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         1500

or you can create your own function:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         1500

I your case, the solution might look like this.

df['new_column'] = np.vectorize(fx)(df['examcount'], df['incidentcountlevel1'])

def fx(exam_count,inc_count_lvl_1):
    if exam_count < 1:
        return -1 ##whatever you want
    else:
            if inc_count_lvl_1 > 5:
            ipte1= (inc_count_lvl_1/exam_count)*1000
        else:

            dof = 2*(inc_count_lvl_1+ 1)
            chi_square=chi2.ppf(0.5,dof)
            ipte1=(chi_square/(2*exam_count))×1000 

        return ipte1

If you dont want to use lamda fucntions then you can use iterrows. iterrows is a generator which yield both index and row.

for index, row in df.iterrows():
    print row['examcount'], row['incidentcountlevel1']
    #do your stuff.

I hope it helps.

edited Mar 26 '18 at 11:05

answered Mar 26 '18 at 10:39

Sumit Jha

1,601
11
18

Need to know, how to read the column value from csv file and use those in for and if loop to compare it with a integer value and then calculate with some arithmetic operation as mentioned in my program. Sorry, i'm new to python any help is appreciated , if related to my program which is shared – pradeep panda Mar 26 '18 at 10:48
such as 'incidentcountlevel1' and 'examcount' needs to be read from csv file and if the value matches with a numeric value then proceed with calculated columns and also want to place a for loop on 'examcount' to iterate all the rows one by one – pradeep panda Mar 26 '18 at 10:51
incidentcountlevel1 examcount 2 0 21 0 33 1 4 2 11 3 6 4 – pradeep panda Mar 26 '18 at 10:53
Just added more information to the solution. I hope it helps. – Sumit Jha Mar 26 '18 at 10:55
df = pd.read_csv(r"/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer') inc_count_lvl_1 = df.loc[:, ['incidentcountlevel1']] exam_count=df.loc[:, ['examcount']] for index, row in df.iterrows(): if exam_count < 1: print "Exam < 1" else: if inc_count_lvl_1 > 5: ipte1= (inc_count_lvl_1/exam_count)*1000 else: dof = 2*(inc_count_lvl_1+ 1) chi_square=chi2.ppf(0.5,dof) ipte1 = (chi_square /( 2 * exam_count))*1000 – pradeep panda Mar 26 '18 at 11:14
if exam_count < 1: File "/home/corp_sourcing/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__ .format(self.__class__.__name__)) ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). getting error this error – pradeep panda Mar 26 '18 at 11:14
The code looks messy in the comments. Can you please add sample input data and expected output. It would be easier to understand. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Sumit Jha Mar 26 '18 at 11:17
incidentcountlevel1 examcount ipte1 ipte2 2 0 21 0 33 1 4 2 11 3 6 4 – pradeep panda Mar 26 '18 at 11:29
My browser is bit old, the comments are bit messy, can you help me with the code which i shared and error, basically i am reading a csv file for column values, there are numeric values in it and IPTE1 and IPTE2 column were blank and calculated values needs to be put there. – pradeep panda Mar 26 '18 at 11:32

Need to calculate columns from CSV using pandas

1 Answers1