I want to create a column in a pandas DataFrame that adds up the values of the other columns (which are 0s or 1s). The column is called "sum".
The head of my DataFrame looks like:
Application AnsSr sum Col1 Col2 Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11 0.0 0.0 28/12/11 .... ...Dates... 28/12/11
~00c 0 0.0 0.0 0 0 0 .... 0 0 0
~00pr 0 0.0 0.0 0 0 0 .... 0 0 0
~00te 0 0.0 0.0 0 0 1 .... 0 0 1
In a screenshot from PythonAnywhere:
Expected result (assuming there are no more columns):
Application AnsSr sum Col1 Col2 Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11 0.0 nan 28/12/11 .... ...Dates... 28/12/11
~00c 0 0.0 0.0 0 0 0 .... 0 0 0
~00pr 0 0.0 0.0 0 0 0 .... 0 0 0
~00te 0 0.0 2 0 0 1 .... 0 0 1
As you can see, the values of 'sum' stay 0 even when there are 1s in some columns. What am I doing wrong?
The basics of the code are:
import pandas as pd
from datetime import datetime
theMatrix = pd.DataFrame([datetime.today().strftime('%Y-%m-%d')], ['Date'], ['Application'])
theMatrix['Ans'] = 0
theMatrix['sum'] = 0
So far so good. Then I add all the values with loc, and then I want to add up the values with:
theMatrix.fillna(0, inplace=True)
# this being the key line:
theMatrix['sum'] = theMatrix.sum(axis=1)
theMatrix.sort_index(axis=0, ascending=True, inplace=True)
As you can see in the result (attached image), the sum remains 0. I had a look here and here and at the pandas documentation, to no avail. Actually, the expression:
theMatrix['sum'] = theMatrix.sum(axis=1)
I got it from there.
Changing this last line to:
theMatrix['sum'] = theMatrix[3:0].sum(axis=1)
in order to avoid summing the first three columns, gives as a result:
Application AnsSr sum Col1 Col2 Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11 0.0 nan 28/12/11 .... ...Dates... 28/12/11
~00c 0 0.0 nan 1 1 0 .... 0 0 0
~00pr 0 0.0 1.0 0 0 0 .... 0 0 1
~00te 0 0.0 0 0 0 0 .... 0 0 0
Please observe two things: a) in row '~00c', sum is NaN even though there are 1s in that row; b) before calculating the sum, theMatrix.fillna(0, inplace=True) should have changed every NaN into 0, so the sum should never be NaN, since in theory there are no NaN values in any of the columns[3:].
It doesn't work.
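For comparison with the slice above, here is a minimal sketch on a hypothetical all-numeric frame (the name `df`, its labels and its values are invented for illustration and are not the real data). It only shows that a bare `[3:0]` slices rows by position and selects nothing here, while `.iloc[:, n:]` slices columns by position:

```python
import pandas as pd

# Hypothetical stand-in for theMatrix: numeric columns only, invented values.
df = pd.DataFrame(
    {"Ans": [0, 0, 0], "sum": [0, 0, 0], "Col1": [0, 0, 1], "Col2": [1, 0, 1]},
    index=["~00c", "~00pr", "~00te"],
)

# A bare integer slice on a DataFrame selects ROWS by position; 3:0 is empty.
row_slice = df[3:0]
print(row_slice.shape)

# .iloc[:, 2:] selects COLUMNS by position (here it skips 'Ans' and 'sum').
col_sum = df.iloc[:, 2:].sum(axis=1)
print(col_sum.tolist())
```

Whether this explains the NaN values in the real frame is an assumption; the sketch only contrasts the two kinds of slicing.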
Any ideas?
Thanks
PS: Later edit, just in case you wonder how the DataFrame is populated: I read and parse an XML file, and the relevant lines are:
# myDocId being the name of the columns
# concept being the index.
theMatrix.loc[concept,myDocId]=1
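Since the 'date' row holds a date string in every document column, one thing worth checking is the dtype of those columns. The following is a minimal sketch under that assumption, using a hypothetical two-column frame (`df`, `numeric` and `total` are illustrative names, and the values are invented). It inspects the dtypes, then sums only the rows that hold 0/1 values after converting them with `pd.to_numeric`:

```python
import pandas as pd

# Hypothetical frame: each column mixes a date string with 0/1 flags.
df = pd.DataFrame(
    {"Col1": ["28/12/11", 0, 1], "Col2": ["28/12/11", 1, 1]},
    index=["date", "~00c", "~00te"],
)

# Columns that mix strings and integers are stored with object dtype.
print(df.dtypes)

# Drop the non-numeric 'date' row, convert what is left, then sum per row.
numeric = df.drop(index="date").apply(pd.to_numeric)
total = numeric.sum(axis=1)
print(total.tolist())
```

This does not reproduce the real matrix; it only illustrates how a string row changes the column dtypes and one possible way to sum the remaining rows.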