I'm having trouble appending rows to DataFrames in pandas.

The data is read from an Excel sheet and put into a DataFrame. Here is a sample:

import pandas as pd
df1 = pd.DataFrame({'date':     ['22-jun-18', '22-jun-18', '22-jun-18'],
                    'id':       ['1', '2', '3'],
                    'name':     ['Mark', 'Kate', 'Rollo'],
                    'errors':   ['10', '20', '30'],
                    'status':   ['failed', 'failed', 'failed'],
                    'comment':  ['Reason: invalid id', 'Reason: invalid id', 'Reason: invalid id'],
                    'system':   ['X', 'X', 'X'],
                    'version':  ['1.1', '1.1', '1.1'],
                    'producer': ['Sys', 'Sys', 'Sys']})

The code:

find_row = searchById(row['ID'], df1)

returns the row from df1 that matches the ID, and it works fine. Printing it shows one row with data in every column.

And:

df2 = df2.append(find_row, ignore_index=True)

adds the row, but puts NaN in the last column.

The find_row object looks like this when I print it:

date                                                        22-jun-18
id                                                                  2
name                                                             Kate
errors                                                             20
status                                                         failed
comment                                            Reason: invalid id
system                                                              X
version                                                           1.1
producer                                                          Sys
Name: 2, dtype: object

A total of 9 values, no problems. After appending to the new DataFrame it displays like this:

   date       id     name  errors  status  comment             system    version  producer
0  22-jun-18  86758  Kate  20      failed  Reason: Invalid id  System X  1.1      NaN

Everything works fine except column 9 (producer), which is now NaN.

Here is the searchById function. As said earlier, it returns an object with all the data I need:

def searchById(id, df):
    for index, row in df.iterrows():
        if(row['key'] == id):
            return row
    return None
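
For comparison, here is a minimal sketch of the same lookup done with a boolean mask instead of iterrows. It assumes the ID column is named 'id', as in the sample frame (the function above compares against 'key'), and the names search_by_id and wanted_ids are made up for illustration. The matched rows are collected in a list and turned into a DataFrame in one step, because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

# Reduced version of the sample frame (fewer columns, same idea)
df1 = pd.DataFrame({
    'date': ['22-jun-18'] * 3,
    'id': ['1', '2', '3'],
    'name': ['Mark', 'Kate', 'Rollo'],
    'producer': ['Sys'] * 3,
})


def search_by_id(wanted_id, df, key='id'):
    """Return the first row whose `key` column equals wanted_id, else None."""
    matches = df[df[key] == wanted_id]
    return None if matches.empty else matches.iloc[0]


# Hypothetical list of IDs to pick out
wanted_ids = ['2', '3', '99']

# Collect the matching rows, skipping IDs that were not found
rows = [search_by_id(i, df1) for i in wanted_ids]
rows = [r for r in rows if r is not None]

# Build the new frame once; column order follows the rows' own labels
df2 = pd.DataFrame(rows).reset_index(drop=True)
print(df2)
```

Building the frame from the rows themselves means there is no separately typed header that could drift out of sync with the row labels.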

Is the problem the append function, or am I handling the rows and DataFrames in the wrong way?

mhaug
  • Welcome to SO. Please provide a [mcve]. We can't guess the code you are using for `searchById`, or imagine how your input or desired output should look. See also [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jpp Jun 27 '18 at 09:57
  • The input lies in an existing DataFrame that checks out OK. It consists of nine columns. Getting a row from it works as well. The desired output is a new DataFrame with only the rows I want. Everything works, except that the data in the final (9th) column is now 'NaN'. – mhaug Jun 27 '18 at 10:12
  • Please read the link on How to make a good pandas example. You should construct a small dataframe, e.g. 3 rows by 9 columns to demonstrate precisely your problem. – jpp Jun 27 '18 at 10:17
  • My first issue. Hope I gave a decent example. Added a sample DF and my output. – mhaug Jun 27 '18 at 12:14

1 Answer

I found out where the value disappears. Because the columns came out in the wrong order, I was using the lines below to rearrange them. I'm not sure exactly why the value was lost; there may have been an illegal character (Ø) in the last column.

header_list = ['date', 'id', 'name', 'errors', 'status', 'comment', 'system', 'version', 'producer']
df = pd.DataFrame(columns=header_list)

I'm now using the line

df = df[['date', 'id', 'name', 'errors', 'status', 'comment', 'system', 'version', 'producer']]

to rearrange the columns, with no illegal characters in the names, and it works fine.
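
One plausible mechanism, sketched here as an assumption rather than a confirmed diagnosis: pandas aligns rows to columns by label, so if the last name in the header differs from the row's label by even one character, the row's value lands in a different column and the header's column stays NaN. The label 'producerØ' below is hypothetical, chosen only to mimic a stray character.

```python
import pandas as pd

# Row with the correct label, as printed in the question (shortened)
row = pd.Series({'version': '1.1', 'producer': 'Sys'})

# Hypothetical header whose last label carries a stray character
bad_header = ['version', 'producerØ']
target = pd.DataFrame(columns=bad_header)

# Adding the row aligns on labels: 'producer' does not match 'producerØ',
# so a third column is created and 'producerØ' is left empty
combined = pd.concat([target, row.to_frame().T], ignore_index=True)

# Reordering with the same header keeps only the empty column
reordered = combined.reindex(columns=bad_header)
print(reordered)
```

Comparing `list(df.columns)` against the row's `index` is a quick way to spot a label mismatch like this.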

mhaug