
I'm trying to add column headers with empty values to my dataframe (just like this answer), but within a function that is already modifying it, like so:

import numpy as np
import pandas as pd

mydf = pd.DataFrame()

def myfunc(df):
  df['newcol1'] = np.nan  # this works

  list_of_newcols = ['newcol2', 'newcol3']
  df = df.reindex(columns=df.columns.tolist() + list_of_newcols)  # this does not
  return
myfunc(mydf)

If I run the lines individually in an IPython console, the columns are added. But when run as a script, newcol1 is added while newcol2 and newcol3 are not. Setting copy=False does not help either. What am I doing wrong here?

Excel Help

3 Answers


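DataFrame.reindex() produces a new object unless the new index is equivalent to the current one and copy=False. Assigning that result to df inside the function only rebinds the local name, so the caller's DataFrame is left unchanged. You need to return the new object from your function and assign it at the call site.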

def myfunc(df):
  df['newcol1'] = np.nan  # modifies the caller's DataFrame in place

  list_of_newcols = ['newcol2', 'newcol3']
  # reindex returns a new DataFrame, so it has to be returned to the caller
  df = df.reindex(columns=df.columns.tolist() + list_of_newcols)
  return df

mydf = myfunc(mydf)
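
To make the difference between in-place assignment and rebinding concrete, here is a minimal sketch. The function names rebind_only and mutate_then_return and the sample column A are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd


def rebind_only(df):
    # Hypothetical example: reindex returns a new DataFrame, and this
    # assignment only rebinds the local name df. The caller sees nothing.
    df = df.reindex(columns=df.columns.tolist() + ["newcol2", "newcol3"])


def mutate_then_return(df):
    # Column assignment mutates the object the caller passed in.
    df["newcol1"] = np.nan
    # The reindexed frame is a new object, so hand it back explicitly.
    return df.reindex(columns=df.columns.tolist() + ["newcol2", "newcol3"])


original = pd.DataFrame({"A": [1, 2, 3]})

rebind_only(original)
cols_after_rebind = original.columns.tolist()

result = mutate_then_return(original)
cols_original = original.columns.tolist()
cols_result = result.columns.tolist()

print(cols_after_rebind)
print(cols_original)
print(cols_result)
```

Only the frame returned by mutate_then_return carries newcol2 and newcol3; the original picks up newcol1 alone, which matches the behaviour described in the question.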
Engineero

Not sure if this is a mistake in your actual code or a typo made while posting it here, but tolist is a method, so you must call it with parentheses.

df = df.reindex(columns=df.columns.tolist() + list_of_newcols)
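
As a small illustration of why the parentheses matter (the sample frame below is assumed, not taken from the question): without them, Python tries to add a bound method to a list and raises a TypeError.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
list_of_newcols = ["newcol2", "newcol3"]

try:
    # Missing parentheses: this is a method object, not a list
    df.columns.tolist + list_of_newcols
    raised = False
except TypeError:
    raised = True

# Calling the method yields a plain list that can be concatenated
combined = df.columns.tolist() + list_of_newcols

print(raised)
print(combined)
```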
Vinay Bharadhwaj

You don't need to assign NaN values first and then specify the new column labels again. You can reindex with an arbitrary list of labels; NaN is the default fill value for labels that have no data.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

df = df.reindex(columns=['A', 'B', 'C'])

print(df)

   A   B   C
0  1 NaN NaN
1  2 NaN NaN
2  3 NaN NaN
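
If NaN is not the placeholder you want, reindex also accepts a fill_value argument. The sketch below compares the default with fill_value=0; the choice of 0 is only an example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Default behaviour: labels with no data are filled with NaN
with_nan = df.reindex(columns=["A", "B", "C"])

# fill_value replaces the default NaN for the newly created columns
with_zero = df.reindex(columns=["A", "B", "C"], fill_value=0)

print(with_nan)
print(with_zero)
```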
jpp