
I have a list of target columns:

cols = ["col1", "col2", "col4"]

I also have several pandas DataFrames, each with a different set of columns. From each one I must select the columns listed in cols. If a column from cols does not exist in a DataFrame, it should be created and filled with NaN values.

df1 =
col1  col3
1     x1
2     x2
3     x3

df2 =
col1  col2  col4
1     f1    car3
3     f2    car2
4     f5    car1

For example, df2[cols] works well, but df1[cols] obviously fails. I need the following output for df1:

df1 =
col1  col2  col4
1     NaN   NaN
2     NaN   NaN
3     NaN   NaN
Sociopath
Tatik
    Possible duplicate of [How to add an empty column to a dataframe?](https://stackoverflow.com/questions/16327055/how-to-add-an-empty-column-to-a-dataframe) – Georgy Apr 05 '19 at 12:50

1 Answer

Use DataFrame.reindex with the list of columns. Any column in the list that is missing from the DataFrame is added and filled with NaN:

df1 = df1.reindex(cols, axis=1)
print(df1)
   col1  col2  col4
0     1   NaN   NaN
1     2   NaN   NaN
2     3   NaN   NaN

For df2, which already contains every column in cols, the same columns are returned unchanged:

df2 = df2.reindex(cols, axis=1)
print(df2)
   col1 col2  col4
0     1   f1  car3
1     3   f2  car2
2     4   f5  car1
jezrael
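
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. The DataFrames are rebuilt from the tables in the question, and the helper name select_columns is hypothetical, introduced only for illustration. It uses reindex(columns=...), which should be equivalent to the reindex(cols, axis=1) call above.

```python
import pandas as pd

cols = ["col1", "col2", "col4"]

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"col1": [1, 2, 3], "col3": ["x1", "x2", "x3"]})
df2 = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": ["f1", "f2", "f5"],
    "col4": ["car3", "car2", "car1"],
})


def select_columns(df, columns):
    """Hypothetical helper: keep only `columns`, in that order.

    Columns missing from `df` are created and filled with NaN;
    columns of `df` that are not listed are dropped.
    """
    return df.reindex(columns=columns)


out1 = select_columns(df1, cols)  # col2 and col4 are added as NaN, col3 is dropped
out2 = select_columns(df2, cols)  # every column already exists, so values are unchanged

print(out1)
print(out2)
```

Because reindex returns a new DataFrame, the originals are left untouched, so the same helper can be applied to each DataFrame in a loop.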