Merging two dataframes in Pandas/Python keeping all columns from dataframe 1

Question

Dataframe 1 is a empty dataframe intended to set the "headers" of the columns:

Header Green	Header Yellow	Header Red

Dataframe 2 holds the actual rows, but some of the columns might be different from dataframe 1.

Header Green	Header Yellow	Header Purple
Data Green	Data Yellow	Data Purple

In the resulting dataframe, I want to stack the two dataframes on top of each other, but only keeping the columnes that are in dataframe1 (that is Green,Yellow,Red). The final dataframe should then be:

Header Green	Header Yellow	Header Red
Data Green	Data Yellow	Blank

Just to be clear, the final table:

1 Keeps all the columns from dataframe 1

2 Add the data from dataframe 2 where the columns are the same as dataframe 1

3 Drops any column from dataframe 2 that is not in datafram 1 in the final dataframe.

Unfortunately there is a lot of columns in the dataframes, so the solution needs to understand which columns to keep without being given anything else then the existing name in the dataframes.

Thanks!

You don't need any data from dataframe 1? If that's the case, just make a list of columns from dataframe 2 that are in dataframe 1: `cols_to_keep = [col for col in df2.columns if col in df1.columns]` and then use that column list to select from dataframe 2: `df_new = df1[cols_to_keep]` — chris, Jan 05 '21 at 23:14
Thanks, manage to do that, but that means that df_new does not hold all the columnes from df1 with blank values where the colume is not in df2. So in the above, with this solution i would not get Header Red in the resulting dataframe? — Johannes_Sathre, Jan 05 '21 at 23:23
ah, got it. missed the fact that you needed the columns with empty values. — chris, Jan 05 '21 at 23:35

score 2 · Answer 1 · answered Jan 05 '21 at 23:18

2

Unless the dataset is massive, I feel like this would solve your problem:

Just concatenate and slice after

pd.concat([df,df2])[df.columns]

answered Jan 05 '21 at 23:18

Chris

15,819
3
24
37

score 1 · Accepted Answer · answered Jan 05 '21 at 23:37

1

I think the answer here using df.reindex actually does exactly what you want: https://stackoverflow.com/a/43995812/9357244

df_new = df2.reindex(columns = df.columns)

Will keep columns in df2 if they are columns in df. Any missing columns will be instantiated as null values.

answered Jan 05 '21 at 23:37

chris

1,267
7
20

Thank you so much, this absolutly did the trick. Just as an example, the following code do what i intended: import pandas as pd df1=pd.DataFrame(columns=["a","b","c"]) data={"a":[1,2],"b":[3,4],"d":[5,7]} df2=pd.DataFrame(data) df_new = df2.reindex(columns = df1.columns) print(df_new) – Johannes_Sathre Jan 05 '21 at 23:54

Merging two dataframes in Pandas/Python keeping all columns from dataframe 1

2 Answers2