0

Dataframe 1 is a empty dataframe intended to set the "headers" of the columns:

Header Green Header Yellow Header Red

Dataframe 2 holds the actual rows, but some of the columns might be different from dataframe 1.

Header Green Header Yellow Header Purple
Data Green Data Yellow Data Purple

In the resulting dataframe, I want to stack the two dataframes on top of each other, but only keeping the columnes that are in dataframe1 (that is Green,Yellow,Red). The final dataframe should then be:

Header Green Header Yellow Header Red
Data Green Data Yellow Blank

Just to be clear, the final table:

1 Keeps all the columns from dataframe 1

2 Add the data from dataframe 2 where the columns are the same as dataframe 1

3 Drops any column from dataframe 2 that is not in datafram 1 in the final dataframe.

Unfortunately there is a lot of columns in the dataframes, so the solution needs to understand which columns to keep without being given anything else then the existing name in the dataframes.

Thanks!

  • You don't need any data from dataframe 1? If that's the case, just make a list of columns from dataframe 2 that are in dataframe 1: `cols_to_keep = [col for col in df2.columns if col in df1.columns]` and then use that column list to select from dataframe 2: `df_new = df1[cols_to_keep]` – chris Jan 05 '21 at 23:14
  • Thanks, manage to do that, but that means that df_new does not hold all the columnes from df1 with blank values where the colume is not in df2. So in the above, with this solution i would not get Header Red in the resulting dataframe? – Johannes_Sathre Jan 05 '21 at 23:23
  • ah, got it. missed the fact that you needed the columns with empty values. – chris Jan 05 '21 at 23:35

2 Answers2

2

Unless the dataset is massive, I feel like this would solve your problem:

Just concatenate and slice after

pd.concat([df,df2])[df.columns]
Chris
  • 15,819
  • 3
  • 24
  • 37
1

I think the answer here using df.reindex actually does exactly what you want: https://stackoverflow.com/a/43995812/9357244

df_new = df2.reindex(columns = df.columns)

Will keep columns in df2 if they are columns in df. Any missing columns will be instantiated as null values.

chris
  • 1,267
  • 7
  • 20
  • Thank you so much, this absolutly did the trick. Just as an example, the following code do what i intended: import pandas as pd df1=pd.DataFrame(columns=["a","b","c"]) data={"a":[1,2],"b":[3,4],"d":[5,7]} df2=pd.DataFrame(data) df_new = df2.reindex(columns = df1.columns) print(df_new) – Johannes_Sathre Jan 05 '21 at 23:54