
I have three dataframes, and for each one I've created a set of dummy columns. Each dataframe ends up with a slightly different set of dummy variables than the other two.

I am trying to combine something that looks like this -

set1 - (a, b, c, d, e, f)

set2 - (a, b, c, d, f, k)

set3 - (a, c, d, e, f, i, n)

desired set - (a, c, d, f)

Is there a way of doing this by comparing the column names as sets?

  • Your question isn't clear. Please provide a [mcve] including what you've tried so far, according to [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson May 07 '20 at 17:58

1 Answer

I think what you mean is merging the dataframes, and the answer is yes. Pandas provides good support for merging dataframes into one. A merge assumes that the dataframes share some common values:

import pandas as pd

df1 = pd.DataFrame({
    'letter':['a','b','h','d']})
df2 = pd.DataFrame({
    'letter':['f','q','b','a']})

# With no "on" argument, merge does an inner join on the shared column ('letter')
print(df1.merge(df2))

PRINTS:
  letter
0      a
1      b

I recommend looking at the documentation if you want to go further with merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
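If the goal is instead to keep only the columns that all three dataframes share, as in the question's desired set (a, c, d, f), comparing the column names as Python sets may be enough. The following is a minimal sketch, not a definitive solution: the frames df1, df2 and df3 and their single rows of 0/1 values are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical one-row frames; only the column names mirror set1, set2, set3
df1 = pd.DataFrame([[1, 0, 0, 1, 0, 1]], columns=list("abcdef"))
df2 = pd.DataFrame([[0, 1, 1, 0, 1, 0]], columns=list("abcdfk"))
df3 = pd.DataFrame([[1, 1, 0, 0, 1, 0, 1]], columns=list("acdefin"))

# Intersect the column names as sets
shared = set(df1.columns) & set(df2.columns) & set(df3.columns)

# Sets are unordered, so rebuild the list in df1's column order
shared_cols = [col for col in df1.columns if col in shared]
print(shared_cols)

# Keep only the shared columns (data preserved) and stack the rows
combined = pd.concat(
    [df[shared_cols] for df in (df1, df2, df3)],
    ignore_index=True,
)
print(combined)
```

Selecting with df[shared_cols] keeps the data in the shared columns and drops the rest. If stacking the rows is the end goal anyway, pd.concat([df1, df2, df3], join="inner", ignore_index=True) should also keep only the columns common to all frames.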

  • I guess I only want to merge the column names and drop the columns that are not shared by the group. I want to preserve the data in the columns. Is there a way to only merge the column names - df_place_holder = df1.merge(df2), df_place_holder = df_place_holder.merge(df3), df_place_holder.columns.to_list() ? – user1865231 May 07 '20 at 18:18