Is it possible to merge data frames with one column from the first dataframe mapped across numerous columns in a second dataframe?

Question

I have two data frames:

Dataframe 1 (project data)

project_data = pd.DataFrame({'record_id' : ('1','2','3'), 'account_number' : ('123','124','125'), 'city' :('Vancouver','Miami','Dublin'),'salary' :(70000,80000,65000)})

Dataframe 2 (project data dictionary)

data_dict = pd.DataFrame({'variable_name' : ('record_id','account_number','city','salary'), 'form_name' : ('crosswalk_form','crosswalk_form','demographics','demographics')})

I want to be able to bring in the form_name variable from Dataframe 2 (data dictionary) but don't know how to go about this as there is no common column to join between the data frames.

I was hoping to ask is there a way to join from the column headers in Dataframe 1 (project data) to the column variable_name in Dataframe 2 (data dictionary). or would I need to reshape one of the data frames (possibly Dataframe 1)?

Possible output

merged_data = pd.DataFrame({'record_id' : ('1','2','3'), 'account_number' : ('123','124','125'), 'city' :('Vancouver','Miami','Dublin'),'salary' :(70000,80000,65000),'form_name' : ('crosswalk_form','crosswalk_form','demographics')})

I want this form_name column to categorize the variables for some descriptive statistics I want to do further along. Thanks.

Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — noah, Nov 24 '20 at 17:42
The `form_name` column in `merged_data` is logically WRONG because `form_name` should be associated with specific columns, not rows. This question thus appears to be unsalvageable through editing. The OP must rethink the use case and define a logically feasible output spec. — Bill Huang, Nov 24 '20 at 17:47
@BillHuang yes this is correct, I don't have a common key across data frames. I was just wondering was there a way to reshape the data (Dataframe 1) so I could maybe do this merge on the reshaped dataframe and rename the column of the reshaped data frame to be the same as the other data frame (Dataframe 2) so I could do it this way. Thanks. — Eoin Vaughan, Nov 24 '20 at 17:55
Where did the last row in DF 2 ( `salary demographics`) go in your possible output? Vanished for no reason? — Bill Huang, Nov 24 '20 at 18:05
@BillHuang DF 2 is not data but in fact metadata describing the project data were collecting. I just wanted to know was there a way we could leverage this data to help categorize fields based on `form_name`. Thnaks for the solution below. I must test whether I can use the new `MultiIndex`. I'm new to this type of work, Multindex is useful for what in particular, examples from your work? grouping columns/rows together? — Eoin Vaughan, Nov 24 '20 at 18:18
That will depend on your use case in a broader scheme -- but getting the right picture of your business/work logic is unfortunately out of the scope of SO. You may post a separate question with the new scenario when you're done. — Bill Huang, Nov 24 '20 at 18:36

score 0 · Answer 1 · answered Nov 24 '20 at 17:54

A new MultiIndex can be constructed to replace the original columns directly. (Assume the columns are arranged in the same order of the second dataframe).

project_data.columns = pd.MultiIndex.from_tuples([
    (tup.form_name, tup.variable_name) for tup in data_dict.itertuples(index=False)
])

Output

print(project_data)
  crosswalk_form                demographics       
       record_id account_number         city salary
0              1            123    Vancouver  70000
1              2            124        Miami  80000
2              3            125       Dublin  65000

This literally accomplished what the OP asked for. However, whether such replacement can serve the actual use case well or not remains to be clarified.

Is it possible to merge data frames with one column from the first dataframe mapped across numerous columns in a second dataframe?

1 Answers1