
I am trying to write a function that receives a list of data frames and checks whether those data frames have the same features (columns).

First, I need to read each data frame from the list, extract its column names, and store those names in another list. Then I compare these lists and return True if they are equal. I haven't reached the comparison step yet, because there is an issue with extracting the column names.

Here is the example I have tried:

```python
# Basic libraries
import os
import pandas as pd
import numpy as np

def merge_df(lis):
    df_list = []
    j = 0
    for i in lis:
        name = "df" + str(j)  # label such as "df0"; only printed, then overwritten
        print(name)
        name = pd.DataFrame(i)
        name = name.values.tolist()  # .values holds the cell data, not the column names
        df_list.append(name)
        j += 1
    print(df_list)

data_dict = {'First': [100, 90, np.nan, np.nan],
             'Second': [30, 45, 56, np.nan],
             'Third': [np.nan, 40, 80, np.nan],
             'Forth': [30, 40, 50, np.nan]}

df1 = pd.DataFrame(data_dict)

data_dict2 = {'First': [100, 90, 4, 3],
              'Second': [30, 45, 56, 0],
              'Third': [np.nan, 40, 80, 5],
              'Forth': [30, 40, 50, np.nan]}

df2 = pd.DataFrame(data_dict2)
lis = [df1, df2]
# the size of lis is >= 2
merge_df(lis)
```

Since the two data frames have the same features (First, Second, Third, Forth), I expect the function to return yes.

I am sure the problem is in `name = name.values.tolist()`, because the data frame is treated as a string. The same applies to `df_list.append(name)`.

So it's expected that I get the error `DataFrame constructor not properly called!`.

So, are there any issues with this function that I need to fix?
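A minimal sketch of the plan described above (collect each data frame's column names in a list, then compare the lists), assuming every element of `lis` is already a `pandas.DataFrame`, so there is no need to pass it back through `pd.DataFrame(...)`. The function name `same_columns` is hypothetical. It reads the labels from `df.columns` instead of `df.values`, which holds the cell data.

```python
import numpy as np
import pandas as pd


def same_columns(lis):
    """Hypothetical helper: True if every DataFrame in `lis` has the same
    column names in the same order."""
    # Collect each frame's column labels as a plain Python list.
    col_lists = [df.columns.tolist() for df in lis]
    # Compare every list with the first one (an empty `lis` gives True).
    return all(cols == col_lists[0] for cols in col_lists)


df1 = pd.DataFrame({'First': [100, 90, np.nan, np.nan],
                    'Second': [30, 45, 56, np.nan],
                    'Third': [np.nan, 40, 80, np.nan],
                    'Forth': [30, 40, 50, np.nan]})
df2 = pd.DataFrame({'First': [100, 90, 4, 3],
                    'Second': [30, 45, 56, 0],
                    'Third': [np.nan, 40, 80, 5],
                    'Forth': [30, 40, 50, np.nan]})

print(same_columns([df1, df2]))  # True
```

This comparison is order-sensitive: a frame with the same columns in a different order returns False. If order should not matter, comparing `set(df.columns)` instead of the lists is one option.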

  • It would be nice if you included a few DataFrames with your minimal reproducible example. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). And it really isn't clear what the intent is. – wwii Nov 19 '20 at 17:42
  • This seems like you might be doing more work than necessary. Two dataframes (`df` and `df1`) can be checked with `(df==df1).all().all()` or `(df.values==df1.values).all()` – G. Anderson Nov 19 '20 at 17:43
  • Can you show/tell us what `lis` is? Does it really contain `DataFrames`? Then why are you passing the elements to the `pd.DataFrame` constructor? – Jan Christoph Terasa Nov 19 '20 at 17:46
  • @JanChristophTerasa `lis` contains a list of data frames. I want to go through each df in the list, extract its features, and save them in a new list such as `df0`. Then I want to compare all of these lists and return yes if their contents are the same – Qaddomi Obaid Nov 19 '20 at 18:06
  • @G.Anderson Actually, I want to do the same, but for more than two data frames. – Qaddomi Obaid Nov 19 '20 at 18:27
  • Does the accepted answer in [this question](https://stackoverflow.com/questions/60452817/check-if-multiple-pd-dataframes-are-equal) help? – G. Anderson Nov 19 '20 at 18:38
  • @G.Anderson the attached solution checks whether the data frames in a list have the same content, but what I need is to check whether they have the same features. – Qaddomi Obaid Nov 19 '20 at 18:42
  • What do you mean "same features"? Do you mean the same _columns_? – G. Anderson Nov 19 '20 at 18:46
  • @G.Anderson Yes, I do. I mean the same columns. – Qaddomi Obaid Nov 19 '20 at 18:48
  • @wwii I have updated my question. I hope it is understandable now :) – Qaddomi Obaid Nov 19 '20 at 19:27
  • Does [Determining if several data frames have the same column names in pandas](https://stackoverflow.com/questions/59198707/determining-if-several-data-frames-have-the-same-column-names-in-pandas) answer your question? – wwii Nov 19 '20 at 19:49
  • Then the solution from the linked question works, just needs to be updated to check the columns instead of the data: `all(x.columns.equals(lis[0].columns) for x in lis)` will tell you if they all match – G. Anderson Nov 19 '20 at 19:50
  • @G.Anderson I think I can accept your solution because it's easy for me to understand. – Qaddomi Obaid Nov 19 '20 at 20:03
  • @wwii I like the 2nd solution from the linked question, and it works well for me, but it is a little hard for me to understand. I would be grateful if you could give me a brief explanation of how it works. :) – Qaddomi Obaid Nov 19 '20 at 20:07
  • The explanation in the answer seems satisfactory. `dataframe.columns` must be set-like because it has an `intersection` method. The expression verifies that the length of the intersection of the first dataframe's columns with each other dataframe's columns is the same as the number of columns (`.shape[1]`) in the *first* dataframe – wwii Nov 19 '20 at 20:11
  • ... get the lengths of the intersection of the first dataframe with each of the other dataframes and verify that all those lengths are the same value as the number of columns in the first dataframe. – wwii Nov 19 '20 at 20:15
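The two approaches suggested in the comments can be sketched side by side. `columns_equal` follows G. Anderson's `Index.equals` one-liner, which is order-sensitive. `columns_overlap_fully` is a variant of the intersection idea wwii describes, reconstructed from that description rather than copied from the linked answer, with an added column-count check so that a frame with extra columns is rejected. Both function names are hypothetical, and `columns_overlap_fully` assumes `lis` is non-empty.

```python
import pandas as pd


def columns_equal(lis):
    """Order-sensitive: every frame's column Index equals the first frame's."""
    return all(x.columns.equals(lis[0].columns) for x in lis)


def columns_overlap_fully(lis):
    """Order-insensitive variant of the intersection idea (assumes lis is non-empty).

    The first frame's columns must all appear in every other frame, and every
    frame must have the same number of columns, so no frame has extra ones.
    """
    first = lis[0]
    n_cols = first.shape[1]  # number of columns in the first frame
    return all(
        len(first.columns.intersection(x.columns)) == n_cols and x.shape[1] == n_cols
        for x in lis[1:]
    )


a = pd.DataFrame({'First': [1], 'Second': [2], 'Third': [3], 'Forth': [4]})
b = a * 10                                   # same columns, same order
c = b[['Forth', 'Third', 'Second', 'First']]  # same columns, different order
d = b.assign(Fifth=5)                        # one extra column

print(columns_equal([a, b]))             # True
print(columns_equal([a, b, c]))          # False
print(columns_overlap_fully([a, b, c]))  # True
print(columns_overlap_fully([a, d]))     # False
```

Which one fits depends on whether column order matters for the later steps; without the `x.shape[1] == n_cols` check, the intersection test alone would accept `d`.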
