I'm having some difficulties using pandas.
I have 2 dataframes (named bru and bru2), both coming from almost the same file. The only difference between the 2 files is that I have added an extra row and changed a cell value from "4" to "50000" for testing.
What I'd now like to do is find the changed cells and the new rows.
But first of all, I'm checking if both dataframes are the same so that I don't have to look for changes when both files have the exact same data.
When I try to compare them (bru == bru2), I get an error: Can only compare identically-labeled DataFrame objects.
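To illustrate, here is a minimal sketch with two small made-up frames standing in for my files. The names old and new, the columns, and the values are assumptions for illustration, not my real data. The comparison seems to raise as soon as the row labels differ, for example when one frame has an extra row:

```python
import pandas as pd

# Toy stand-ins for the two files; column names and values are made up.
old = pd.DataFrame({"address_id": ["1", "2"],
                    "house_number": ["4", "7"]})
new = pd.DataFrame({"address_id": ["1", "2", "3"],
                    "house_number": ["50000", "7", "9"]})

# Element-wise == needs the same index and columns on both sides.
try:
    old == new
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# DataFrame.equals does not raise on mismatched labels; it returns False.
same = old.equals(new)
print(same)
```

In this sketch, equals looks like a safer way to do the "are they identical at all?" check, since it tolerates frames of different shapes.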
I'm importing the files like this. I also drop some columns that I don't need, put the columns of both files in the exact same order, and rename some by preference:
import pandas as pd

bru = pd.read_csv("file1.csv", dtype={"street_id": "string", "address_id": "string"})
bru = bru.fillna('')
bru = bru.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru = bru.rename(columns={"postcode": "pkancode"})
bru = bru.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
bru2 = pd.read_csv("file2.csv", dtype={"street_id": "string", "address_id": "string"})
bru2 = bru2.fillna('')
bru2 = bru2.drop(columns=["EPSG:31370_x", "EPSG:31370_y", "EPSG:4326_lat", "EPSG:4326_lon", "postname_fr", "postname_nl", "streetname_de"])
bru2 = bru2.rename(columns={"postcode": "pkancode"})
bru2 = bru2.reindex(columns=["address_id", "box_number", "house_number", "municipality_id", "municipality_name_de", "municipality_name_fr", "municipality_name_nl", "pkancode", "street_id", "streetname_nl", "streetname_fr", "region_code", "status"])
What am I doing wrong?
I've tried solutions from other Stack Overflow questions, but for some reason they failed for me:
Error: Can only compare identically-labeled DataFrame objects
Pandas "Can only compare identically-labeled DataFrame objects" error