As you are new, here are some hints for asking such a question:
As pointed out in the comments, you should add a minimal example such as below, that everybode can import, look at and play around with.
Also you should provide the ways you already tried to do so.
First of, wildcard merges to my knowledge are not possible, therefore data cleaning is required. I did so with the sales['parcel_id_new']
column that is just the ìd
part of the string.
Then you use a left join merge
, as it takes all entries from the first left
table and matches all from right
leaving NAN
in all others
parcels = pd.DataFrame([
{'parcel': '01 01234566789AB', 'other column': 'Package A'},
{'parcel': '02 01234566789AB', 'other column': 'Package B'},
])
print(parcels)
sales = pd.DataFrame([
{'parcel_id': '01 01234566789AB OTHER UNIMORTANT TEXT', 'other sales column': 'Sales A'},
])
sales['parcel_id_new'] = sales['parcel_id'].apply(lambda x: x[:16])
print(sales)
full_table = pd.merge(how='left',
left= parcels,
right=sales,
left_on='parcel', right_on='parcel_id_new')
print(full_table)
Output:
PARCELS:
parcel other column
0 01 01234566789AB Package A
1 02 01234566789AB Package B
SALES:
parcel_id other sales column parcel_id_new
0 01 01234566789AB OTHER UNIMORTANT TEXT Sales A 01 01234566789AB
RESULT:
parcel other column parcel_id other sales column parcel_id_new
0 01 01234566789AB Package A 01 01234566789AB OTHER UNIMORTANT TEXT Sales A 01 01234566789AB
1 02 01234566789AB Package B NaN NaN NaN