I have two pandas DataFrames, df1
and df2
.
As shown in the first DataFrame df1
, the entries in the column "Box", which show the assignment of different boxes to a hand of people, can contain RegEx-values:
df1:
Person Box
0 Alex Box 1
1 Linda Box 3
2 David Box .*
3 Rachel Box [1-2]
df2:
Box Item Qty.
0 Box 1 Apple 4
1 Box 1 Blueberry 12
2 Box 2 Lemon 1
3 Box 2 Papaya 2
4 Box 3 Apple 2
How do I join the DataFrames on the common column "Box" in a way that these regEx-values are evaluated correctly (as regex are not supported in pandas' join / merge-functions), so that I receive the following DataFrame as a result:
Person Box Item Qty.
0 Alex Box 1 Apple 4
1 Alex Box 1 Blueberry 12
2 Linda Box 3 Apple 2
3 David Box 1 Apple 4
4 David Box 1 Blueberry 12
5 David Box 2 Lemon 1
6 David Box 2 Papaya 2
7 David Box 3 Apple 2
8 Rachel Box 1 Apple 4
9 Rachel Box 1 Blueberry 12
10 Rachel Box 2 Lemon 1
11 Rachel Box 2 Papaya 2
I already tried to achieve this via list comprehensions, which leads to correct results, but drops columns of the left DataFrame.
def joinWithRegEx(left: pd.DataFrame, right: pd.DataFrame, left_on: str, right_on: str):
df = pd.DataFrame
df = pd.concat([right[right[right_on].str.match(entry)] for entry in left[left_on]], ignore_index=True)
'''
Left-Join of two DataFrame with considered Rege
'''
return df
I would have actually thought that this would be a more common usecase, or is Pandas simply not the best choice for these kind if tasks?