I would like to know the pandas alternative to an Excel Vlookup when it involves two dataframes and a one to many relationship. I have searched for this answer and don't think i found one that solves my use-case. below are a few post that are relevant, but not exactly what I need.
Excel VLOOKUP equivalent in pandas
vlookup between 2 Pandas dataframes
Situation:
I have two data frames and a key that links both (site). I would use the pandas merge function, but I don't want multiple records returned for one key (location B in this example).
Specifically, I want to return the status of a site from the loc_status dataframe if it exists. If the location has both a status of "inactive" and "active" then I only want to return "active".
Here is a basic example:
loc_status = [{'site':'A', 'status':'ACTIVE'}, {'site':'B', 'status':'ACTIVE'},{'site':'B', 'status':'INACTIVE'},{'site':'C', 'status':'INACTIVE'} ]
loc = [{'site':'A'}, {'site':'B'},{'site':'C'}, {'site':'D'} ]
df_status = pd.DataFrame(loc_status)
+----+-------+----------+
| | site | status |
+----+-------+----------+
| 0 | A | ACTIVE |
| 1 | B | ACTIVE |
| 2 | B | INACTIVE |
| 3 | C | INACTIVE |
+----+-------+----------+
df_loc = pd.DataFrame(loc)
+------+---+
| site | |
+------+---+
| 0 | A |
| 1 | B |
| 2 | C |
| 3 | D |
+------+---+
result = [{'site':'A', 'status':'ACTIVE'}, {'site':'B', 'status':'ACTIVE'},{'site':'C', 'status':'INACTIVE'}, {'site': 'D'}]
df_result = pd.DataFrame(result)
+----+-------+----------+
| | site | status |
+----+-------+----------+
| 0 | A | ACTIVE |
| 1 | B | ACTIVE |
| 2 | C | INACTIVE |
| 3 | D | NaN |
+----+-------+----------+
Thank you.