I have two pandas DataFrames, key_df and value_df:
import pandas as pd
key_dict = {"coordinates": ["AB1", "AC1", "AD1", "EF1", ... ], "start": [762, 1274, 1587, 1991, ...], "end": [2481, 1789, 1689, 2211, ...] }
key_df = pd.DataFrame(key_dict)
coordinates start end
0 AB1 762 2481
1 AC1 1274 1789
2 AD1 1587 1689
3 EF1 1991 2211
... ... ... ...
value_dict = {"coordinates": ["AD1", "AB1"], "meta_data": [101, 110]}
value_df = pd.DataFrame(value_dict)
coordinates meta_data
0 AD1 101
1 AB1 110
... ... ...
The coordinates column of key_df contains only unique values; there are no repeats. The same holds for value_df.
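Because coordinates is unique in both frames, one option (a sketch, not the only way) is to make it the index of key_df and look up every value at once with .loc. Only the four key_df rows displayed above are used, since the elided "..." rows are unknown; the names indexed and start_end are illustrative.

```python
import pandas as pd

# Only the rows displayed in the question; the elided "..." rows are left out.
key_df = pd.DataFrame({
    "coordinates": ["AB1", "AC1", "AD1", "EF1"],
    "start": [762, 1274, 1587, 1991],
    "end": [2481, 1789, 1689, 2211],
})
value_df = pd.DataFrame({"coordinates": ["AD1", "AB1"], "meta_data": [101, 110]})

# verify_integrity=True makes set_index raise ValueError on duplicate
# coordinates, so the uniqueness assumption is checked rather than trusted.
indexed = key_df.set_index("coordinates", verify_integrity=True)

# Label-based lookup of every coordinate in value_df, in value_df's order.
# A coordinate absent from key_df would raise KeyError here.
start_end = indexed.loc[value_df["coordinates"], ["start", "end"]]
print(start_end)
```

The result is indexed by coordinates, in the same order as value_df.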
I would like to iterate through value_df on coordinates: for each value of coordinates in value_df, find the row of key_df with the same coordinates value, and return that row's start and end.
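The same lookup can also be expressed without explicit iteration as a left join on coordinates using DataFrame.merge. This is a sketch using only the displayed rows; the names result and pairs are illustrative.

```python
import pandas as pd

key_df = pd.DataFrame({
    "coordinates": ["AB1", "AC1", "AD1", "EF1"],
    "start": [762, 1274, 1587, 1991],
    "end": [2481, 1789, 1689, 2211],
})
value_df = pd.DataFrame({"coordinates": ["AD1", "AB1"], "meta_data": [101, 110]})

# how="left" keeps every row of value_df in its original order and appends
# the start/end of the matching key_df row (NaN if a coordinate is missing).
# validate="one_to_one" raises MergeError if either side repeats a coordinate.
result = value_df.merge(key_df, on="coordinates", how="left", validate="one_to_one")
print(result)

# (start, end) tuples in value_df order, if tuples are what is needed.
pairs = list(zip(result["start"], result["end"]))
print(pairs)
```

If some coordinate had no match, its start and end would be NaN and those columns would become float64.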
To subset key_df and grab the start and end values, my idea is to write a function:
def parse(x, df):  # 'x' is one value of value_df["coordinates"]
    df = df[df.coordinates == x]
    return (df.start.iloc[0], df.end.iloc[0])  # return the start and end scalars as a tuple
I would then call it as parse(x, df=key_df).
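Since only the coordinates column of value_df is needed, one sketch is to iterate over that column alone, which avoids building mixed-dtype rows at all. It assumes parse pulls scalars out with .iloc[0], because df.start on a one-row selection is still a Series; the list pairs is illustrative.

```python
import pandas as pd

key_df = pd.DataFrame({
    "coordinates": ["AB1", "AC1", "AD1", "EF1"],
    "start": [762, 1274, 1587, 1991],
    "end": [2481, 1789, 1689, 2211],
})
value_df = pd.DataFrame({"coordinates": ["AD1", "AB1"], "meta_data": [101, 110]})

def parse(x, df):
    """Return (start, end) for the row of df whose coordinates equal x."""
    match = df[df["coordinates"] == x]
    # match["start"] is a one-row Series; .iloc[0] extracts its scalar.
    # An unknown x gives an empty selection, and .iloc[0] raises IndexError.
    return match["start"].iloc[0], match["end"].iloc[0]

# Iterating over a single Series yields its values one at a time,
# so no row is ever assembled from columns of different dtypes.
pairs = [parse(x, df=key_df) for x in value_df["coordinates"]]
print(pairs)
```

Each call scans all of key_df with a boolean mask, so for large frames a merge or an indexed lookup is usually faster.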
However, I'm not sure how to iterate over value_df. .iterrows() is quick to write, but because it returns each row as a Series it doesn't preserve dtypes across the row, which may be a problem.
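The dtype concern can be seen directly. The pandas documentation notes that iterrows() returns each row as a Series and suggests itertuples(), which yields namedtuples and is generally faster, when dtypes should be preserved. A small sketch using value_df as shown:

```python
import pandas as pd

value_df = pd.DataFrame({"coordinates": ["AD1", "AB1"], "meta_data": [101, 110]})

# iterrows(): each row is a Series, so a row mixing a string and an
# integer is stored with the common dtype "object".
row_dtypes = [row.dtype for _, row in value_df.iterrows()]
print(row_dtypes)

# itertuples(): each row is a namedtuple whose fields are named after the
# columns; no per-row Series (and no shared row dtype) is built.
rows = [(row.coordinates, row.meta_data) for row in value_df.itertuples(index=False)]
print(rows)
```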