Apologies in advance if my dataframe sample formatting is terrible; this is my first question and I'm also a novice with Python.
I have a dataset that lists items along with the tags associated with each item. Some items in the set are derived from a main item. These derived items currently have no tags in their rows, but they should have the same tags as their parent item. There is an ID I can use to link them, but the parent's ID and the derived item's reference to it sit in different columns and different rows (see example). The items are also grouped into buckets, and the dataset may contain multiple buckets, all with the same situation. I have achieved the end result in an Excel mockup using multiple worksheets and INDEX/MATCH with conditions, but I can't figure out how to do this match across different rows using pandas.
Sample data set (there are many more columns than this, but I think this should work as an example):
Bucket | Main Item ID | Item Name | Item Tag 1 | Item Tag 2 | Item Tag 3 | Derived Item ID |
---|---|---|---|---|---|---|
26 | 123 | Item A | 50 | 1000 | 250 | NaN |
26 | 765 | Item A (Derived) | NaN | NaN | NaN | 123 |
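In case it helps anyone reproduce this, the sample above can be built roughly like this. The column names are copied from the table; the dtypes are just whatever pandas infers, which may not match my real data.

```python
import numpy as np
import pandas as pd

# Minimal reproduction of the sample table (the real data has many more columns).
df = pd.DataFrame(
    {
        "Bucket": [26, 26],
        "Main Item ID": [123, 765],
        "Item Name": ["Item A", "Item A (Derived)"],
        "Item Tag 1": [50, np.nan],
        "Item Tag 2": [1000, np.nan],
        "Item Tag 3": [250, np.nan],
        # NaN for main items; the parent's Main Item ID for derived items.
        "Derived Item ID": [np.nan, 123],
    }
)
print(df)
```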
I want to copy the tags from item 123 (50, 1000, 250) onto every other item whose Derived Item ID matches that item's Main Item ID (there can be multiple derived items). Desired result:
Bucket | Main Item ID | Item Name | Item Tag 1 | Item Tag 2 | Item Tag 3 | Derived Item ID |
---|---|---|---|---|---|---|
26 | 123 | Item A | 50 | 1000 | 250 | NaN |
26 | 765 | Item A (Derived) | 50 | 1000 | 250 | 123 |
I originally tried to use a `for` loop to run through the rows and add any tags and Main Item IDs to their own dictionaries, then create a reference dataframe to match from, but I ran into issues with that. I've also tried to figure out whether I could use `isin()` and `np.where()` to do all of the filling in one go, but again no luck.
I've been searching for related topics/questions for a few days and couldn't find anything similar. People are always trying to match across dataframes, which ends up being a simple merge or something with `df.loc`, but I'd like to do this within the frame itself.
If this is best done by creating a separate dataframe I'm open to that, and any help or tips are greatly appreciated.
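To make the "separate dataframe" idea concrete, here is a rough sketch of what I imagine it could look like. It is only a guess at an approach, not something I know to be right. The names `tag_cols` and `lookup` are made up for illustration, and it assumes that Main Item ID values are unique across the whole frame (it ignores Bucket entirely).

```python
import numpy as np
import pandas as pd

# Same two-row sample as in the tables above.
df = pd.DataFrame(
    {
        "Bucket": [26, 26],
        "Main Item ID": [123, 765],
        "Item Name": ["Item A", "Item A (Derived)"],
        "Item Tag 1": [50, np.nan],
        "Item Tag 2": [1000, np.nan],
        "Item Tag 3": [250, np.nan],
        "Derived Item ID": [np.nan, 123],
    }
)

# Hypothetical helper names, not from my real code.
tag_cols = ["Item Tag 1", "Item Tag 2", "Item Tag 3"]

# Lookup of every item's tags, keyed by Main Item ID.
# Assumption: Main Item ID is unique, otherwise the mapping is ambiguous.
lookup = df.set_index("Main Item ID")[tag_cols]

for col in tag_cols:
    # Look up the parent's tag through Derived Item ID. Rows with no parent
    # map to NaN, so fillna leaves their existing values untouched.
    parent_tag = df["Derived Item ID"].map(lookup[col])
    df[col] = df[col].fillna(parent_tag)

print(df)
```

If the same ID can appear in more than one bucket, I assume the lookup would need to be keyed on Bucket as well, which this sketch does not handle.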