This is done in Python 2, and I am using Pandas. While I would LOVE to do this in SQL, which probably makes my life much easier, that is not an option.
I have two dataframes, x and y. Dataframe x has a column, 'BuyerNames' and y has, 'Buyers', and both contain a column, 'Section', among other shared columns. Entries in 'x(BuyerNames)' and 'y(Buyers) are of the same set, and data in shared columns is usually consistent. I want to create a dataframe which is a subset of y, call it y*, such that, for rows with corresponding indices, the value of 'BuyerNames' is equal to that of 'Buyers'. This dataframe is the same length, and entries not satisfying this condition contain an NaN entry.
I am creating a third dataframe, z, and it has the union of the columns of x and y. My last goal is to populate z with the entries of 'Section' contained in y', preserving index.
I am very new to databases, hence what is probably a very confusing description of what is likely very simple. How do I achieve this in an efficient manner? There is currently a solution that uses a for loop in Python, which is very slow and not scaleable. I cannot post code, sadly.