Cleaning up a SharePoint list for upload to MSSQL with proper table relationships.
Basically, I have two dataframes (data and config) that share some common columns (country, business). I want to insert a new column in datadf that, for each row, contains the index of the matching row in configdf, based on the values in the country and business columns.
dataframe data:
... | Country | Business | ...
----|---------|----------|----
    | A       | 1        |
    | A       | 1        |
    | A       | 2        |
    | A       | 2        |
    | B       | 1        |
    | B       | 1        |
    | B       | 2        |
    | C       | 1        |
    | C       | 2        |
dataframe config (ID = index):
ID | Country | Business | ...
---|---------|----------|----
1  | A       | 1        |
2  | A       | 2        |
3  | B       | 1        |
4  | B       | 2        |
5  | C       | 1        |
6  | C       | 2        |
What I want to add to dataframe data:
... | Country | Business | config_ID | ...
----|---------|----------|-----------|----
    | A       | 1        | 1         |
    | A       | 1        | 1         |
    | A       | 2        | 2         |
    | A       | 2        | 2         |
    | B       | 1        | 3         |
    | B       | 1        | 3         |
    | B       | 2        | 4         |
    | C       | 1        | 5         |
    | C       | 2        | 6         |
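One vectorized way to produce that column, offered as a sketch rather than the approach used below, is a left merge against the config frame after turning its index into a config_ID column. The sample frames are rebuilt from the tables above and assume the columns are literally named Country and Business; the real SharePoint column names may differ.

```python
import pandas as pd

# Sample frames rebuilt from the tables above (column names are an assumption).
datadf = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Business": [1, 1, 2, 2, 1, 1, 2, 1, 2],
})
configdf = pd.DataFrame(
    {"Country": ["A", "A", "B", "B", "C", "C"],
     "Business": [1, 2, 1, 2, 1, 2]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)

# Turn the config index (the ID) into an ordinary column named config_ID.
config_keys = configdf[["Country", "Business"]].rename_axis("config_ID").reset_index()

# A left merge keeps every data row in its original order; validate="many_to_one"
# raises MergeError if a (Country, Business) pair appears more than once in config.
result = datadf.merge(
    config_keys, on=["Country", "Business"], how="left", validate="many_to_one"
)
print(result)
```

Note that merge returns a new frame with a fresh RangeIndex, so if the original datadf index matters, keep it as a column with reset_index() before merging. Rows with no matching config pair get NaN in config_ID.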
Update: I found something that works:
datadf['config_ID'] = datadf.apply(lambda x: configdf[(configdf.country == x.country) & (configdf.business_unit == x.business_unit)].index[0], axis=1)
It gets the job done, although I am open to other suggestions, especially ones that work with df.insert().
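A minimal sketch of a df.insert() variant, assuming the Country/Business column names from the tables (the one-liner above uses country and business_unit, so substitute the real names): build a dictionary from each config pair to its ID once, look every data row up in it, and insert the resulting list at a chosen position. Placing config_ID right after Business is an arbitrary choice for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the tables above (column names are an assumption).
datadf = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Business": [1, 1, 2, 2, 1, 1, 2, 1, 2],
})
configdf = pd.DataFrame(
    {"Country": ["A", "A", "B", "B", "C", "C"],
     "Business": [1, 2, 1, 2, 1, 2]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="ID"),
)

# Map each (Country, Business) pair in config to its index value (the ID).
# If a pair occurs more than once, the last ID wins.
id_by_key = {
    (country, business): config_id
    for config_id, country, business in zip(
        configdf.index, configdf["Country"], configdf["Business"]
    )
}

# Look up every data row; .get returns None for a missing pair
# instead of raising IndexError like .index[0] in the apply version.
config_ids = [id_by_key.get(key) for key in zip(datadf["Country"], datadf["Business"])]

# insert() modifies datadf in place and raises ValueError if config_ID already exists.
datadf.insert(datadf.columns.get_loc("Business") + 1, "config_ID", config_ids)
print(datadf)
```

This filters configdf only once instead of once per data row, as the apply version does. Because the dictionary keeps the last ID for a repeated pair while .index[0] picks the first, checking configdf.duplicated(["Country", "Business"]).any() beforehand catches that case.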