1

I have 1 Dataframe with some rows of Sold-To Country Name column in the value of Not: XX XX XX,which means the rest of Sold-To Country Codes except XX XX XX will be reporting to the mapped Reporting Country.

Another requirement is that if Sold-To Country Code is null (or NaN), it will capture all revenue from all country codes in that SalesOrg.

df_mapping = pd.DataFrame({'SalesOrg Code':['0001','0002','0002','0002','0002'],
                           'Reporting Country':['Spain','UK','UK','UK','Netherlands'],
                           'Sold-To Country Code':[np.nan,'IE','FR','IT','Ex:'],
                           'Sold-To Country Name':[np.nan,'Ireland','France','Italy','NOT: FR IE IT']})
SalesOrg Code   Reporting Country   Sold-To Country Code    Sold-To Country Name
0001            Spain                null                   null
0002            UK                   IE                     Ireland
0002            UK                   FR                     France
0002            UK                   IT                     Italy
0002            Netherlands          Ex:                    NOT: FR IE IT
.......

There will be another Dataframe with a full list of global country codes, where we can search for the rest of the country codes.

Example of the Dataframe:

df_countrylist = pd.DataFrame(["AF", "AX", "AL", "DZ", "AS", "AD", "AO", "AI", "AQ", "AG", "AR",
"AM", "AW", "AU", "AT", "AZ", "BS", "BH", "BD", "BB", "BY", "BE",
"BZ", "BJ", "BM", "BT", "BO", "BQ", "BA", "BW", "BV", "BR", "IO",
"BN", "BG", "BF", "BI", "CV", "KH", "CM", "CA", "KY", "CF", "TD",
"CL", "CN", "CX", "CC", "CO", "KM", "CG", "CD", "CK", "CR", "CI",
"HR", "CU", "CW", "CY", "CZ", "DK", "DJ", "DM", "DO", "EC", "EG",
"SV", "GQ", "ER", "EE", "ET", "FK", "FO", "FJ", "FI", "FR", "GF",
"PF", "TF", "GA", "GM", "GE", "DE", "GH", "GI", "GR", "GL", "GD",
"GP", "GU", "GT", "GG", "GN", "GW", "GY", "HT", "HM", "VA", "HN",
"HK", "HU", "IS", "IN", "ID", "IR", "IQ", "IE", "IM", "IL", "IT",
"JM", "JP", "JE", "JO", "KZ", "KE", "KI", "KP", "KR", "KW", "KG",
"LA", "LV", "LB", "LS", "LR", "LY", "LI", "LT", "LU", "MO", "MK",
"MG", "MW", "MY", "MV", "ML", "MT", "MH", "MQ", "MR", "MU", "YT",
"MX", "FM", "MD", "MC", "MN", "ME", "MS", "MA", "MZ", "MM", "NA",
"NR", "NP", "NL", "NC", "NZ", "NI", "NE", "NG", "NU", "NF", "MP",
"NO", "OM", "PK", "PW", "PS", "PA", "PG", "PY", "PE", "PH", "PN",
"PL", "PT", "PR", "QA", "RE", "RO", "RU", "RW", "BL", "SH", "KN",
"LC", "MF", "PM", "VC", "WS", "SM", "ST", "SA", "SN", "RS", "SC",
"SL", "SG", "SX", "SK", "SI", "SB", "SO", "ZA", "GS", "SS", "ES",
"LK", "SD", "SR", "SJ", "SZ", "SE", "CH", "SY", "TW", "TJ", "TZ",
"TH", "TL", "TG", "TK", "TO", "TT", "TN", "TR", "TM", "TC", "TV",
"UG", "UA", "AE", "GB", "US", "UM", "UY", "UZ", "VU", "VE", "VN",
"VG", "VI", "WF", "EH", "YE", "ZM", "ZW"])

Ultimately, I want to have like this:

SalesOrg Code   Reporting Country   Sold-To Country Code    Sold-To Country Name
0001            Spain                null (all)             null
0002            UK                   IE                     Ireland
0002            UK                   FR                     France
0002            UK                   IT                     Italy
0002            Netherlands          AT                     Austria 
0002            Netherlands          DK                     Denmark 
0002            Netherlands          NL                     Netherlands 
0002            Netherlands          BE                     Belgium 
0002            Netherlands          LT                     Lithuania 
0002            Netherlands          LX                     Latvia      
.......

For SalesOrg #0002, if the Sold-To Country Code are not FR IE IT, the rest will be reporting to Netherlands. So I want to create rows for the rest of the country codes.

Is there any better way to create rows and expand into the existing Dataframe?

Xavier Sun
  • 79
  • 3
  • 8
  • Your question is closely related (but not exactly the same) to [Split cell into multiple rows in pandas dataframe](https://stackoverflow.com/questions/50731229/split-cell-into-multiple-rows-in-pandas-dataframe) – Itay Aug 22 '19 at 06:34

2 Answers2

0

If you have Pandas 0.25, then Explode is what you need:

import numpy as np
import pandas as pd

df_mapping = pd.DataFrame({'SalesOrg Code':['0001','0002','0002','0002','0002'],
                           'Reporting Country':['Spain','UK','UK','UK','Netherlands'],
                           'Sold-To Country Code':[np.nan,'IE','FR','IT','Ex:'],
                           'Sold-To Country Name':[np.nan,'Ireland','France','Italy','NOT: FR IE IT']})

country_set = {"AF", "AX", "AL", "DZ", "AS", "AD", "AO", "AI", "AQ", "AG", "AR",
"AM", "AW", "AU", "AT", "AZ", "BS", "BH", "BD", "BB", "BY", "BE",
"BZ", "BJ", "BM", "BT", "BO", "BQ", "BA", "BW", "BV", "BR", "IO",
"BN", "BG", "BF", "BI", "CV", "KH", "CM", "CA", "KY", "CF", "TD",
"CL", "CN", "CX", "CC", "CO", "KM", "CG", "CD", "CK", "CR", "CI",
"HR", "CU", "CW", "CY", "CZ", "DK", "DJ", "DM", "DO", "EC", "EG",
"SV", "GQ", "ER", "EE", "ET", "FK", "FO", "FJ", "FI", "FR", "GF",
"PF", "TF", "GA", "GM", "GE", "DE", "GH", "GI", "GR", "GL", "GD",
"GP", "GU", "GT", "GG", "GN", "GW", "GY", "HT", "HM", "VA", "HN",
"HK", "HU", "IS", "IN", "ID", "IR", "IQ", "IE", "IM", "IL", "IT",
"JM", "JP", "JE", "JO", "KZ", "KE", "KI", "KP", "KR", "KW", "KG",
"LA", "LV", "LB", "LS", "LR", "LY", "LI", "LT", "LU", "MO", "MK",
"MG", "MW", "MY", "MV", "ML", "MT", "MH", "MQ", "MR", "MU", "YT",
"MX", "FM", "MD", "MC", "MN", "ME", "MS", "MA", "MZ", "MM", "NA",
"NR", "NP", "NL", "NC", "NZ", "NI", "NE", "NG", "NU", "NF", "MP",
"NO", "OM", "PK", "PW", "PS", "PA", "PG", "PY", "PE", "PH", "PN",
"PL", "PT", "PR", "QA", "RE", "RO", "RU", "RW", "BL", "SH", "KN",
"LC", "MF", "PM", "VC", "WS", "SM", "ST", "SA", "SN", "RS", "SC",
"SL", "SG", "SX", "SK", "SI", "SB", "SO", "ZA", "GS", "SS", "ES",
"LK", "SD", "SR", "SJ", "SZ", "SE", "CH", "SY", "TW", "TJ", "TZ",
"TH", "TL", "TG", "TK", "TO", "TT", "TN", "TR", "TM", "TC", "TV",
"UG", "UA", "AE", "GB", "US", "UM", "UY", "UZ", "VU", "VE", "VN",
"VG", "VI", "WF", "EH", "YE", "ZM", "ZW"}

Then find where you need to "expand" out into new rows:

mask = df_mapping['Sold-To Country Name'].str.contains("NOT:").fillna(False)
df_mapping.loc[mask, 'Sold-To Country Code'] = df_mapping.loc[mask, 'Sold-To Country Name'].apply(lambda x: list(country_set - set(x.split(" ")[1:])))
df_mapping = df_mapping.explode('Sold-To Country Code')
df_mapping.reset_index(drop=True, inplace=True)

Then of course you'll have to match up country codes with country names using another data frame.

Ted
  • 1,189
  • 8
  • 15
  • 1
    Hi @Tom - Really appreciated. I got Pandas 0.25 so the `Explode` is working well as by @mohanys. Thanks anyway! – Xavier Sun Aug 22 '19 at 18:54
0

Check if this gets you what you want.

df1['a'] = df1['Sold-To Name'].replace(regex=r'NOT:', value='').str.split(" ")
df1['a']= df1['a'].apply (lambda x : (np.setdiff1d(c_list,x)))
df1.loc[ df1['Sold-To Code']== 'Ex:', 'Sold-To Code' ] = df1['a']
df1=df1.explode('Sold-To Code')
df1.drop('a',axis=1,inplace=True)

Details

 df1['a'] = df1['Sold-To Name'].replace(regex=r'NOT:', value='').str.split(" ")

With this we arecreating a new column a, copying the contents of column Sold-To Name, removing test 'NOT:' splitting the rest into a list

df1['a']= df1['a'].apply (lambda x : (np.setdiff1d(c_list,x)))

With this we are comparing the contents of each row in a with the contries list (c_list should be a list, not a dataframe like you have).

df1.loc[ df1['Sold-To Code']== 'Ex:', 'Sold-To Code' ] = df1['a']

with this we taking the rows of Sold-To Code column where the value is 'Ex:' are replacing it with the contents of column a (which is the list of countries other that the ones in Sold-To Name

df1=df1.explode('Sold-To Code')

in pandas 0.25 we have pandas.DataFrame.explode which could be used to make each item in a list of a column to a separate row (rest of the columns will be copied as it is)

df1.drop('a',axis=1,inplace=True)

We drop column a since we do not need it any more.

It is working fine for me, check at your end & see if this works for you as well.

moys
  • 7,747
  • 2
  • 11
  • 42
  • Hi @mohanys - this is brilliant!! It is working very well!. Just one more question, if the country_list change to be a dframe table with two columns, `Country_Cd` and `Country_Name`, a mapping table. Wondering how we can replace the `NOT: FR IE IT` with the mapped country name for each row expanded? I know we probably can use `.merge` by `left join` and then `np.max` to grab the existing `Country_name` if they are not `NOT: FR IE IT`, and then `.drop` the previous `Country_name` column, but this approach is a bit lengthy, so wondering if there is any easier way to achieve this? – Xavier Sun Aug 22 '19 at 12:35
  • Check this question https://stackoverflow.com/questions/50561577/mapping-column-value-based-on-another-column-in-python – moys Aug 22 '19 at 12:49
  • Thank you @mohanys. That link is working for this purpose. It is way much better! Appreciated for your help! :) – Xavier Sun Aug 22 '19 at 18:13
  • Hi @mohanys - I forgot to highlight, not sure if you encountered this error `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()` after `df1.loc[ df1['Sold-To Code']== 'Ex:', 'Sold-To Code' ] = df1['a']'`. I used `try except ValueError: pass` function to ignore this error, which is working fine. Any ways to fix this error? – Xavier Sun Aug 22 '19 at 18:51