Parse a dataframe column by comma and pivot - python

Question

I am trying to parse out a column by the comma (also stripping the white space) and then pivoting all of the origin/destination combinations into new rows. Here is a sample of the data:

Origin     Destination     Weight
PVG        AMS, FRA        10,000
CAN, XMN   LAX, ORD        25,000

I am having trouble reproducing the dataframe above using pd.read_clipboard, so here is the dataframe code:

df = pd.DataFrame({'Origin': ['PVG', 'CAN, XMN'], 
                   'Destination': ['AMS, FRA', 'LAX, ORD'],
                   'Weight': [10000, 25000]})

The desired output would be:

Origin     Destination     Weight
PVG        AMS             10,000
PVG        FRA             10,000
CAN        LAX             25,000   
CAN        ORD             25,000
XMN        LAX             25,000
XMN        ORD             25,000

I have been trying to use:

df['Origin'].str.split(',', expand = True)

I had tried doing this for both the origin and destination, which works for parsing the strings into separate columns. I am struggling to then create all of the possible combinations into separate rows (I have tried using pivot_table and melt with no luck).

Either my answer, or jpp's, depending on what you find easier to use. — cs95, Jun 11 '18 at 02:34
@coldspeed I'm not sure if I just don't know how to modify your/jpp's answer to get what I am looking for or because the answer I am looking for is a slightly different solution? In the package example p1 matches with #111, p2 matches with #222, and so on... The problem I am trying to solve, using the package example would be: separate rows where p1 matches with #111, then #222, and then #333. Along with p2 matching with #111, then #222, then #333, and so on. So like I said, I am not sure if I don't know how to modify the answers to solve my problem or I need to find different solution? — Brian, Jun 11 '18 at 07:17
Sorry about my comment earlier, my intention was not to gain acceptance. I wanted to change your mindset that answering first isn't the criteria to use when picking the best answer. Take both solutions, test it on your data, and accept the one that works best for you. Some factors to consider: performance and readability. — cs95, Jun 11 '18 at 07:39

jezrael · Accepted Answer · 2018-06-11T07:31:56.183

Use itertools.product with list comprehenion, only first create lists with applymap:

from  itertools import product

df1 = df.applymap(lambda x: x.split(', ') if isinstance (x, str) else [x])
df2 = pd.DataFrame([j for i in df1.values for j in product(*i)], columns=df.columns)
print (df2)
  Origin Destination  Weight
0    PVG         AMS   10000
1    PVG         FRA   10000
2    CAN         LAX   25000
3    CAN         ORD   25000
4    XMN         LAX   25000
5    XMN         ORD   25000

cs95 · Answer 2 · 2018-06-11T07:40:17.643

2

You should be using itertools.product here.

from itertools import product
df_dict = {
    x[-1]: pd.DataFrame(list(
        product(x[0].split(', '), x[1].split(', ')))
    ) for x in df.values.tolist()
}

df = pd.concat(df_dict).reset_index(level=-1, drop=True).reset_index() 
df.columns = ['Weight', 'Destination', 'Origin']

df 
   Weight Destination Origin
0   10000         AMS    PVG
1   10000         FRA    PVG
2   25000         LAX    CAN
3   25000         LAX    XMN
4   25000         ORD    CAN
5   25000         ORD    XMN

edited Jun 11 '18 at 07:40

answered Jun 11 '18 at 07:19

cs95

379,657
97
704
746

@Brian Take both solutions, test it on your data, and accept the one that works best for you. Consider performance and everything else. Don't reward users (me, or anyone else) just for typing fast. – cs95 Jun 11 '18 at 07:37
Alright, standby I will time the solutions/see what works best for me – Brian Jun 11 '18 at 07:39
@Brian That's what I wanted to hear :) By the way, there were a couple of typos, I've fixed those. – cs95 Jun 11 '18 at 07:41
After further comparison, I am going to have to give it to jezrael. The code was more dynamic for what I was applying it to / trying to ultimately achieve. I appreciate the challenge to look further into the answers, you have changed my perception on correct answers. – Brian Jun 11 '18 at 07:59
1

@Brian Totally fine with that. Thanks for making a _fair_, _level-headed_ assessment of the answers. – cs95 Jun 11 '18 at 07:59

Parse a dataframe column by comma and pivot - python

2 Answers2