1

I have looked at other explanations here and none quite fit. The unpacking code works, but its very slow. is there a better method that i can use with python. I can't see how i can get a list comprehension to work here. Any suggestions would be very helpful.

import pandas as pd

df = pd.DataFrame(data={'a':['A1 + A3','B4 + A4 + D2','C2 + D2'],'b':['L700 + 
          L800','G700','L2600 + L900'],'c':['6','7','8']})
df

    a   b   c
0   A1 + A3 L700 + L800     6
1   B4 + A4 + D2    G700    7
2   C2 + D2 L2600 + L900    8

df2 = pd.DataFrame(columns = df.columns)
for index, row in df.iterrows():
    userLabel = row.loc['a']
    cells = userLabel.split('+') 

    ID = row.loc['b']
    tech = ID.split('+')     
    i = 0
    for cell in cells:

        cell = cell.strip()
        row.loc['a'] = cell
        if i > len(tech)-1:
            i = i-1
        row.loc['b'] = tech[i]
        df2.loc[len(df2)] = row
        i += 1

df2

    a   b   c
0   A1  L700    6
1   A3  L800    6
2   B4  G700    7
3   A4  G700    7
4   D2  G700    7
5   C2  L2600   8
6   D2  L900    8
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Mick Hawkes
  • 47
  • 2
  • 9
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jun 13 '18 at 05:46
  • 1
    finally. Is that ok? I wish the output looked better – Mick Hawkes Jun 13 '18 at 06:05
  • It is super, thank you. – jezrael Jun 13 '18 at 06:13

1 Answers1

1

Use:

df = (df.set_index('c')
        .stack()
        .str.split('\s+\+\s+', expand=True)
        .stack()
        .unstack(1)
        .ffill()
        .reset_index(level=1, drop=True)
        .reset_index()
        .reindex(columns=['a','b','c'])
        )
print (df)
    a      b  c
0  A1   L700  6
1  A3   L800  6
2  B4   G700  7
3  A4   G700  7
4  D2   G700  7
5  C2  L2600  8
6  C2   L900  8

Explanation:

  1. First set_index for split columns for split
  2. Reshape by stack for Series with MultiIndex
  3. Create DataFrame per rows by split by regex - nees escape + and for one or more whitespaces use \s+
  4. Reshape by stack and unstack for second level to columns
  5. Forward fill NaNs by ffill
  6. Data cleaning by reset_index
  7. If order of columns important add reindex
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @MartinThoma - I start do it, but many functions, so I hope in 3 minutes it will be done – jezrael Jun 13 '18 at 06:32
  • @MartinThoma - Done. – jezrael Jun 13 '18 at 06:36
  • Wow! thanks Jezreal! Im going to have to go through that line by line. I thought i was getting the hang of python (old VBA guru) but you have shown me i still have a ways to go. – Mick Hawkes Jun 13 '18 at 21:35
  • @jezrael - when I run this in my code I get "ValueError: Index contains duplicate entries, cannot reshape" the df I am using has more columns than my example above. What should I be looking for to try and fix this? – Mick Hawkes Jun 14 '18 at 00:03
  • @jezrael - it occurs on .unstack(1) – Mick Hawkes Jun 14 '18 at 00:14
  • 1
    @jezrael - I fixed it by creating a unique index df['new'] = range(1, len(df.index)+1) as my dataframe did not contain any unique valuecolumns :) – Mick Hawkes Jun 14 '18 at 00:48