0

I want to split the record if it contains / below into 2 rows with the same volume and premium values but with group values of 4942 and 5350. The second transformation

Current dataframe
                  Group     Volume    Premium
                 4941.0     5721.0    5057.76
              4942/5350     6154.0    5462.46

transformed
                  Group     Volume    Premium
                 4941.0     5721.0    5057.76
                   4942     6154.0    5462.46
                   5350     6154.0    5462.46

I tried the code below and the first record group value should be 4941 but it returns nan instead

     dfRosters=(dfRosters.set_index(['Volume', 'Premium'])
     .apply(lambda x: x.str.split('/').explode())
    .reset_index())

  Volume    Premium           Group
0      5721.0    5057.76             NaN
1      6154.0    5462.46            4942
2      6154.0    5462.46            5350

34653120
  • 95
  • 6
  • 1
    `.explode()` is what you are looking for: [Split (explode) pandas dataframe string entry to separate rows](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows) – JNevill Jul 18 '22 at 14:34

4 Answers4

0

This should work:

df = pd.DataFrame({"group": ["4941.0", "4942/5350"], "volume": ["5721.0", "6154.0"], "Premium": ["5057.76", "5462.46"]})
df['group'] =  df['group'].apply(lambda x: x.split("/"))
df = df.explode("group")

The explode method converts a list into a separate rows and duplicates the other rows for each item in the list.

  • it returns this error AttributeError: 'float' object has no attribute 'split' – 34653120 Jul 18 '22 at 14:51
  • Since you had mentioned "4942/5350" in your original post, I assumed it was a string. This is because no numerical type can be saved as 4942/5350 (AFAIK). Nonetheless, I see below that you managed to solve it by formatting it as string. You can always change it to float by doing df[col].astype('float') – Murtaza Samiwala Jul 18 '22 at 20:09
0

Use str.split for vectorial and error-proof splitting, then explode:

out = (df.assign(**{'Group': df['Group'].str.split('/')})
         .explode('Group', ignore_index=True)
      )

output:

    Group  Volume  Premium
0  4941.0  5721.0  5057.76
1    4942  6154.0  5462.46
2    5350  6154.0  5462.46

To also convert the types:

out = (df
   .assign(**{'Group': df['Group'].str.split('/')})
   .explode('Group', ignore_index=True)
   .astype({'Group': float})
)

output:

    Group  Volume  Premium
0  4941.0  5721.0  5057.76
1  4942.0  6154.0  5462.46
2  5350.0  6154.0  5462.46
mozway
  • 194,879
  • 13
  • 39
  • 75
  • this does not split the records, it output is exactly the same as input – 34653120 Jul 18 '22 at 15:01
  • I checked again, and it works fine. Note that I used `df` as name, you should use `dfRosters`. What is the output of `dfRosters.to_dict('list')`? – mozway Jul 18 '22 at 15:03
0
    dfRosters['Group'] = dfRosters['Group'].astype(str)
    dfRosters=(dfRosters.set_index(['Volume', 'Premium'])
    .apply(lambda x: x.str.split('/').explode())
    .reset_index())

Had to format group as string to resolve the issue

34653120
  • 95
  • 6
0

You can also spit on the string and explode that single series. Then join back with the original frame.

out = (
    df['Group'].str.split('/')
    .explode()
    .astype(float)
    .to_frame()
    .join(df, rsuffix='_orig')
)

print(out)
    Group Group_orig  Volume  Premium
0  4941.0     4941.0  5721.0  5057.76
1  4942.0  4942/5350  6154.0  5462.46
1  5350.0  4942/5350  6154.0  5462.46
Cameron Riddell
  • 10,942
  • 9
  • 19