1

My pandas data is currently in this format:

Uid     Postcodelist
1       NE11 7HS,NE5 8MN,NE1 7UJ,NE14 8YU
2       LS6 8PJ
3       M6 7JH,M14 1HF
4       B17 8KA,LE5 7UZ,LE9 9GF

I have a problem where I need to split the Postcodelist column into two columns P1, P2 and take the following ordinal positions of the strings. For example for Uid=1:

[1,2,3,4]

would be split to:

[1],[2]

[3],[4]

this is the expected output I need:

Uid     P1           P2
1       NE11 7HS     NE5 8MN
1       NE1 7UJ      NE14 8YU
2       LS6 8PJ      Null
3       M6 7JH       M14 1HF
4       B17 8KA      LE5 7UZ
4       LE9 9GF      Null

If there were more than 4 items it would need to split and explode further (I don't think there would be a defined upper limit but it would need to explode for every 2 items in the string list).

I tried this a while back and someone answered and gave the following solution which got me some of the way there. There is an increased need now for this data to be in the above described format. The code which got me some of the way there is below. (I'm using the explode() function described here. Split (explode) pandas dataframe string entry to separate rows

df[['P1','P2']] = df.pop('PreviousPostCodes').str.split(',\s*', n=1, expand=True)
df['P2'] = df['P2'].fillna('').str.split(',\s*', expand=False)
df = explode(df, lst_cols='P2')

this gets me (again for example uid=1)

[1,2,3,4]

transformed to

[1],[2]

[1],[3]

[1],[4]
smci
  • 32,567
  • 20
  • 113
  • 146
CheyRav90
  • 65
  • 1
  • 6

1 Answers1

4

You can use:

df = df.set_index('Uid').pop('PreviousPostCodes').str.split(',\s*', expand=True)
df.columns = [df.columns % 2 + 1, df.columns // 2]
df = df.stack().add_prefix('P').reset_index(level=1, drop=True).reset_index()
print (df)
   Uid        P1        P2
0    1  NE11 7HS   NE5 8MN
1    1   NE1 7UJ  NE14 8YU
2    2   LS6 8PJ      None
3    3    M6 7JH   M14 1HF
4    4   B17 8KA   LE5 7UZ
5    4   LE9 9GF      None

Explanation:

  1. Create index from column Uid by set_index and split to DataFrame
  2. Create MultiIndex in columns by floor and modulo division
  3. Reshape by stack
  4. Change columns names by add_suffix
  5. Last reset_index for remove first level of MultiIndex and column from index
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252