My pandas data is currently in this format:
Uid Postcodelist
1 NE11 7HS,NE5 8MN,NE1 7UJ,NE14 8YU
2 LS6 8PJ
3 M6 7JH,M14 1HF
4 B17 8KA,LE5 7UZ,LE9 9GF
I have a problem where I need to split the Postcodelist
column into two columns P1, P2
and take the following ordinal positions of the strings. For example for Uid=1:
[1,2,3,4]
would be split to:
[1],[2]
[3],[4]
this is the expected output I need:
Uid P1 P2
1 NE11 7HS NE5 8MN
1 NE1 7UJ NE14 8YU
2 LS6 8PJ Null
3 M6 7JH M14 1HF
4 B17 8KA LE5 7UZ
4 LE9 9GF Null
If there were more than 4 items it would need to split and explode further (I don't think there would be a defined upper limit but it would need to explode for every 2 items in the string list).
I tried this a while back and someone answered and gave the following solution which got me some of the way there. There is an increased need now for this data to be in the above described format. The code which got me some of the way there is below. (I'm using the explode()
function described here. Split (explode) pandas dataframe string entry to separate rows
df[['P1','P2']] = df.pop('PreviousPostCodes').str.split(',\s*', n=1, expand=True)
df['P2'] = df['P2'].fillna('').str.split(',\s*', expand=False)
df = explode(df, lst_cols='P2')
this gets me (again for example uid=1)
[1,2,3,4]
transformed to
[1],[2]
[1],[3]
[1],[4]