My data in ddata.csv is as follows:
col1,col2,col3,col4
A,10,a;b;c,20
B,30,d;a;b,40
C,50,g;h;a,60
I want to split col3 into multiple columns based on the values it contains. In other words, I would like my final data to look like:
col1, col2, name_a, name_b, name_c, name_d, name_g, name_h, col4
A, 10, a, b, c, NULL, NULL, NULL, 20
B, 30, a, b, NULL, d, NULL, NULL, 40
C, 50, a, NULL, NULL, NULL, g, h, 60
My current code, adapted from this answer, is incomplete:
import pandas as pd
import string
L = list(string.ascii_lowercase)
names = dict(zip(range(len(L)), ['name_' + x for x in L]))
df = pd.read_csv('ddata.csv')
df2 = df['col3'].str.split(';', expand=True).rename(columns=names)
The column names 'name_a', 'name_b', 'name_c', ... are assigned by position and have no relation to the actual values a, b, c in the data.
Right now, my code can just split 'col3' into three columns as follows:
name_a name_b name_c
a b c
d a b
g h a
But it should be like:
name_a, name_b, name_c, name_d, name_g, name_h
a, b, c, NULL, NULL, NULL
a, b, NULL, d, NULL, NULL
a, NULL, NULL, NULL, g, h
In the end, I just need to replace col3 with these columns.
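For reference, here is a minimal sketch of one way the desired layout might be produced. It assumes the values in col3 are always separated by ';' with no surrounding spaces. The inline CSV text stands in for ddata.csv, and the names flags, values, pos and result are only illustrative. Missing entries come out as NaN rather than the literal text NULL.

```python
import io
import pandas as pd

# Inline copy of the sample data so the sketch runs without ddata.csv
csv_text = """col1,col2,col3,col4
A,10,a;b;c,20
B,30,d;a;b,40
C,50,g;h;a,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# One indicator column per distinct value found in col3 (1 if present, else 0)
flags = df['col3'].str.get_dummies(sep=';')

# Turn each 1 into the value itself; 0 is not in the mapping, so it becomes NaN
values = flags.apply(lambda s: s.map({1: s.name})).add_prefix('name_')

# Put the new columns in the position where col3 used to be
pos = df.columns.get_loc('col3')
result = pd.concat([df.iloc[:, :pos], values, df.iloc[:, pos + 1:]], axis=1)
print(result)
```

The difference from str.split(expand=True) is that the columns here are keyed by value rather than by position, so 'a' always lands in name_a regardless of where it appears in the original string.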