I have a dataframe as follows:
imagename date seqid locid
image1.jpg 16-05-2019 19:08:16 [7, 23, 29] vp1
image2.jpg 16-05-2019 19:08:17 [15, 23, 48,3798] vp1
The column seqid contains arrays of differing lengths. I want to split each array and create a new row for every item in it, so that each row keeps one value from the array in seqid while all other column values are repeated unchanged. The desired output is as follows:
imagename date seqid locid
image1.jpg 16-05-2019 19:08:16 7 vp1
image1.jpg 16-05-2019 19:08:16 23 vp1
image1.jpg 16-05-2019 19:08:16 29 vp1
image2.jpg 16-05-2019 19:08:17 15 vp1
image2.jpg 16-05-2019 19:08:17 23 vp1
image2.jpg 16-05-2019 19:08:17 48 vp1
image2.jpg 16-05-2019 19:08:17 3798 vp1
The input file is in CSV format. I understand that, after reading the CSV into a DataFrame, I could split the array into multiple columns using
pd.DataFrame(df.seqid.tolist(), columns=['col1', 'col2'])
However, I am not sure how to proceed when I don't know the length of the array in each row.
I just couldn't figure out how to do this.
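For reference, here is a minimal sketch of one possible way to get the desired output. It rests on several assumptions: pandas 0.25 or later (for DataFrame.explode), a comma-delimited file in which the seqid field is quoted, and seqid being read in as a string such as "[7, 23, 29]". The csv_text sample and the name result are hypothetical and only stand in for the real file.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content mirroring the example above; the real file's
# delimiter and quoting are assumptions.
csv_text = '''imagename,date,seqid,locid
image1.jpg,16-05-2019 19:08:16,"[7, 23, 29]",vp1
image2.jpg,16-05-2019 19:08:17,"[15, 23, 48,3798]",vp1
'''

df = pd.read_csv(io.StringIO(csv_text))

# read_csv loads seqid as text, so parse each "[...]" string into a real list.
df["seqid"] = df["seqid"].apply(ast.literal_eval)

# explode emits one row per list element and repeats the other columns;
# reset_index replaces the duplicated index labels with 0..n-1.
result = df.explode("seqid").reset_index(drop=True)

# After explode the column has object dtype, so cast it back to integers.
result["seqid"] = result["seqid"].astype(int)

print(result)
```

Because explode works row by row, the lists do not need to have the same length, which sidesteps the fixed column names required by the multi-column approach.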