I would like to create a second table from a column whose values are lists of strings.
I have a df:
sent_id | associations |
---|---|
1 | ["a","b","c"] |
2 | NaN |
3 | ["a"] |
4 | ["d"] |
I would like to "normalize" my data by creating a second dataframe like this:
sent_id | association |
---|---|
1 | "a" |
1 | "b" |
1 | "c" |
3 | "a" |
4 | "d" |
How do I achieve this?
I have partially arrived at a solution using this code I found on Stack Overflow:
test = pd.Series(sum([item for item in df["associations"]], []))
But this flattens the lists into a single Series and doesn't preserve the sent_id.
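One direction that might work, assuming pandas 0.25 or newer (where DataFrame.explode is available), is to explode the list column so each element gets its own row while sent_id is repeated alongside it. This is only a sketch: the sample frame below is rebuilt by hand from the table above, and the name normalized is just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question (assumed shape).
df = pd.DataFrame({
    "sent_id": [1, 2, 3, 4],
    "associations": [["a", "b", "c"], np.nan, ["a"], ["d"]],
})

normalized = (
    df.explode("associations")                 # one row per list element; NaN rows stay NaN
      .dropna(subset=["associations"])         # drop sent_ids that had no associations
      .rename(columns={"associations": "association"})
      .reset_index(drop=True)                  # explode repeats the original index
)

print(normalized)
```

With the sample data, sent_id 2 drops out because its associations value is NaN, and the other ids are repeated once per association.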