I have a dataframe of lists that looks similar to the one below (Figure A). There is a single key column followed by n columns containing lists. For each row, I want to combine the lists from every column except the key into a single list in a new column named combined. An example of my desired result is shown below in Figure B.
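In other words, for key 1_1 the combined cell should be that row's valueA, valueB and valueN lists joined end to end in column order. Here is a minimal sketch of one way to get there without handling rows individually. It assumes every column other than key holds a Python list. Adding two object-dtype Series applies Python's + element by element, which concatenates the lists, and functools.reduce folds that addition across however many value columns there are.

```python
from functools import reduce
import operator

import pandas as pd

data = {'key': ['1_1', '1_2', '1_3'],
        'valueA': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueB': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueN': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]]}
dataSet = pd.DataFrame(data)

# Assumption: every column except 'key' is a list column.
value_cols = [c for c in dataSet.columns if c != 'key']

# Series + Series on object dtype calls list.__add__ row by row,
# so folding "+" across the value columns concatenates each row's lists.
dataSet['combined'] = reduce(operator.add, (dataSet[c] for c in value_cols))

print(dataSet.loc[0, 'combined'])
# [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
```

Each + builds a new intermediate list, so with many value columns this copies data repeatedly. Whether that matters at your row counts is worth measuring rather than assuming.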
I've tried some methods with iteritems(), but these dataframes can be hundreds of thousands to millions of rows long, which made those approaches incredibly slow. So I'm trying to avoid solutions that rely on it.
I'd like to use something like the list comprehension seen in this SO post, but I haven't been able to get it working with pandas.
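The flattening idiom most often cited on Stack Overflow is [item for sublist in lists for item in sublist]. Assuming that is the kind of list comprehension meant here, one way to use it with pandas without iterrows() is to zip the value columns together. Each iteration then yields one row's lists as a tuple, and the nested comprehension flattens that tuple. This is a sketch under that assumption, not a benchmarked recommendation.

```python
import pandas as pd

data = {'key': ['1_1', '1_2', '1_3'],
        'valueA': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueB': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueN': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]]}
dataSet = pd.DataFrame(data)

# Assumption: every column except 'key' is a list column.
value_cols = [c for c in dataSet.columns if c != 'key']

# zip(*columns) yields one tuple of lists per row;
# the nested comprehension flattens each tuple into a single list.
# Wrapping in pd.Series keeps each list as one cell aligned to the index.
dataSet['combined'] = pd.Series(
    [[item for sublist in row for item in sublist]
     for row in zip(*(dataSet[c] for c in value_cols))],
    index=dataSet.index,
)
```

Because zip walks the columns' values directly, this stays in plain Python lists and never builds a per-row Series. Building a Series for every row is typically what makes iterrows() and apply(axis=1) comparatively slow.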
import pandas as pd

# example data
data = {'key': ['1_1', '1_2', '1_3'],
        'valueA': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueB': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueN': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]]}
dataSet = pd.DataFrame(data)
Figure A
Figure B
Edit: I really appreciate all the answers so far! I'm currently timing each one on my full-size dataset to figure out which will work best in this case. I'll update with my results shortly!
Edit 2: I tested the main solutions provided here on a few of my larger datasets. Their average times are below.
# Lambda/apply with a nested list comprehension
shakiba.mrd: 1.12 s
# Sum columns
jfaccioni: 2.21 s
# Nested list comprehension with iterrows
mozway: 0.95 s
# Adding column lists together
politinsa: 3.50 s
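To run this kind of comparison on other data, here is a minimal timeit sketch. The benchmark frame is hypothetical: it is just the three example rows repeated. The three functions are illustrative versions of the approach families listed above, not the answerers' exact code. Absolute times will depend on hardware, row count and list lengths.

```python
import operator
import timeit
from functools import reduce

import pandas as pd

data = {'key': ['1_1', '1_2', '1_3'],
        'valueA': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueB': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]],
        'valueN': [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]]}

# Hypothetical benchmark size: raise N_COPIES to approach real row counts.
N_COPIES = 5_000
df = pd.concat([pd.DataFrame(data)] * N_COPIES, ignore_index=True)
value_cols = [c for c in df.columns if c != 'key']


def by_column_addition(frame):
    # Object-dtype Series addition concatenates the lists row by row.
    return reduce(operator.add, (frame[c] for c in value_cols))


def by_list_comprehension(frame):
    # zip the columns, then flatten each row's tuple of lists.
    return pd.Series(
        [[x for sub in row for x in sub] for row in zip(*(frame[c] for c in value_cols))],
        index=frame.index,
    )


def by_apply(frame):
    # apply(axis=1) builds a Series per row and flattens its values.
    return frame[value_cols].apply(lambda row: [x for sub in row for x in sub], axis=1)


for fn in (by_column_addition, by_list_comprehension, by_apply):
    # Best of 3 single runs; fn is called immediately, so the lambda is safe.
    seconds = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.3f} s")
```

Reporting the minimum of several runs reduces noise from other processes. Checking that all three functions return identical lists before trusting the timings is also worthwhile.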
Thanks again to everyone for their contributions!