17

I have a pandas DataFrame df like this:

a  b
1  3
3  6
5  7
6  4
7  8

I want to convert it to a list of tuples:

[(1,3),(3,6),(5,7),(6,4),(7,8)]

Thanks.

cs95
  • 379,657
  • 97
  • 704
  • 746
kkjoe
  • 745
  • 2
  • 8
  • 18
  • 1
    Does this answer your question? [Pandas convert dataframe to array of tuples](https://stackoverflow.com/questions/9758450/pandas-convert-dataframe-to-array-of-tuples) – AMC Jan 07 '20 at 16:03

3 Answers

31

If performance is important, use a list comprehension:

[tuple(r) for r in df.to_numpy()]
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]

Note: For pandas < 0.24, please use df.values instead.

You may find even better performance if you iterate over lists instead of the numpy array:

[tuple(r) for r in df.to_numpy().tolist()]
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]
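Besides speed, the two variants above differ in the element type they produce: iterating over the NumPy array yields NumPy scalars, while .tolist() converts them to plain Python numbers first. A minimal sketch, assuming the question's frame is rebuilt by hand (the names from_array and from_lists are only for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({'a': [1, 3, 5, 6, 7], 'b': [3, 6, 7, 4, 8]})

from_array = [tuple(r) for r in df.to_numpy()]
from_lists = [tuple(r) for r in df.to_numpy().tolist()]

# Both lists compare equal element by element...
print(from_array == from_lists)

# ...but the element types differ.
print(type(from_array[0][0]))  # a NumPy integer type
print(type(from_lists[0][0]))  # plain Python int
```

This can matter downstream. For example, json.dumps rejects NumPy integers, so the .tolist() variant is likely the safer choice if the tuples are going to be serialised.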

This method works for any number of columns. However, if you only want to convert a specific set of columns, you can select them beforehand.

[tuple(r) for r in df[['a', 'b']].to_numpy()]
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]
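One caveat when going through to_numpy() with several columns: if the columns have different numeric dtypes, the array is upcast to a common dtype, so integers can come back as floats. The sketch below uses a hypothetical frame with one integer and one float column (not the frame from the question) and contrasts the row-wise conversion with zipping the columns directly:

```python
import pandas as pd

# Hypothetical frame with mixed dtypes: 'a' is int64, 'b' is float64.
df = pd.DataFrame({'a': [1, 3], 'b': [0.5, 1.5]})

# Row-wise via the array: both columns share one float dtype.
via_array = [tuple(r) for r in df.to_numpy().tolist()]
print(via_array)  # 'a' values are now floats

# Column-wise via zip: each column keeps its own type.
via_zip = list(zip(df['a'].tolist(), df['b'].tolist()))
print(via_zip)  # 'a' values stay ints
```

If every selected column shares one dtype, as in the question, the two approaches give the same result.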

Another alternative is using map.

list(map(tuple, df.to_numpy()))
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]

This is roughly the same as the list comprehension, performance-wise. You can generalise it the same way.
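Performance claims like these are easy to check on your own data. Here is a rough timing sketch using timeit; the frame size (10,000 rows) and repeat count (20) are arbitrary assumptions, and the numbers will vary with the machine and the pandas/NumPy versions:

```python
import timeit

import pandas as pd

# Arbitrary test frame; adjust the size to resemble your real data.
df = pd.DataFrame({'a': range(10_000), 'b': range(10_000)})

candidates = {
    'comprehension over array': lambda: [tuple(r) for r in df.to_numpy()],
    'comprehension over lists': lambda: [tuple(r) for r in df.to_numpy().tolist()],
    'map over array': lambda: list(map(tuple, df.to_numpy())),
}

for label, fn in candidates.items():
    # Total wall-clock time for 20 calls of each candidate.
    seconds = timeit.timeit(fn, number=20)
    print(f'{label}: {seconds:.4f} s for 20 runs')
```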


Another option is to use apply and convert the result to a list:

df.apply(tuple, axis=1).tolist()
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]

This is slower, so it is not recommended.
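If you prefer a row-wise pandas method without the overhead of apply, DataFrame.itertuples is worth a look. It is not one of the approaches in this answer (it is the one used in the linked duplicate), so treat this as a side sketch: index=False drops the index from each row, and name=None makes it yield plain tuples instead of namedtuples.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({'a': [1, 3, 5, 6, 7], 'b': [3, 6, 7, 4, 8]})

# index=False: leave the index out; name=None: plain tuples, not namedtuples.
rows = list(df.itertuples(index=False, name=None))
print(rows)
# [(1, 3), (3, 6), (5, 7), (6, 4), (7, 8)]
```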

cs95
  • I don't know how recent it is, but the Pandas docs discourage the use of `.values` in favour of `.to_numpy()`. – AMC Jan 07 '20 at 16:05
  • @AMC `to_numpy()` was introduced in 0.24, almost 2 years after this answer was written. I know because [I broke the news](https://stackoverflow.com/a/54508052). Obviously this hasn't been updated since then but just a comment would've sufficed :-) – cs95 Jan 07 '20 at 19:28
  • _I know because I broke the news._ You broke the news, and apparently I had upvoted it! I actually noticed only a few minutes ago from reading your profile that you keep track of some reference Pandas questions etc. Keep up the good work on that front! _but just a comment would've sufficed_ Are you referring to the downvote? I think I downvoted because this can now be closed as a duplicate of a far more popular question, although I'm unsure how answer score affects the placement of the question on the site. – AMC Jan 07 '20 at 19:37
  • Also, I'm surprised you didn't submit an edit for the top answer to that question. – AMC Jan 07 '20 at 19:38
  • @AMC I assumed the downvote was from you on account of the answer being outdated. Apologies. You're free to vote as you see fit and no one can tell you otherwise. Which answer are you referring to being edited, by the way? AFAICT [I did edit Wes' answer](https://stackoverflow.com/posts/9762084/revisions). – cs95 Jan 07 '20 at 19:47
  • Sorry, I confused two different questions. The answer I am referring to is here: https://stackoverflow.com/q/13187778/11301900. As an aside, I'm starting to notice that there are multiple extremely similar questions on converting a Pandas DataFrames/Series to a NumPy array/list. It's so frustrating! I wonder if it might be even be better to close/remove those posts entirely, since they're so trivial. – AMC Jan 07 '20 at 19:50
  • I forgot to add: Not only are they trivial, I think they also tend to reinforce people's misconceptions/lack of knowledge of Pandas, its structures and its use cases. – AMC Jan 07 '20 at 19:53
  • 1
    @AMC You think these are trivial because you're proficient with the tools and the language. The same cannot be said of beginners, who will need to have this information readily available when they look for it. I agree there's a larger issue of duplicate questions constantly asked but that's a different issue. When you first start learning a language you need to quickly figure out how to do basic things, and having a SO post as the top hit on a google search is 10x more effective than pages of documentation that will take you minutes to scour through to find the answer. – cs95 Jan 07 '20 at 19:53
  • @AMC: I didn't edit that answer because I wanted to post mine ;-)) FWIW I did edit the question to make the top answer fit; it wasn't initially. – cs95 Jan 07 '20 at 19:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205553/discussion-between-amc-and-cs95). – AMC Jan 07 '20 at 19:58
  • @AMC would love to, bit busy with work now. Ping me either in chat or here and I'll reply when I can. Cheers. – cs95 Jan 07 '20 at 19:59
  • Of course, no problem! Is there anywhere else that would be more convenient for you (I've used Discord with a few people from SO, for example), or is the chat fine? _Ping me either in chat_ Done! :) – AMC Jan 07 '20 at 20:19
5

You can also get the desired list like this:

# In Python 3, zip is lazy, so wrap it in list() to get the actual list.
list(zip(df['a'], df['b']))
dvitsios
  • 458
  • 2
  • 7
  • 1
    @Leonid yes, that's because `zip` is lazy and doesn't actually return anything until you exhaust it. – cs95 Jun 06 '19 at 05:31
3

Use zip() to create tuples

import pandas as pd
df = pd.DataFrame({'a':[1,3,5,6,7], 'b':[3,6,7,4,8]})
print(list(zip(df['a'], df['b'])))
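The zip approach names each column explicitly, but it can be extended to any number of columns by unpacking them with *. A minimal sketch, assuming a frame with an extra, hypothetical column 'c' added for illustration:

```python
import pandas as pd

# The question's frame plus a hypothetical third column 'c'.
df = pd.DataFrame({
    'a': [1, 3, 5, 6, 7],
    'b': [3, 6, 7, 4, 8],
    'c': list('vwxyz'),
})

# Unpack one Series per column into zip, then materialise the lazy iterator.
rows = list(zip(*(df[col] for col in df.columns)))
print(rows[0])
```

Because each column is iterated on its own, a text column like 'c' does not affect the values coming from the numeric columns.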
ksai
  • 987
  • 6
  • 18