Convert pandas dataframe elements to tuple

Question

I have a dataframe:

>>> df = pd.DataFrame(np.random.random((3,3)))
>>> df
          0         1         2
0  0.732993  0.611314  0.485260
1  0.935140  0.153149  0.065653
2  0.392037  0.797568  0.662104

What is the easiest way to convert each entry to a 2-tuple, with the first element taken from the current entry and the second element from the last column ('2') of the same row?

i.e. I want the final results to be:

                      0                    1                      2
0  (0.732993, 0.485260)  (0.611314, 0.485260)  (0.485260, 0.485260)
1  (0.935140, 0.065653)  (0.153149, 0.065653)  (0.065653, 0.065653)
2  (0.392037, 0.662104)  (0.797568, 0.662104)  (0.662104, 0.662104)

why do you want to do this? more specifically, why are you using pandas, if you want to keep the data in a format pandas doesn't natively support? you are better off leaving the data in the current format, and changing your algorithm to process data explicitly from the second column — Corley Brigman, Jul 24 '17 at 15:13
For example, I want to do rolling regression using the last column against all other columns. AFAIK, It is not easily achievable ([here](https://stackoverflow.com/questions/44380068/pandas-rolling-regression-alternatives-to-looping), [here](https://stackoverflow.com/questions/21040766/python-pandas-rolling-apply-two-column-input-into-function), [here](https://stackoverflow.com/questions/19121854/using-rolling-apply-on-a-dataframe-object), and [here](https://stackoverflow.com/questions/21025821/python-custom-function-using-rolling-apply-for-pandas)). By converting to tuples I have a shot at it. — Zhang18, Jul 25 '17 at 15:03

cs95 · Accepted Answer · 2017-07-24T15:38:14.333

As of pandas version 0.20, you can use df.transform:

In [111]: df
Out[111]: 
   0  1  2
0  1  3  4
1  2  4  5
2  3  5  6

In [112]: df.transform(lambda x: list(zip(x, df[2])))
Out[112]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)

Or, another solution using df.apply:

In [113]: df.apply(lambda x: list(zip(x, df[2])))
Out[113]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)

You can also use a dict comprehension:

In [126]: pd.DataFrame({i : df[[i, 2]].apply(tuple, axis=1) for i in df.columns})
Out[126]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)
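
For reference, here is a self-contained sketch of the three approaches above that can be pasted into a script. The names last_label, last, via_transform, via_apply and via_dict are only illustrative, and it assumes a default integer index as in the session above:

```python
import pandas as pd

# Same small frame as in the session above.
df = pd.DataFrame([[1, 3, 4], [2, 4, 5], [3, 5, 6]])

last_label = df.columns[-1]   # label of the last column (2 here)
last = df[last_label]

# 1) transform: each column becomes a list of (value, last-column value) pairs
via_transform = df.transform(lambda col: list(zip(col, last)))

# 2) apply, column by column (default axis=0)
via_apply = df.apply(lambda col: list(zip(col, last)))

# 3) dict comprehension: build each output column from a two-column slice
via_dict = pd.DataFrame(
    {c: df[[c, last_label]].apply(tuple, axis=1) for c in df.columns}
)

print(via_transform)
```

Using df.columns[-1] instead of hard-coding 2 keeps the snippet working if the columns are renamed.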

score 0 · Answer 2 · answered Jul 24 '17 at 15:28

I agree with Corley's comment that you are better off leaving the data in the current format, and changing your algorithm to process data explicitly from the second column.
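
As a rough sketch of that suggestion (reading "the second column" as the last column, label 2, and taking the rolling regression mentioned in the question's comments as the use case), the per-window slope of each column regressed on the last column can be computed without building tuples. The function name rolling_slope_vs_last, the window size and the random demo data are made up for illustration, and the direction of the regression (each column on the last one) is an assumption:

```python
import numpy as np
import pandas as pd


def rolling_slope_vs_last(df, window):
    """Rolling least-squares slope of each column regressed on the last column."""
    x = df.iloc[:, -1]
    var_x = x.rolling(window).var()
    # For a single regressor with intercept: slope = cov(y, x) / var(x)
    return df.apply(lambda col: col.rolling(window).cov(x) / var_x)


# Hypothetical demo data: 6 rows, 3 columns of uniform random numbers.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((6, 3)))

slopes = rolling_slope_vs_last(demo, window=4)
print(slopes)
```

The first window - 1 rows are NaN because a full window is not yet available, and the last column's slope against itself comes out as 1. This only covers a single regressor with an intercept; anything more general would need a different approach.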

However, to answer your question, you can define a function that does what's desired and call it using apply.

I don't like this answer: it is ugly, and "apply" is syntactic sugar for a for loop, so you are definitely better off not using it:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((3,3)))

df
          0         1         2
0  0.847380  0.897275  0.462872
1  0.161202  0.852504  0.951304
2  0.093574  0.503927  0.986476

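def make_tuple(row):
    # Pair every value in the row with the row's last value (by position).
    last = row.iloc[-1]
    # Returning a Series rather than a list keeps one output column per
    # input column on recent pandas versions.
    return pd.Series([(x, last) for x in row], index=row.index)

df.apply(make_tuple, axis=1)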

                                   0                                 1  \
0   (0.847379908309, 0.462871875315)  (0.897274903359, 0.462871875315)   
1   (0.161202442072, 0.951303842798)  (0.852504052133, 0.951303842798)   
2  (0.0935742441563, 0.986475692614)  (0.503927404884, 0.986475692614)   
                                  2  
0  (0.462871875315, 0.462871875315)  
1  (0.951303842798, 0.951303842798)  
2  (0.986475692614, 0.986475692614)
