
I have a dataframe:

>>> df = pd.DataFrame(np.random.random((3,3)))
>>> df
          0         1         2
0  0.732993  0.611314  0.485260
1  0.935140  0.153149  0.065653
2  0.392037  0.797568  0.662104

What is the easiest way to convert each entry to a 2-tuple, with the first element taken from the current dataframe and the second element taken from the last column ('2')?

i.e. I want the final results to be:

                      0                    1                      2
0  (0.732993, 0.485260)  (0.611314, 0.485260)  (0.485260, 0.485260)
1  (0.935140, 0.065653)  (0.153149, 0.065653)  (0.065653, 0.065653)
2  (0.392037, 0.662104)  (0.797568, 0.662104)  (0.662104, 0.662104)
cs95
Zhang18
  • Why do you want to do this? More specifically, why are you using pandas if you want to keep the data in a format pandas doesn't natively support? You are better off leaving the data in the current format, and changing your algorithm to process data explicitly from the second column. – Corley Brigman Jul 24 '17 at 15:13
  • For example, I want to do a rolling regression using the last column against all other columns. AFAIK, it is not easily achievable ([here](https://stackoverflow.com/questions/44380068/pandas-rolling-regression-alternatives-to-looping), [here](https://stackoverflow.com/questions/21040766/python-pandas-rolling-apply-two-column-input-into-function), [here](https://stackoverflow.com/questions/19121854/using-rolling-apply-on-a-dataframe-object), and [here](https://stackoverflow.com/questions/21025821/python-custom-function-using-rolling-apply-for-pandas)). By converting to tuples I have a shot at it. – Zhang18 Jul 25 '17 at 15:03

2 Answers


As of pandas version 0.20, you can use df.transform:

In [111]: df
Out[111]: 
   0  1  2
0  1  3  4
1  2  4  5
2  3  5  6

In [112]: df.transform(lambda x: list(zip(x, df[2])))
Out[112]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)

Or, another solution using df.apply:

In [113]: df.apply(lambda x: list(zip(x, df[2])))
Out[113]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6) 

You can also use a dict comprehension:

In [126]: pd.DataFrame({i : df[[i, 2]].apply(tuple, axis=1) for i in df.columns})
Out[126]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)
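
For reference, here is a self-contained variant of the dict comprehension that builds the tuples with plain zip instead of apply and picks the last column by position rather than by the label 2. It is only a sketch: the names `last` and `paired` are illustrative, and the sample frame is the small integer one shown above.

```python
import pandas as pd

# Same small integer frame as in the transcripts above.
df = pd.DataFrame([[1, 3, 4], [2, 4, 5], [3, 5, 6]])

# Select the last column by position, so the code does not depend on its label.
last = df.iloc[:, -1]

# Pair every column with the last column, value by value.
paired = pd.DataFrame(
    {col: list(zip(df[col], last)) for col in df.columns},
    index=df.index,
)

print(paired)
```

Each cell of `paired` holds a Python tuple, so the columns have object dtype and most vectorised pandas operations will no longer apply to them.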
cs95

I agree with Corley's comment that you are better off leaving the data in the current format, and changing your algorithm to process data explicitly from the second column.
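
A minimal sketch of what that could look like, assuming the goal is a simple (non-rolling) least-squares fit of each column against the last one, as the comments under the question suggest. The data and the names `last` and `slopes` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data: column 0 is half of column 2, column 1 is column 2 plus one.
df = pd.DataFrame({
    0: [1.0, 2.0, 3.0, 4.0],
    1: [3.0, 5.0, 7.0, 9.0],
    2: [2.0, 4.0, 6.0, 8.0],
})

# Keep the frame numeric and pull the last column out once.
last = df.iloc[:, -1].to_numpy()

# Fit a degree-1 polynomial of each column on the last column.
# np.polyfit returns [slope, intercept]; only the slope is kept here.
slopes = pd.Series(
    {col: np.polyfit(last, df[col].to_numpy(), 1)[0] for col in df.columns}
)

print(slopes)
```

Nothing is converted to tuples here: the last column is passed explicitly to the computation, and the frame stays in a numeric dtype.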

However, to answer your question, you can define a function that does what's desired and call it using apply.

I don't like this answer: it is ugly, and "apply" is essentially syntactic sugar for a for loop. You are definitely better off not using this:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((3,3)))


df
          0         1         2
0  0.847380  0.897275  0.462872
1  0.161202  0.852504  0.951304
2  0.093574  0.503927  0.986476


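def make_tuple(row):
    # Take the last value of the row by position, not by label.
    last = row.iloc[-1]
    # Return a Series so that apply(axis=1) rebuilds a DataFrame
    # with the original column labels (a plain list may come back
    # as a single column of lists, depending on the pandas version).
    return pd.Series([(x, last) for x in row], index=row.index)

df.apply(make_tuple, axis=1)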


0   (0.847379908309, 0.462871875315)  (0.897274903359, 0.462871875315)   
1   (0.161202442072, 0.951303842798)  (0.852504052133, 0.951303842798)   
2  (0.0935742441563, 0.986475692614)  (0.503927404884, 0.986475692614)   
                                  2  
0  (0.462871875315, 0.462871875315)  
1  (0.951303842798, 0.951303842798)  
2  (0.986475692614, 0.986475692614)  
Rakesh Adhikesavan