Set two columns as tuple index in Pandas

Question

Suppose I have a pandas DataFrame with columns 0, 1 and 'Future Connection'. How do I set columns 0 and 1 as one tuple index?

For example, this DataFrame:

0   1        Future Connection
6   840      0.0
4   197      1.0
620 979      0.0

should result in:

0           Future Connection
(6, 840)    0.0
(4, 197)    1.0
(620, 979)  0.0
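A small sketch that rebuilds the example frame so the snippets below can be run. It assumes the first two column labels are the integers 0 and 1 rather than the strings "0" and "1", which matches the `Index([0, 1, 'Future Connection'], dtype='object')` printed in the accepted answer.

```python
import pandas as pd

# Rebuild the question's example; the first two column labels are the
# integers 0 and 1 (assumption based on the accepted answer's df.columns).
df = pd.DataFrame(
    {
        0: [6, 4, 620],
        1: [840, 197, 979],
        "Future Connection": [0.0, 1.0, 0.0],
    }
)

print(df.columns.tolist())  # [0, 1, 'Future Connection']
```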

That's the format I need, though, since it's what the Coursera course I'm taking expects to be returned in order to get points. — Nikko, Jan 24 '19 at 14:17
@Nikko, You should write to Coursera explaining that they are not teaching Pandas *as it should be taught*. This is a classic misuse of Pandas. — jpp, Jan 24 '19 at 14:18
@jpp Hahaha. The subject is about NetworkX in Python. The two columns were pairs of nodes, which were then passed to an ML model to make predictions on an unlabeled set of node pairs. In this case, it was predicting the probability that two employees would have a 'future connection'. Thanks again! — Nikko, Jan 24 '19 at 14:23

score 5 · Answer 1 · answered Jan 24 '19 at 14:04

How do I set columns 0 and 1 as one tuple index:

"Tuple index" as a concept does not exist in Pandas. You can have an object dtype index containing tuples, but this isn't recommended. The best option is to use a MultiIndex, which stores underlying values efficiently via NumPy arrays. Indeed, Pandas facilitates this with set_index:

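import pandas as pd

# Example data from the question
df = pd.DataFrame({0: [6, 4, 620], 1: [840, 197, 979],
                   'Future Connection': [0.0, 1.0, 0.0]})

# Move columns 0 and 1 into a two-level MultiIndex
df = df.set_index([0, 1])

print(df)
#          Future Connection
# 0   1                     
# 6   840                0.0
# 4   197                1.0
# 620 979                0.0

print(df.index)
# Output from pandas 0.23; later versions rename `labels` to `codes`,
# and pandas 1.0+ prints the index as a list of tuples instead:
# MultiIndex(levels=[[4, 6, 620], [197, 840, 979]],
#            labels=[[1, 0, 2], [1, 0, 2]],
#            names=[0, 1])

print(df.index.values)
# [(6, 840) (4, 197) (620, 979)]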
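If a grader insists on single tuple labels, one option is to keep the MultiIndex for the actual work and flatten it only at the end. This is a sketch assuming pandas 0.24 or later, where `Index.to_flat_index` is available; the names `mi_df` and `tuple_df` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {0: [6, 4, 620], 1: [840, 197, 979], "Future Connection": [0.0, 1.0, 0.0]}
)

# Recommended layout: a two-level MultiIndex built from columns 0 and 1.
mi_df = df.set_index([0, 1])

# Only when the exact "tuple index" format is required, flatten the
# MultiIndex into an object-dtype Index whose labels are tuples.
tuple_df = mi_df.copy()
tuple_df.index = mi_df.index.to_flat_index()

print(tuple_df)
```

The flattening is a one-way presentation step, so any grouping or level-based selection is best done on `mi_df` before converting.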

jpp

Thank you for clarifying. The data is just from Coursera; I just followed the format they asked for. – Nikko Jan 24 '19 at 14:14
@Nikko, I'm not surprised, there are a multitude of ways Pandas can be misused. This is just one of them. Pity it's in a course, though. – jpp Jan 24 '19 at 14:17
@jpp, or anyone who cares to chime in, I'm glad you demonstrated MultiIndex, but I'm curious why using tuples as an index is a misuse. What is the drawback? Thanks in advance. – Kaleb Coberly Oct 11 '20 at 02:17
@KalebCoberly, See [this answer](https://stackoverflow.com/a/21020411/9209546) and the first comment to that answer regarding the inefficiency of using `object` dtype. – jpp Oct 11 '20 at 16:25
@jpp, thanks for that. So the issue is that MultiIndex is more efficient, not that using tuples for an index is buggy or anything? It's just not as efficient? – Kaleb Coberly Oct 12 '20 at 16:55
@KalebCoberly, Correct: not buggy, but inefficient and not recommended. – jpp Oct 12 '20 at 16:56
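To make the storage difference discussed above concrete, here is a sketch comparing the two layouts. It assumes pandas 0.24 or later (for `MultiIndex.codes`); the variable names are illustrative. The tuple index holds one Python tuple object per row, while the MultiIndex keeps one integer-typed Index of unique values per level plus integer codes pointing into them.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

df = pd.DataFrame(
    {0: [6, 4, 620], 1: [840, 197, 979], "Future Connection": [0.0, 1.0, 0.0]}
)

# Object-dtype Index: every label is a separate Python tuple object.
tuple_index = pd.Index(list(zip(df[0], df[1])), tupleize_cols=False)

# MultiIndex: unique values per level plus integer codes into those levels.
multi_index = df.set_index([0, 1]).index

print(tuple_index.dtype)   # object
print(multi_index.levels)  # one integer Index per level
print(multi_index.codes)   # integer positions into each level
```

Both represent the same row labels; only the storage differs, which matches the "not buggy, but inefficient" conclusion.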

score 4 · Accepted Answer · answered Jan 24 '19 at 14:02

Use a list comprehension with DataFrame.pop to extract columns 0 and 1:

print (df.columns)
Index([0, 1, 'Future Connection'], dtype='object')

df.index = [x for x in zip(df.pop(0), df.pop(1))]
print (df)
            Future Connection
(6, 840)                  0.0
(4, 197)                  1.0
(620, 979)                0.0
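DataFrame.pop removes columns 0 and 1 from df itself. If the original frame should stay intact, a variant sketch is to drop the key columns into a new frame and assign the tuples there; `out` is an illustrative name, and `list(zip(...))` builds the same list as the comprehension. Assigning a plain list of tuples to `.index` yields an index of tuples, whereas `pd.Index(list_of_tuples)` with its default `tupleize_cols=True` builds a MultiIndex, which is easy to trip over.

```python
import pandas as pd

df = pd.DataFrame(
    {0: [6, 4, 620], 1: [840, 197, 979], "Future Connection": [0.0, 1.0, 0.0]}
)

# New frame without the key columns; df itself is not modified.
out = df.drop(columns=[0, 1])

# Label the rows with (col0, col1) tuples, giving an object-dtype index.
out.index = list(zip(df[0], df[1]))

print(out)

# Caveat: pd.Index on a list of tuples defaults to tupleize_cols=True,
# which produces a MultiIndex rather than an index of tuples.
print(type(pd.Index(list(zip(df[0], df[1])))).__name__)  # MultiIndex
```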

jezrael

This is such an elegant answer. Once again I have learned from jezrael. – tnknepp Jan 24 '19 at 14:15
Thank you again! You are really one of the best! – Nikko Jan 24 '19 at 14:18