3

Suppose I have a pandas DataFrame with columns 0, 1 and 'Future Connection'. How do I set columns 0 and 1 as a single index of tuples?

For example this data frame:

0   1        Future Connection
6   840      0.0
4   197      1.0
620 979      0.0

would result in:

0           Future Connection
(6, 840)    0.0
(4, 197)    1.0
(620, 979)  0.0
Nikko
  • Why not use a multi-level index as recommended by jpp? – tnknepp Jan 24 '19 at 14:14
  • It's what I want though. But it's what the course I'm studying in Coursera wanted to be returned to get some points. – Nikko Jan 24 '19 at 14:17
  • @Nikko, You should write to Coursera explaining that they are not teaching Pandas *as it should be taught*. This is a classic misuse of Pandas. – jpp Jan 24 '19 at 14:18
  • @jpp Hahaha. The subject is about NetworkX in Python. The two columns were set of nodes. Then it's passed to ML to predict unlabeled set of 'nodes'. In this case, it was predicting the probability of two employees of having a 'future connection'. Thanks again! – Nikko Jan 24 '19 at 14:23

2 Answers

5

How do I set columns 0 and 1 as one tuple index:

"Tuple index" as a concept does not exist in Pandas. You can have an object dtype index containing tuples, but this isn't recommended. The best option is to use a MultiIndex, which stores underlying values efficiently via NumPy arrays. Indeed, Pandas facilitates this with set_index:

df = df.set_index([0, 1])

print(df)
#          Future Connection
# 0   1                     
# 6   840                0.0
# 4   197                1.0
# 620 979                0.0

print(df.index)
# MultiIndex(levels=[[4, 6, 620], [197, 840, 979]],
#            labels=[[1, 0, 2], [1, 0, 2]],
#            names=[0, 1])

print(df.index.values)
# [(6, 840) (4, 197) (620, 979)]
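
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question's data, and the names `indexed`, `pairs` and `value` are only illustrative. It assumes a reasonably recent pandas, where the `MultiIndex` repr may look different from the one printed above.

```python
import pandas as pd

# Sample data mirroring the question's frame (column labels 0 and 1 are integers).
df = pd.DataFrame({
    0: [6, 4, 620],
    1: [840, 197, 979],
    "Future Connection": [0.0, 1.0, 0.0],
})

# Move both node columns into a two-level MultiIndex.
indexed = df.set_index([0, 1])

# Iterating the MultiIndex yields one tuple per row.
pairs = list(indexed.index)

# A full (level 0, level 1) key selects a single row; adding the column gives a scalar.
value = indexed.loc[(4, 197), "Future Connection"]

print(pairs)
print(value)
```

The point of the sketch is that a `MultiIndex` can still be treated as a sequence of tuples when needed, so little is lost by not storing literal tuples.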
jpp
  • Thank you for clarifying. Data is just from Coursera. I just followed. – Nikko Jan 24 '19 at 14:14
  • @Nikko, I'm not surprised, there are a multitude of ways Pandas can be misused. This is just one of them. Pity it's in a course, though. – jpp Jan 24 '19 at 14:17
  • @jpp, or anyone who cares to pipe in, I'm glad you demonstrated MultiIndex, but I'm curious why using tuples to index is a misuse. What is the drawback? Thanks in advance. – Kaleb Coberly Oct 11 '20 at 02:17
  • @KalebCoberly, See [this answer](https://stackoverflow.com/a/21020411/9209546) and the first comment to that answer regarding inefficiency of using `object` dtype. – jpp Oct 11 '20 at 16:25
  • @jpp, thanks for that. So the issue is that MultiIndex is more efficient, not that using tuples for an index is buggy or anything? It's just not as efficient? – Kaleb Coberly Oct 12 '20 at 16:55
  • @KalebCoberly, Correct: not buggy, but inefficient and not recommended. – jpp Oct 12 '20 at 16:56
4

Use a list comprehension with DataFrame.pop to extract columns 0 and 1:

print(df.columns)
# Index([0, 1, 'Future Connection'], dtype='object')

# pop removes each column from df and returns it; zip pairs the values row by row
df.index = list(zip(df.pop(0), df.pop(1)))
print(df)
#             Future Connection
# (6, 840)                  0.0
# (4, 197)                  1.0
# (620, 979)                0.0
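
A possible alternative that gives the same kind of result without `pop`, sketched here under the assumption that pandas 0.24 or later is available: build the `MultiIndex` with `set_index` and then flatten it with `MultiIndex.to_flat_index()`. The sample frame and the name `flat` are illustrative, not part of the original data set.

```python
import pandas as pd

# Sample data mirroring the question's frame.
df = pd.DataFrame({
    0: [6, 4, 620],
    1: [840, 197, 979],
    "Future Connection": [0.0, 1.0, 0.0],
})

# Build a MultiIndex from the two node columns, leaving df itself untouched.
flat = df.set_index([0, 1])

# Flatten the MultiIndex into a plain object-dtype Index whose entries are tuples.
flat.index = flat.index.to_flat_index()

print(flat)
print(flat.index.dtype)
```

Unlike the `pop` approach, this leaves the original `df` unchanged, which may be convenient if the node columns are still needed elsewhere.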
jezrael