Problem

Given this dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

what is the simplest route to this:

        0       1       2
0  (1, 0)  (2, 0)  (3, 0)
1  (4, 0)  (5, 0)  (6, 0)
2  (7, 0)  (8, 0)  (9, 0)

Considered Questions and Approaches

Is there a way to convert the existing dataframe to a dataframe of tuples?

I haven't found a way to do so, nor thought of a better alternative, so my current approach is to create a new df, replacing each entry with a tuple (entry, flag).

To do that, I would like to copy or add the original df to a df of empty tuples (0, 0), to avoid manually iterating over and reformatting each entry into the new df.

Note, I would like to add the flag to each entry, not each row, making this question different from Adding binary flag to pandas DataFrame.

young_souvlaki
  • `df.applymap(lambda x: (x, 0))` – d.b Oct 27 '21 at 18:57
  • Do you need the tuples? You could get the same information with a MultiIndex on the columns, which then won't lead to major performance issues for subsequent operations. Even something as trivial as counting the non-zero flags is slow/problematic once you have tuples. – ALollz Oct 27 '21 at 19:58
  • What do you want to do with the flag? Better use a second dataframe of booleans as mask. – mozway Oct 27 '21 at 20:05
  • @mozway It needs to be a single file. – young_souvlaki Oct 27 '21 at 20:16
  • @ALollz, that's an interesting idea. If you could make an answer with an example of how to use the data stored in the MultiIndex column I'll consider it. Edit: Oh, but then in order to access the data, I'd need to know the flag value. Or I guess I could use `df.columns[0]`. – young_souvlaki Oct 27 '21 at 20:17
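
The MultiIndex idea raised in the comments could look roughly like the sketch below. It is only one possible layout, not something ALollz posted: the top-level labels "value" and "flag" are illustrative assumptions, and the flags start at 0 as in the question. Both parts stay numeric and live in a single DataFrame.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A frame of flags with the same shape and labels as df, all 0 to start with.
flags = pd.DataFrame(0, index=df.index, columns=df.columns)

# Place both frames side by side under a top column level.
# The labels "value" and "flag" are assumptions made for this example.
combined = pd.concat({"value": df, "flag": flags}, axis=1)

# Each part is retrieved by its top-level key and is still numeric.
values = combined["value"]

# Flag a single cell (row 1, column 2, which holds 6).
combined.loc[1, ("flag", 2)] = 1

# Counting non-zero flags is a plain vectorized operation.
n_flagged = int((combined["flag"] != 0).to_numpy().sum())
print(combined)
print(n_flagged)
```

With this layout the data is read through `combined["value"]`, so there is no need to know any flag value in order to access an entry.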

1 Answer

Update

Actually, it seems each cell needs to be a list rather than a tuple, since tuples are immutable and the flag could not be changed afterwards.

Simply use applymap (cell by cell; in pandas 2.1 and later this method is deprecated in favor of DataFrame.map):

>>> df.applymap(lambda x: [x, 0])
        0       1       2
0  [1, 0]  [2, 0]  [3, 0]
1  [4, 0]  [5, 0]  [6, 0]
2  [7, 0]  [8, 0]  [9, 0]

Or apply with a comprehension:

>>> df.apply(lambda x: [[i, 0] for i in x])
        0       1       2
0  [1, 0]  [2, 0]  [3, 0]
1  [4, 0]  [5, 0]  [6, 0]
2  [7, 0]  [8, 0]  [9, 0]
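
A minimal sketch of the cell-by-cell approach that should run on both older and newer pandas, assuming `DataFrame.map` is available from pandas 2.1 and `applymap` before that. The helper name `with_flag` is hypothetical, not part of pandas. The last lines show why a list is convenient when a flag has to change later.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


def with_flag(frame, flag=0):
    """Return a new frame in which every cell is the list [value, flag]."""
    # Prefer DataFrame.map (pandas 2.1+); fall back to applymap on older versions.
    elementwise = getattr(frame, "map", None) or frame.applymap
    return elementwise(lambda x: [x, flag])


flagged = with_flag(df)

# Lists are mutable, so the flag of a single cell can be updated in place.
flagged.loc[1, 2][1] = 1
print(flagged)
```

Each cell gets its own list object, so changing one flag does not affect the others.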
Corralien