
Question

Hi peeps, this question is closely related to this question. Instead of getting the name of the Series, I'd now like to get the index (the column label) of each particular cell. I've tried using x.index, but it returns the whole list of column labels instead of the label of that particular cell.

In [14]: df = pd.DataFrame({
    ...:     'X': [1,2,3,4,5],
    ...:     'Y': [3,4,5,6,7],
    ...:     'Z': [5,6,7,8,9]}, index=['a', 'b', 'c', 'd', 'e'])

In [15]: df
Out[15]: 
   X  Y  Z
a  1  3  5
b  2  4  6
c  3  5  7
d  4  6  8
e  5  7  9

In [15]: df.apply(lambda x: (x.name, x.index), axis=1)
Out[15]: 
a    (a, [X, Y, Z])
b    (b, [X, Y, Z])
c    (c, [X, Y, Z])
d    (d, [X, Y, Z])
e    (e, [X, Y, Z])
dtype: object
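For reference, this is why x.index returns the whole list: with axis=1, apply passes each row as a Series whose name is the row label and whose index holds the column labels, which are identical for every row. The per-cell (row, column) pairing therefore has to come from walking the row's (label, value) pairs. A small illustration, using df.loc['a'] as a stand-in for the row Series that apply would receive (the names row and pairs are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {'X': [1, 2, 3, 4, 5], 'Y': [3, 4, 5, 6, 7], 'Z': [5, 6, 7, 8, 9]},
    index=['a', 'b', 'c', 'd', 'e'],
)

# Equivalent to the row Series apply(axis=1) hands to the function for row 'a'
row = df.loc['a']

# row.name  is the row label ('a')
# row.index is the column labels, the same for every row
# row.items() yields (column label, value) pairs for this row only
pairs = [((row.name, col), value) for col, value in row.items()]
print(pairs)
```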

Desired output

I'd like to achieve the following format. However, I'm not sure how to access the column label of each particular cell in the row. If I use x.index, it returns all of the column labels at once.

As you can see in the example, I just want each cell's value to become (index, column), value.

           X          Y          Z
a  (a, X), 1  (a, Y), 3  (a, Z), 5
b  (b, X), 2  (b, Y), 4  (b, Z), 6
c  (c, X), 3  (c, Y), 5  (c, Z), 7
d  (d, X), 4  (d, Y), 6  (d, Z), 8
e  (e, X), 5  (e, Y), 7  (e, Z), 9

Trials

I've tried the following, but it won't work since the index position is hardcoded. I've also checked the Index docs but couldn't find any attribute that suits this need.

In [35]: df.apply(lambda x: (x.name, x.index[0]), axis=1)
Out[35]: 
a    (a, X)
b    (b, X)
c    (c, X)
d    (d, X)
e    (e, X)
dtype: object

In [36]: df.apply(lambda x: (x.name, x.index[1]), axis=1)
Out[36]: 
a    (a, Y)
b    (b, Y)
c    (c, Y)
d    (d, Y)
e    (e, Y)
dtype: object


I'm thinking it might be possible to iterate through each column and reassign the values in it. However, is there a way to do this with apply()? Thanks!

Toto Lele
  • I am a bit confused by your desired output. Why is the first element of each tuple `a`? Shouldn't the first element of each tuple match the index in the same row? – Derek O May 07 '21 at 04:22
  • Ah yes, you are right @Derek, fixed in the revision – Toto Lele May 07 '21 at 05:02

1 Answer


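Since apply with axis=1 passes each row as a Series (its name is the row label and its index holds the column labels), you can loop over the row's index, modify the row Series directly, and return the modified row.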

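def convert(row):
    # Cast to object first so the integer row can hold strings
    row = row.astype(object)
    for col in row.index:
        # row.name is the row label, col is the column label
        row[col] = f'({row.name}, {col}), {row[col]}'
    return row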

df = df.apply(convert, axis=1)
print(df)

           X          Y          Z
a  (a, X), 1  (a, Y), 3  (a, Z), 5
b  (b, X), 2  (b, Y), 4  (b, Z), 6
c  (c, X), 3  (c, Y), 5  (c, Z), 7
d  (d, X), 4  (d, Y), 6  (d, Z), 8
e  (e, X), 5  (e, Y), 7  (e, Z), 9
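A variant that leaves the incoming row untouched: instead of writing strings into the (integer) row in place, which recent pandas versions warn about as an incompatible-dtype assignment as far as I know, the function builds and returns a fresh Series. When the function returns a Series, apply with axis=1 expands its index into the result's columns, so the output has the same X, Y, Z layout. This is a sketch of an alternative, not the answer's original code; convert_new is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': [1, 2, 3, 4, 5], 'Y': [3, 4, 5, 6, 7], 'Z': [5, 6, 7, 8, 9]},
    index=['a', 'b', 'c', 'd', 'e'],
)

def convert_new(row: pd.Series) -> pd.Series:
    # Build a new Series keyed by column label; the original row is never mutated
    return pd.Series({col: f'({row.name}, {col}), {value}' for col, value in row.items()})

out = df.apply(convert_new, axis=1)
print(out)
```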
Ynjxsjmh
  • Thanks for the answer. I'm curious: since there is a loop over every column within every row, how would this affect the time/space complexity of the algorithm? I ask because I'll be dealing with relatively large data, so performance is a slight concern. – Toto Lele May 07 '21 at 05:07
  • @TotoLele Say your DataFrame has `m` rows and `n` columns. The time complexity would be `O(m*n)`. – Ynjxsjmh May 07 '21 at 05:10
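Because every one of the m*n output cells has to be built, O(m*n) is the floor for any approach; what can change is the constant factor. The row-wise apply calls a Python function once per row and builds a Series each time. A minimal sketch of a column-wise alternative (label_cells is a hypothetical helper, not from the answer) does one whole-column string concatenation per column instead, assuming the row labels, column labels, and values all render sensibly with str. It has not been benchmarked here, so measure it on your own data before relying on it.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': [1, 2, 3, 4, 5], 'Y': [3, 4, 5, 6, 7], 'Z': [5, 6, 7, 8, 9]},
    index=['a', 'b', 'c', 'd', 'e'],
)

def label_cells(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a same-shaped frame whose cells read '(row, col), value'."""
    # Row labels as strings, aligned on the frame's own index and reused for every column
    row_labels = pd.Series(frame.index.astype(str), index=frame.index)
    out = pd.DataFrame(index=frame.index)
    for col in frame.columns:
        # One column-wide string concatenation per column (loop runs n times, not m)
        out[col] = '(' + row_labels + f', {col}), ' + frame[col].astype(str)
    return out

result = label_cells(df)
print(result)
```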