1

I want to get the index and column name of every cell in a pandas data frame.

For example, in the data frame generated by the code below:

import numpy as np
import pandas as pd

df = pd.DataFrame({1 : np.arange(1, 6), 
               2 : np.arange(6, 11),
               3 : np.arange(11, 16),
               4 : np.arange(16, 21),
               5 : np.arange(21, 26)}, 
              index=[1, 2, 3, 4, 5])

For each value, I want the combination of its row index and its column name, such as [1,1] for 1, [2,1] for 2, [3,1] for 3, and so on.

The ultimate goal is to update every value in the data frame based on its position within the data frame, using df.apply(). The index and column names (equivalent, ordered identifiers in an n x n data frame) are needed to pull values from another data frame.

Thanks!

smci
verkter
  • **You totally don't need to do this, so don't do it** (*"Ultimate goal is to update every value in the data frame based on its position within the data frame with df.apply()"*) If you need to *"pull values from another data frame"*, then use the `df.join()` command, that's what it's there for. Learn dataframe idiom, don't just try to brute-force code into something superficially resembling it (which wouldn't be scalable or performant, anyway). The whole point of dataframes is that we almost never pass around wholesale lists of coords, certainly not for the entire df, let alone a large slice. – smci Dec 19 '17 at 01:48

2 Answers

2

I would suggest writing your own function for this. You can access each column of the data frame with dict-like notation, but to get a single element by its row index and column name I would use the label-based .loc indexer (the older .ix indexer was removed in pandas 1.0), as shown below:

import numpy as np
import pandas as pd

df = pd.DataFrame({1 : np.arange(1, 6), 
               2 : np.arange(6, 11),
               3 : np.arange(11, 16),
               4 : np.arange(16, 21),
               5 : np.arange(21, 26)}, 
              index=[1, 2, 3, 4, 5])

def get_from_coords(df, x, y):
    # x is the column name, y is the row index label
    return df.loc[y, x]

So for example:

In [2]: get_from_coords(df, 2, 1)
Out[2]: 6

The docs provide detailed information about indexing pandas dataframes.

Update, since I misunderstood the question as clarified in the comments:

def look_for_value(df, value):
    matches = []
    for row in df.itertuples():
        # row[0] is the index label, row[1:] holds the cell values
        for pos, cell in enumerate(row[1:]):
            if cell == value:
                # appending a tuple of the format `(index name, column name)`
                matches.append((row[0], df.columns[pos]))
    return matches


def look_using_generator(df, value):
    # same logic as above, written as a single list comprehension
    return [(row[0], df.columns[pos])
            for row in df.itertuples()
            for pos, cell in enumerate(row[1:])
            if cell == value]

I iterate through all rows of the data frame using .itertuples(), which is faster than .iterrows(), and look for the desired value. If the value is found in a row, a tuple containing the index and column name is appended to a list, which is returned at the end. The first function is a step-by-step solution; the second is a one-liner using a list comprehension.
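For larger frames, the same search can probably be done without a Python-level loop. The following is only a minimal sketch, assuming the frame holds plain numeric values; find_positions is a hypothetical helper name that does not appear in the original answer, and it relies on NumPy's np.where applied to the underlying array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: np.arange(1, 6),
                   2: np.arange(6, 11),
                   3: np.arange(11, 16),
                   4: np.arange(16, 21),
                   5: np.arange(21, 26)},
                  index=[1, 2, 3, 4, 5])


def find_positions(df, value):
    # Hypothetical helper: np.where returns the integer row and column
    # positions of every matching cell in the underlying array.
    rows, cols = np.where(df.to_numpy() == value)
    # Translate integer positions back into index and column labels.
    return list(zip(df.index[rows], df.columns[cols]))


positions = find_positions(df, 6)  # list of (index label, column name) tuples
```

Duplicates are handled naturally here, because np.where reports every matching cell rather than only the first one per row.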

Edit, since the OP pointed out that he needs the column and index names in order to change the corresponding value:

Let's say we want to find every 6 and replace it with 66:

for idx, col in look_using_generator(df, 6):
    # each item is (index label, column name); .loc takes the row label first
    df.loc[idx, col] = 66
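If the only aim is to replace a value wherever it occurs, the coordinates may not be needed at all. A minimal sketch, assuming a value-based replacement is enough and using DataFrame.mask (the variable name updated is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: np.arange(1, 6),
                   2: np.arange(6, 11),
                   3: np.arange(11, 16),
                   4: np.arange(16, 21),
                   5: np.arange(21, 26)},
                  index=[1, 2, 3, 4, 5])

# mask() returns a new frame in which every cell where the condition
# is True is replaced by 66; the original frame is left unchanged.
updated = df.mask(df == 6, 66)
```

This works on the whole frame at once, so it avoids editing values while iterating.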
albert
  • I am trying to do the opposite! For value 6 I want to get (2,1). I would like to avoid nested for loops. – verkter Jan 30 '16 at 23:00
  • Sorry, I misunderstood. – albert Jan 30 '16 at 23:01
  • Not a problem, I added some clarification to the question. – verkter Jan 30 '16 at 23:03
  • What about duplicate entries? Do you want to get the first or the last occurrence only or rather a list containing the coordinates of all of them? – albert Jan 30 '16 at 23:06
  • I would prefer a pair of index and column name for every value. Duplicate entries as well. I am populating a matrix of pearson coefficients of similarities where index and columns are ids of users. – verkter Jan 30 '16 at 23:09
  • Thanks for the update! I was looking for something like this to iterate over every single value. I think this is a standard Pandas approach in modifying data frames. `indx = df.index.values` `for i in indx: for j in indx: df.set_value(j, i, updated_value = 10)` – verkter Jan 30 '16 at 23:34
  • Why do you want to iterate over every value? This could be rather slow when working with bigger data since pandas' operations are row-optimized. – albert Jan 30 '16 at 23:52
  • Is there a better way to update every value in the data frame based on its position within the data frame (index and column)? – verkter Jan 30 '16 at 23:56
  • The dataframe initialized in your question would return `1` for `df[1].ix[1]` So let's say, you want to change this value to `42`. This can be done as: `df[1].ix[1] = 42`. You should not edit values while iterating as stated in the [docs](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.iterrows.html) (see Note #2). – albert Jan 31 '16 at 00:01
  • Further edit to my post. If this does not solve your problem and since these comments here seem to get kind of chatty messy, you should think about clarifying what _exactly_ you want to achieve and _why_. – albert Jan 31 '16 at 00:12
0

Use df.columns[column position] to get the column label. Conversely, use df.columns.get_loc("column label") to get the column position.

Similarly for rows, use df.index[row position] to get the row index label. Conversely, use df.index.get_loc("index label") to get the row position.
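A small sketch of that round trip between positions and labels; the frame and its labels ("a", "b", "x", "y") are made up purely for illustration:

```python
import pandas as pd

# Hypothetical frame with string labels so positions and labels differ.
df = pd.DataFrame({"a": [10, 20], "b": [30, 40]}, index=["x", "y"])

col_label = df.columns[1]           # position -> column label
col_pos = df.columns.get_loc("b")   # column label -> position

row_label = df.index[0]             # position -> index label
row_pos = df.index.get_loc("y")     # index label -> position
```

Note that get_loc returns a plain integer only when the label is unique; with duplicate labels it can return a slice or a boolean mask instead.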

As for your question, it is straightforward to loop over the data frame by row and column position and access each cell with .iloc.

E.g.:

def lookup(df, value):
    l = []
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            if df.iloc[i, j] == value:
                l.append((df.index[i], df.columns[j]))
    return l
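If what is needed is the (index, column) pair for every cell rather than for one value, the pairs can also be generated directly from the labels. This is only a sketch under the assumption that row-major order is acceptable; the frame, the labels, and the names pairs and cells are hypothetical:

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical 2 x 3 frame used only to illustrate the ordering.
df = pd.DataFrame(np.arange(1, 7).reshape(2, 3),
                  index=["r1", "r2"], columns=["a", "b", "c"])

# product() yields every (index label, column label) pair, row by row.
pairs = list(product(df.index, df.columns))

# ravel() flattens the values in the same row-major order, so the two
# sequences line up cell for cell.
cells = {pair: value for pair, value in zip(pairs, df.to_numpy().ravel())}
```

Here cells maps each (index label, column label) pair to the value stored at that position.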
THN