105

Say df is a pandas dataframe.

  • df.loc[] only accepts names
  • df.iloc[] only accepts integers (actual placements)
  • df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names. e.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]
print(df)

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error; it doesn't return 'three'.

When referencing columns, ix prefers integers, not names. e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.ix[2, '1'] does return 2).

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?

unutbu
Hillary Sanders
  • 19
    You could use `df['a'].iloc[0]`. – unutbu Feb 27 '15 at 02:58
  • 15
    See also [GH 9213](https://github.com/pydata/pandas/issues/9213#issuecomment-72076683), which suggests `df.loc[df.index[0], 'a']`. This has the [advantage of not using chained indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-the-assignment-when-using-chained-indexing-fail), which means it will work when making assignments, whereas `df[['a','b']].iloc[0] = val` would not. – unutbu Feb 27 '15 at 22:46
  • 1
    doesn't really solve your problem but very good answer here: https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation – JohnE Aug 15 '17 at 14:19
  • 5
    Or the other way around, too: df.iloc[0, df.columns.get_loc("a")] – Landmaster Aug 18 '17 at 00:43

7 Answers

79

It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
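
A minimal sketch of this pattern, using a rebuilt copy of the question's frame (sliced with .iloc because .ix is not available in recent pandas releases). The variable names first_label and value are only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: row labels are 2..5, not 0..3.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

# Translate the integer position 0 into the row label, then use .loc.
first_label = df.index[0]
value = df.loc[first_label, 'a']

# The same single .loc call also works for assignment (no chained indexing).
df.loc[df.index[0], 'a'] = 'THREE'
```

Because the lookup and the assignment are each a single .loc call, the write lands in df itself rather than in a temporary copy.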

brunston
  • 1
    Seems very slow on long dataframes. – ConanG Nov 08 '17 at 17:06
  • 1
    But it works splendidly. I stumbled on this yesterday and it is the exact syntax I needed to update a copy of a dataframe, linking back to the original by the index and by column name. – horcle_buzz Apr 01 '18 at 15:35
  • 7
    Your method requires the values in the index to be unique. Otherwise it will return a Series with every row whose label matches the one at position "#". – Yingbo Miao Apr 03 '19 at 13:30
46

The existing answers seem short-sighted to me.

Problematic Solutions

  1. df.loc[df.index[0], 'a']
    The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.

    1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
    2. .loc is slower than .iloc so you're sacrificing speed here.
  2. df.reset_index(drop=True).loc[0, 'a']
    The strategy here is to reset the index so the row labels become 0, 1, 2, ... thus .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime, as .loc is slower than .iloc and you'll incur a cost for resetting the index.
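
To make the duplicate-label issue concrete, here is a small sketch with a made-up frame (dup and its labels 7, 7, 8 are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical frame whose first two rows share the label 7.
dup = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

# Label-based lookup: the label at position 0 is 7, which matches two rows,
# so the result is a Series rather than a single value.
by_label = dup.loc[dup.index[0], 'a']

# Purely positional lookup always addresses exactly one cell.
by_position = dup.iloc[0, dup.columns.get_loc('a')]
```

Here by_label holds both 'x' and 'y', while by_position is just 'x'.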

Better Solution

I suggest following @Landmaster's comment:

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0] except we get the column index dynamically using df.columns.get_loc("a").

To index multiple columns such as ['a', 'b', 'c'], use:

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
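
As a possible alternative to the list comprehension, Index.get_indexer translates several column names to positions in one call. This is only a sketch on an illustrative frame; note that get_indexer returns -1 for names it cannot find (which .iloc would read as "last column"), so the example checks for that explicitly, and it expects unique column names.

```python
import pandas as pd

# Illustrative frame with non-default row labels.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}, index=[10, 20])

wanted = ['a', 'c']
positions = df.columns.get_indexer(wanted)

# Guard against missing names: get_indexer marks them with -1.
if (positions < 0).any():
    raise KeyError(wanted)

# Row at integer position 0, columns selected by name.
first_row = df.iloc[0, positions]
```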

Ben
  • 3
    Your preferred solution `df.iloc[0, df.columns.get_loc("a")]` isn't exempt from duplicate labels, as column labels can be duplicated too. So you gain nothing, but it's more verbose and slower than `df.loc[df.index[0], 'a']`. For single value access you should use neither of them anyway. – Darkonaut Jan 23 '20 at 00:37
  • @Darkonaut duplicated column names are much *much* less likely to occur than duplicated row labels. Also, `df.iloc[0, df.columns.get_loc("a")]` and `df.loc[df.index[0], 'a']` should be nearly identical in their runtime unless df has thousands of columns, but even then the difference should be marginal. – Ben Jan 23 '20 at 04:02
14

A very late answer but it amazed me that pandas still doesn't have such a function after all these years. If it irks you a lot, you can monkey-patch a custom indexer into the DataFrame:

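import pandas as pd

class XLocIndexer:
    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        # key is a (row position, column name) pair
        row, col = key
        return self.frame.iloc[row][col]

# Expose the indexer as a read-only .xloc property on pandas objects
pd.core.indexing.IndexingMixin.xloc = property(lambda frame: XLocIndexer(frame))

# Usage
df.xloc[0, 'a'] # value of column 'a' in the first row, whatever its label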
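
A related option, sketched here under my own assumptions, is pandas' documented extension hook pd.api.extensions.register_dataframe_accessor, which avoids patching pandas internals. The accessor name posloc and the class PosLocAccessor are hypothetical names chosen for this example; they are not part of pandas.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("posloc")  # hypothetical name
class PosLocAccessor:
    """Index rows by integer position and columns by name."""

    def __init__(self, frame):
        self._frame = frame

    def _column_positions(self, cols):
        columns = self._frame.columns
        if isinstance(cols, list):
            positions = columns.get_indexer(cols)
            if (positions < 0).any():  # -1 marks a missing column name
                raise KeyError(cols)
            return positions
        return columns.get_loc(cols)

    def __getitem__(self, key):
        rows, cols = key
        return self._frame.iloc[rows, self._column_positions(cols)]

    def __setitem__(self, key, value):
        rows, cols = key
        self._frame.iloc[rows, self._column_positions(cols)] = value


df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

first_a = df.posloc[0, 'a']      # read by (row position, column name)
df.posloc[0, '1'] = 20           # write through a single .iloc call
```

Since both reads and writes go through one .iloc call, this sketch avoids chained indexing, so assignment works as well as lookup.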
Code Different
9

For getting or setting a single value in a DataFrame by row/column labels, you should use DataFrame.at instead of DataFrame.loc, because it is ...

  1. faster
  2. more explicit about wanting to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to find the row label first with DataFrame.index, as DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For completeness:

DataFrame.iat accesses a single value for a row/column pair by integer position.
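
A short sketch of how DataFrame.iat could be combined with a named column: look up the column's integer position once with df.columns.get_loc, then address the cell purely by position. The frame below is a small stand-in for the question's data.

```python
import pandas as pd

# Stand-in frame with row labels that do not start at 0.
df = pd.DataFrame({'a': ['three', 'four'], '1': [2, 3]}, index=[2, 3])

col_pos = df.columns.get_loc('a')   # column name -> integer position

value = df.iat[0, col_pos]          # get a single cell by position
df.iat[0, col_pos] = 'THREE'        # set a single cell by position
```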

Darkonaut
  • How big are the DataFrames? For indexes that aren't just ordered integers, I assume `df.index` would need to do a reverse lookup and that would likely require `O(n)` iteration over the `n` rows. How would it deal with duplicates? Wouldn't `iat` be the fastest of all the solutions and also `O(1)`? – Mateen Ulhaq Mar 01 '21 at 04:52
  • @MateenUlhaq Must have been the same `df` OP gave as example. `df.index` is hashed, so `O(1)`. Duplicates won't be ignored, so always ensure you filtered for duplicates before. I don't recall timings for `iat`, but in general positional lookup just isn't always an option. – Darkonaut Mar 01 '21 at 10:20
6

We can reset the index and then use 0-based indexing like this:

df.reset_index(drop=True).loc[0,'a']

edit: removed [] from col name index 'a' so it just outputs the value
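
One caveat when assigning: reset_index returns a new frame, so writing through it directly would not change df. A possible workaround, sketched here on a made-up frame with duplicated row labels, is to keep the original index, reset, assign, and then restore the index:

```python
import pandas as pd

# Hypothetical frame with duplicated row labels.
df = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

original_index = df.index           # keep the labels for later
df = df.reset_index(drop=True)      # rows are now labelled 0, 1, 2
df.loc[0, 'a'] = 'changed'          # label 0 is also position 0
df.index = original_index           # put the original labels back
```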

Krishna
  • That would not return a valid result, because there is no '0' in the index. – Hillary Sanders Sep 25 '18 at 16:29
  • understand the question now, thank you! please see if the edited code seems clean enough... – Krishna Sep 26 '18 at 03:53
  • 1
    @KrishnaBandhakavi, however, it will return the exact value if you remove `[]` from `'a'`. => `df.reset_index().loc[0,'a']` – ipramusinto Sep 26 '18 at 06:09
  • This is the only answer that works for making assignments in the case of non-unique indices. Although, in that case you'll want to keep the original index around and put it back afterwards. – user2561747 Jul 19 '19 at 01:14
0

If you need just one row, you can turn rows into columns:

df.transpose()['a']

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html

Enrique Pérez Herrero
-2

Something like df["a"][0] is working fine for me. You may try it out!

  • 1
    It would be a better answer if you explained why this works for you and why it would work for the author – flppv Mar 24 '19 at 14:27