1

I'm coming from R and find the indexing rules for pandas DataFrames hard to use. I have a DataFrame from which I want to get the i-th row and some columns by name. I understand how to use either iloc or loc on its own, as shown below.

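import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df.loc[:, ['A', 'B']]   # all rows, columns selected by name
df.iloc[0:, 0:2]        # all rows, columns selected by position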

Conceptually what I want is something like:

df.loc[0:,['A', 'B']]

Meaning the first row with those columns. Of course that code fails. I can seemingly use:

df.loc[0:0,['A', 'B']]

This works, but it seems strange. How does one properly index using a combination of row number and column names? In R we would do something like:

df = data.frame(matrix(rnorm(32),8,4))
colnames(df) <- c("A", "B", "C", "D") 
df[1, c('A', 'B')]

*** UPDATE *** I was mistaken: the example code above does work on this toy DataFrame. On my real data, however, I see the error below. Both objects are of the same type and the code is the same, so I don't understand the error.

type(poly_set)
<class 'pandas.core.frame.DataFrame'>
poly_set.loc[:,['P1', 'P2', 'P3']]
                      P1            P2           P3
29   -2.0897226679999998  -1.237649556         None
361  -2.0789117340000001   0.144751427  1.572417454
642  -2.0681314259999999  -0.196563749  1.500834574

poly_set.loc[0,['P1', 'P2', 'P3']]
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1005, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
user350540
  • 1
    If you want only the first row, you just need to remove the colon in your code, like this: `df.loc[0, ['A', 'B']]` – Iron Hand Odin Jul 06 '20 at 09:45
  • 1
    All of your code seems to work fine. With `df.loc[0:,['A', 'B']]` you say you want to access only the first row? If so, you should leave the ':' out of the call, because that code is a slice, i.e. from row 0 to the last row. There are a few ways to index by both rows and columns, but all of your code works fine. – dm2 Jul 06 '20 at 09:45
  • Hi, I tried the example df.loc[0:, ['A', 'B']] and it worked (could you provide your pandas version so that one can see why it failed in your case?). In fact this example returns all rows, because you told it to start from index 0 and go to the end. If you try df.loc[:0, ['A', 'B']] or df.loc[0, ['A', 'B']] instead, it will return only the first row. This is just slicing, as in df.loc[start_row:end_row, ['A', 'B']]. Best regards – smile Jul 06 '20 at 09:51
  • Updated my question above, I was mistaken. The code in the example does work, but in the real case it doesn't? – user350540 Jul 06 '20 at 09:55
  • 1
    Your data's index doesn't start at 0, which is why you are getting the error. Reset the index and it will work. – Kriti Pawar Jul 06 '20 at 09:57
  • .loc works with row labels, while .iloc works with positional indices. Your data doesn't have a row with label 0, so, as suggested by @KritiPawar, reset the index or use .iloc. Note that with .iloc the columns have to be referred to by their positions as well: df.iloc[0,[0,1]]. – dm2 Jul 06 '20 at 10:09
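
The label-versus-position distinction described in these comments can be reproduced with a small frame. This is only a sketch: the row labels 29, 361, 642 and the column names come from the printout in the question, while the values are rounded placeholders and `poly_set` here is a stand-in, not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for poly_set: labels taken from the question's
# printout, values rounded/invented for illustration.
poly_set = pd.DataFrame(
    {
        "P1": [-2.09, -2.08, -2.07],
        "P2": [-1.24, 0.14, -0.20],
        "P3": [None, 1.57, 1.50],
    },
    index=[29, 361, 642],
)

# .loc looks up row *labels*; there is no label 0, so this raises KeyError.
try:
    poly_set.loc[0, ["P1", "P2", "P3"]]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc looks up *positions*; position 0 is the row labelled 29.
first_row = poly_set.iloc[0, [0, 1, 2]]
```

Here `first_row.name` is the original label of the row at position 0, which makes it easy to see which row was actually selected.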

3 Answers

2

You can use .iloc (to get the i-th row) and .loc (to get columns by name) together:

row_number = 0
df.iloc[row_number].loc[['A', 'B']]

You can even remove the .loc:

df.iloc[row_number][['A', 'B']]
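
As a rough sketch of what the chained form returns (the small frame, its index labels, and the `row_number` value are made up for illustration): `df.iloc[row_number]` yields a Series whose index is the column names, so the second step is an ordinary label lookup on that Series.

```python
import numpy as np
import pandas as pd

# Made-up frame with a non-zero-based index, similar to the question's data.
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["A", "B", "C", "D"],
    index=[29, 361, 642],
)

row_number = 1
row = df.iloc[row_number]   # Series indexed by column name
picked = row[["A", "B"]]    # label-based selection on that Series
```

One caveat: because the intermediate result is a Series, columns of different dtypes are upcast to a common dtype, and the chained form is better suited to reading values than to assigning them.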
Julio Batista Silva
0

You are using a slice, which selects every row between two index labels. If you only want the first row, reset the index and then use a single label:

df = df.reset_index()    
df.loc[0,['A', 'B']]
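
A sketch of the reset approach, assuming the original row labels are not needed afterwards. Plain `reset_index()` keeps the old labels as a new column named `index`, while passing `drop=True` discards them. The frame below is made up for illustration.

```python
import pandas as pd

# Made-up frame whose row labels do not start at 0.
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]},
    index=[29, 361, 642],
)

kept = df.reset_index()              # old labels move into a column called "index"
dropped = df.reset_index(drop=True)  # old labels are discarded

first = dropped.loc[0, ["A", "B"]]   # label 0 now exists, so .loc works
```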
Kriti Pawar
0

I agree that pandas slicing rules are not as easy to use as they should be. I believe the suggested approach these days is to use loc[] with a nested index lookup:

df.loc[df.index[row_numbers], ['A','B']]

I have no idea why pandas still does not have an xloc[] or something similar that allows for row numbers and column names. See this answer to the same question.

In your question update, you use loc[], which can only look up row and column labels, but you can see from the previous printout that there is no row with a label of 0. The row at position 0 has the label 29. If you use my approach or the others mentioned here, you will have success.
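
A minimal sketch of two ways to mix a row position with column names, on a made-up frame that mimics the non-zero-based index from the question. The second form, which translates column names to positions with `Index.get_indexer` and then uses iloc, is an alternative that is not part of the answer above.

```python
import pandas as pd

# Made-up frame with the same kind of row labels as in the question.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=[29, 361, 642],
)

row_number = 0

# Translate the row position into its label, then use label-based .loc.
via_loc = df.loc[df.index[row_number], ["A", "B"]]

# Or translate the column names into positions, then use position-based .iloc.
via_iloc = df.iloc[row_number, df.columns.get_indexer(["A", "B"])]
```

The `df.index[...]` form assumes the row labels are unique; with duplicate labels, `.loc` would return every row sharing that label.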

farnsy