
I have a sample DataFrame df and an array n, both shown below. I want to select the rows whose positions (row numbers) are listed in n; the expected output is shown below as well. I have tried Out = df[df.index == n], Out = df.loc[df.index == n], and df.loc[n], but none of them work; I get the error "Lengths must match to compare". Can anyone help me solve this? The array holds the row numbers of the DataFrame.

df =
               Open     High      Low    Close  Adj Close   Volume
2007-06-18  0.33979  0.33979  0.33979  0.33979    0.33979  1591888
2007-06-29  0.33074  0.33074  0.33074  0.33074    0.33074    88440
2007-06-20  0.33526  0.33526  0.33526  0.33526    0.33526     3538
2007-06-21  0.32113  0.32113  0.32113  0.32113    0.32113     3550
2007-06-22  0.34713  0.34713  0.34713  0.34713    0.34713      670
2007-06-16  0.33979  0.33979  0.33979  0.33979    0.33979  1591888
2007-06-30  0.33074  0.33074  0.33074  0.33074    0.33074    88440
2007-06-31  0.33526  0.33526  0.33526  0.33526    0.33526     3538
2007-06-44  0.32113  0.32113  0.32113  0.32113    0.32113     3550
2007-06-22  0.34713  0.34713  0.34713  0.34713    0.34713      670

n = array([0, 1, 2, 3])

Out =
               Open     High      Low    Close  Adj Close   Volume
2007-06-18  0.33979  0.33979  0.33979  0.33979    0.33979  1591888
2007-06-29  0.33074  0.33074  0.33074  0.33074    0.33074    88440
2007-06-20  0.33526  0.33526  0.33526  0.33526    0.33526     3538
2007-06-21  0.32113  0.32113  0.32113  0.32113    0.32113     3550
Georgy
Alex_MN
  • @Ben I already tried the statement above, but it gives me an empty data frame. – Alex_MN Jul 17 '18 at 13:59
  • Possible duplicate of [Select Pandas rows based on list index](https://stackoverflow.com/questions/19155718/select-pandas-rows-based-on-list-index) – Georgy Jul 17 '18 at 14:04

3 Answers


Pandas notation for slicing:

df.iloc[0:4,:]
Yuca
  • Thanks for your answer. Can you please explain what this code does and where I made my mistake? – Alex_MN Jul 17 '18 at 14:02
  • Sure. iloc stands for integer location: you give the position a row would have if the index were numbered consecutively starting from 0. So '2007-06-18' is at position 0, and you can retrieve that row with either df.loc['2007-06-18',:] or df.iloc[0,:]. Your mistake was using loc instead of iloc. loc requires a label of the same datatype as the dataframe's index, which is why df.loc[n] didn't work. – Yuca Jul 17 '18 at 14:40
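
The difference between label lookup and positional lookup described in the comment above can be sketched as follows. This is a minimal illustration, not the asker's actual data setup: it assumes the index holds date strings, and it keeps only the first four rows and two of the columns from the question.

```python
import numpy as np
import pandas as pd

# First four rows of the question's data; a string index is an assumption.
df = pd.DataFrame(
    {
        "Open": [0.33979, 0.33074, 0.33526, 0.32113],
        "Volume": [1591888, 88440, 3538, 3550],
    },
    index=["2007-06-18", "2007-06-29", "2007-06-20", "2007-06-21"],
)
n = np.array([0, 1, 2, 3])

# Label-based and position-based lookups reach the same row.
by_label = df.loc["2007-06-18", :]
by_position = df.iloc[0, :]
same_row = by_label.equals(by_position)

# df.loc[n] searches for the labels 0, 1, 2, 3, which this index does not contain.
try:
    df.loc[n]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

print(same_row, loc_error)
```

If the index really held the integers 0, 1, 2, 3 as labels, df.loc[n] would succeed; it fails here only because the labels are dates.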

Use DataFrame.iloc to select rows by position:

import numpy as np

# n holds row positions, so pass it to iloc rather than loc
n = np.array([0, 1, 2, 3])
df = df.iloc[n]
print(df)
               Open     High      Low    Close  Adj Close   Volume
2007-06-18  0.33979  0.33979  0.33979  0.33979    0.33979  1591888
2007-06-29  0.33074  0.33074  0.33074  0.33074    0.33074    88440
2007-06-20  0.33526  0.33526  0.33526  0.33526    0.33526     3538
2007-06-21  0.32113  0.32113  0.32113  0.32113    0.32113     3550
jezrael
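
For readers who want to reproduce both the failure and the fix, here is a self-contained sketch. It assumes a string index and uses only the Close column for five of the question's rows; the variable names compare_error and out are illustrative.

```python
import numpy as np
import pandas as pd

# Five sample rows (values from the question; the string index is an assumption).
df = pd.DataFrame(
    {"Close": [0.33979, 0.33074, 0.33526, 0.32113, 0.34713]},
    index=["2007-06-18", "2007-06-29", "2007-06-20", "2007-06-21", "2007-06-22"],
)
n = np.array([0, 1, 2, 3])

# df.index == n is an elementwise comparison between a 5-element index
# and a 4-element array, so pandas rejects the mismatched lengths.
try:
    df.index == n
    compare_error = None
except ValueError as exc:
    compare_error = str(exc)

# Positional selection accepts the array of row numbers directly.
out = df.iloc[n]

print(compare_error)
print(out)
```

The comparison fails because == tests each index label against the array element at the same position; it is not a membership test, so it cannot be used to pick rows by number.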

Replace everything between the <> with your input.

# slice rows and columns by integer position (the end position is excluded)
df.iloc[<start_row>:<end_row>, <column_start_position>:<column_end_position>]
# every row of a single column, by position
df.iloc[:, <column_position>]


# slice rows by index label and pick a column by name (the end label is included)
df.loc[<start_row>:<end_row>, <column_name>]
# every row of a single column, by name
df.loc[:, <column_name>]

Also review "Indexing and selecting data" in the pandas docs. It is very informative, if a bit confusing on the first pass.
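
The placeholder templates in this answer can be filled in with concrete values. The sketch below assumes a string index and a three-column subset of the question's data; the variable names are illustrative only.

```python
import pandas as pd

# Subset of the question's data; the string index is an assumption.
df = pd.DataFrame(
    {
        "Open": [0.33979, 0.33074, 0.33526, 0.32113],
        "Close": [0.33979, 0.33074, 0.33526, 0.32113],
        "Volume": [1591888, 88440, 3538, 3550],
    },
    index=["2007-06-18", "2007-06-29", "2007-06-20", "2007-06-21"],
)

# iloc uses positions and excludes the end: rows 0-1, columns 0-1.
by_position = df.iloc[0:2, 0:2]

# loc uses labels and includes the end: "2007-06-18" through "2007-06-20".
by_label = df.loc["2007-06-18":"2007-06-20", "Close"]

# Every row of a single column, by position and by name.
volume_by_position = df.iloc[:, 2]
volume_by_name = df.loc[:, "Volume"]

print(by_position.shape, len(by_label))
```

Note that the loc slice returns three rows while an iloc slice over the same start and stop positions (0:2) returns two, because label slices include their end point.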