5

Very new to pandas.

Is there a way to check given a pandas dataframe, if there exists a row with a certain column value. Say I have a column 'Name' and I need to check for a certain name if it exists.

And once I do this, I will need to make a similar query, but with a bunch of values at a time. I read that there is 'isin', but I'm not sure how to use it. So I need to make a query such that I get all the rows which have 'Name' column matching to any of the values in a big array of names.

mac13k
  • 2,423
  • 23
  • 34
AMM
  • 17,130
  • 24
  • 65
  • 77
  • Sorry for the second bit the answer is in http://stackoverflow.com/questions/12096252/use-a-list-of-values-to-select-rows-from-a-pandas-dataframe i thought i just saw somebody post an answer for the first, but that seems to have vanished. – AMM Apr 06 '14 at 14:48
  • I posted an answer, but then I thought that I misread the question, I put back up. Let me know it it is useful. – Akavall Apr 06 '14 at 14:50

2 Answers2

9
import numpy as np
import pandas as pd
df = pd.DataFrame(data = np.arange(8).reshape(4,2), columns=['name', 'value'])

Result:

>>> df
   name  value
0     0      1
1     2      3
2     4      5
3     6      7
>>> any(df.name == 4)
True
>>> any(df.name == 5)
False

Second Part:

my_data = np.arange(8).reshape(4,2)
my_data[0,0] = 4

df = pd.DataFrame(data = my_data, columns=['name', 'value'])

Result:

>>> df.loc[df.name == 4]
   name  value
0     4      1
2     4      5

Update:

my_data = np.arange(8).reshape(4,2)
my_data[0,0] = 4

df = pd.DataFrame(data = my_data, index=['a', 'b', 'c', 'd'], columns=['name', 'value'])

Result:

>>> df.loc[df.name == 4]  # gives relevant rows
   name  value
a     4      1
c     4      5  
>>> df.loc[df.name == 4].index  # give "row names" of relevant rows
Index([u'a', u'c'], dtype=object)
Akavall
  • 82,592
  • 51
  • 207
  • 251
  • Thank you very much. If i need to do the same based on a label index then in that case what should i use? As in say instead of name I have it stored based on label index and the label is the name. – AMM Apr 06 '14 at 14:57
  • You could just use the index, e.g `df[0]`. – Akavall Apr 06 '14 at 15:02
  • like this any(df['abc']) ? – AMM Apr 06 '14 at 15:03
  • I am not sure what `abc` is, is that a columns name? `any(df['abc'])`should work though, as `df['name']` works just like `df.name`. – Akavall Apr 06 '14 at 15:09
  • I mean if 'abc' is a rowname as in if i am using label based index for the rows. So its not a column name as such. – AMM Apr 06 '14 at 15:44
  • @AMM, I updated my answer, maybe I am still missing something, if so can you construct a simple example to clarify what your input is and what output you are looking for. – Akavall Apr 06 '14 at 16:05
  • how do you do the df.name if your column name has space like "site no"? – JCm Mar 05 '15 at 11:59
  • Is the first part redundant now? Or not working in 2.7? I always have trouble in pandas on this issue. – novice Jun 06 '17 at 20:22
1

If you want to extract set of values given a sequence of row labels and column labels, and the lookup method allows for this and returns a numpy array.

Here is my snippet and output:

>>> import pandas as pd
>>> import numpy as np
>>> df = DataFrame(np.random.rand(20,4), columns = ['A','B','C','D'])
>>> df
           A         B         C         D
0   0.121190  0.360813  0.500082  0.817546
1   0.304313  0.773412  0.902835  0.440485
2   0.700338  0.733342  0.196394  0.364041
3   0.385534  0.078589  0.181256  0.440475
4   0.151840  0.956841  0.422713  0.018626
5   0.995875  0.110973  0.149234  0.543029
6   0.274740  0.745955  0.420808  0.020774
7   0.305654  0.580817  0.580476  0.210345
8   0.726075  0.801743  0.562489  0.367190
9   0.567987  0.591544  0.523653  0.133099
10  0.795625  0.163556  0.594703  0.208612
11  0.977728  0.751709  0.976577  0.439014
12  0.967853  0.214956  0.126942  0.293847
13  0.189418  0.019772  0.618112  0.643358
14  0.526221  0.276373  0.947315  0.792088
15  0.714835  0.782455  0.043654  0.966490
16  0.760602  0.487120  0.747248  0.982081
17  0.050449  0.666720  0.835464  0.522671
18  0.382314  0.146728  0.666722  0.573501
19  0.392152  0.195802  0.919299  0.181929

>>> df.lookup([0,2,4,6], ['B', 'C', 'A','D'])
array([ 0.36081287,  0.19639367,  0.15184046,  0.02077381])
>>> df.lookup([0,2,4,6], ['A', 'B', 'C','D'])
array([ 0.12119047,  0.73334194,  0.4227131 ,  0.02077381])
>>>
James Sapam
  • 16,036
  • 12
  • 50
  • 73