For a value in a pandas dataframe column, how do I find all other instances of that value in the same column?

Peter's answer got me on track, and the loop below gets me close to what I need. It iterates through the values in a column and, for each value, collects the index labels of the rows where the same value appears in that column. Please let me know if there is a better way to go about this.

for i in range(len(df)):
    # index labels of every row whose 'a' value equals the value in row i
    temp_a = df[df['a'] == df['a'].iloc[i]].index.tolist()
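The loop above overwrites its result on every pass and includes the row itself in the matches. A minimal sketch of one way to keep a result per row and list only the other rows, assuming a small made-up column `a` and a unique index (the name `others` is hypothetical):

```python
import pandas as pd

# Made-up sample data; only the column name 'a' comes from the question.
df = pd.DataFrame({'a': ['x', 'y', 'x', 'z', 'x']})

others = {}
for label, value in df['a'].items():
    # index labels of all rows holding the same value as this row
    same = df.index[df['a'] == value]
    # drop the row's own label so only the *other* instances remain
    others[label] = same.drop(label).tolist()

print(others)
# {0: [2, 4], 1: [], 2: [0, 4], 3: [], 4: [0, 2]}
```

Rows whose value occurs only once map to an empty list.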
AngusE
  • Just to clarify, for each value that is already in the dataframe column, how do I find every row in the dataframe where that value is found in that same column? – AngusE Nov 05 '17 at 18:01
  • There are too many ways to answer this, thus too broad. You should commit some time to creating a minimal and complete example that removes any ambiguity in what you want. – piRSquared Nov 05 '17 at 20:04
  • Actually @piRSquared, the question is straightforward. For a value that could be found in a column, how do I find all other instances of that value in the same column. I guess I don't understand what is ambiguous about that question. Peter's answer got me on track and I came up with a solution, thanks Peter. – AngusE Nov 09 '17 at 13:35
  • It seemed there was some back and forth before Peter understood what you wanted. You even said “I think we keep missing the important point”. We wouldn't have had to keep missing the point if you had provided what we call an MCVE: a minimal, complete, and verifiable example. It includes sample data and expected output. In conjunction with your question, that would have gone far to make your problem clear. I’m glad you got the help you needed. But I offer my advice in order to maximize the probability of getting a good answer. You don’t have to take my advice. – piRSquared Nov 09 '17 at 14:29
  • Thanks @piRSquared, but I think you might be misunderstanding. From what I see in the Help Center about asking questions, the reference to minimal, complete, and verifiable is with respect to a problem caused by my code. I didn't have code. I had not yet written code. I wasn't sure how to approach the problem, thus, I asked a question. Now that there is code, is there a better way to do this? – AngusE Nov 09 '17 at 18:39
  • Yes! Have data, then attempt to write code, show why your attempt doesn't satisfy, then show what you expected to get. With what you've explained, you asked before you tried. I personally don't mind and I'm not trying to come across as judgemental, but I will warn you that there are some among us who take offense when you ask a question with the expectation that someone will do the work for you. One of the reasons we want to see an attempt is to validate that you are actually trying rather than thinking of SO as a code writing service. So, yes, there is a better way. – piRSquared Nov 09 '17 at 18:44
  • I don't understand why you all are putting this on hold. My question follows the rules as specified under "What topics can I ask about here?": it concerns a specific programming problem, a software algorithm, or software tools commonly used by programmers, and it is a practical, answerable problem that is unique to software development. – AngusE Nov 09 '17 at 19:31

4 Answers

import pandas as pd 
df = pd.DataFrame(data={'col1': [1, 2, 1], 'col2': [3, 4, 5]})
print(df.loc[df['col1'] == 1])

In this example, you want the rows where the value in col1 equals 1.

It prints the output as:

   col1  col2
0     1     3
2     1     5

Now, to get the row numbers:

print(df.loc[df['col1'] == 1].index.tolist())

Which will give you:

[0, 2]

A much more thorough treatment can be found in the question "Select rows from a DataFrame based on values in a column in pandas". I also consulted https://stackoverflow.com/a/46247791/5986661
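The example above hard-codes the value 1. A small sketch of how the same lookup could take its value from a given row instead, which is what the question asks for; the helper name `rows_matching` is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, 1], 'col2': [3, 4, 5]})

def rows_matching(frame, column, row):
    """Return index labels where `column` equals its value at position `row`."""
    value = frame[column].iloc[row]  # read the value instead of hard-coding it
    return frame.index[frame[column] == value].tolist()

print(rows_matching(df, 'col1', 0))  # [0, 2]
print(rows_matching(df, 'col1', 1))  # [1]
```

The result always includes the queried row itself, since it trivially matches its own value.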

Omkar Neogi

Let's say you want to find the value 1 in the first column of this dataframe:

a = 1 2 3
    4 5 6
    1 2 2

You can do:

b = a.loc[:,0] #0 is column number
b[b==1] # 1 is the value to be found 
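A runnable version of the snippet above might look like this, assuming `a` is a DataFrame with the default integer column labels 0, 1, 2 (the answer does not say how `a` is built):

```python
import pandas as pd

# Assumed construction of the 3x3 frame shown above, with default column labels.
a = pd.DataFrame([[1, 2, 3],
                  [4, 5, 6],
                  [1, 2, 2]])

b = a.loc[:, 0]        # 0 is the column label
matches = b[b == 1]    # rows of that column holding the value 1
print(matches.index.tolist())  # [0, 2]

# Same lookup, taking the value from the first row instead of hard-coding it.
first = b[b == b.iloc[0]]
print(first.index.tolist())    # [0, 2]
```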
anonymous
  • Using your example, for the value in row 1, column1, I am interested in finding all row numbers where that same value is found in that column. – AngusE Nov 05 '17 at 18:04
  • you can use b[b==1].index.values for the row numbers – anonymous Nov 05 '17 at 18:10
  • I realize that, but you are searching for a specific example. Given a dataframe, I don't necessarily know the value in a column, row. I need a generic way to search for the value in column 1, row 1 and get a result that tells me where else that value is found in the same column. – AngusE Nov 05 '17 at 18:13
  • Do you want to search for a specific column or all columns for the given value? – anonymous Nov 05 '17 at 18:17
  • Yes, that's why I said "same column". – AngusE Nov 05 '17 at 18:22
for v in df['col_name'].unique():
    print(df[df['col_name'] == v])

For example:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'col1': list(map(chr, np.random.randint(97, 97+5, 10)))})
df
# output:
#   col1
# 0    d
# 1    e
# 2    a
# 3    b
# 4    d
# 5    a
# 6    a
# 7    b
# 8    e
# 9    e

for v in df['col1'].unique():
    print(df[df['col1'] == v])
# output:
#   col1
# 0    d
# 4    d
#   col1
# 1    e
# 8    e
# 9    e
#   col1
# 2    a
# 5    a
# 6    a
#   col1
# 3    b
# 7    b
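If only the row labels are needed rather than the printed sub-frames, the same loop over unique values could collect them into a dictionary. A sketch, with the sample column typed out by hand to match the output above (the name `positions` is hypothetical):

```python
import pandas as pd

# Same ten letters as the seeded example above, written out directly.
df = pd.DataFrame({'col1': list('deabdaabee')})

# Map each distinct value to the index labels of the rows that contain it.
positions = {
    v: df.index[df['col1'] == v].tolist()
    for v in df['col1'].unique()
}

print(positions)
# {'d': [0, 4], 'e': [1, 8, 9], 'a': [2, 5, 6], 'b': [3, 7]}
```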
Peter Leimbigler
  • Thanks for the effort, but I think we keep missing the important point, which is that I don't have a specific value. All of these examples folks keep posting assume that there is a specific known value that I am searching for. I don't know the value yet; it's in the dataframe. In your example, column 1, row 0 has the value 'd'. I need to be able to reference what is in column 1, row 0 as the value to search for. – AngusE Nov 05 '17 at 18:32
  • Ah, gotcha. This might be closer to what you're going for: loop over all the distinct values in a column, and for each value, get all rows containing it: `for v in df['col1'].unique(): print(df[df['col1'] == v])`. I'll edit my answer to reflect this. – Peter Leimbigler Nov 05 '17 at 18:37
  • Peter, that got me on track. I should have known I would need to iterate over the dataframe or column to get what I needed. I was just hoping there was a helpful built-in pandas function that I didn't know about. Thanks. – AngusE Nov 09 '17 at 13:38
  • @AngusE, great to hear! Looking at your question again, I feel like there's a way of doing it with `groupby()`, but will leave that as an exercise for the reader :) – Peter Leimbigler Nov 10 '17 at 04:31
  • Thanks again Peter. Groupby would help if I was trying to aggregate data, but I am analyzing how much time passes between events of a particular type. Pulling the indexes into a list allows me to calculate this. If you have some ideas on an approach or an algorithm, I would appreciate hearing about it. But please don't provide code, as it appears to upset the SO guys above. I wasn't asking for code, just looking for ideas on the best approach while following the rules. I appreciate your example for getting me on the right track. Thanks! – AngusE Nov 10 '17 at 16:18
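For the "time between events" analysis mentioned in the last comment, a grouped `diff()` may be enough without building lists first. This is only a sketch under the assumption that the row index stands in for the event time; with real timestamps, the `positions` series (a hypothetical name) would hold those instead:

```python
import pandas as pd

# Sample event types, reusing the letters from the answer above.
df = pd.DataFrame({'col1': list('deabdaabee')})

# Assumption: the row number acts as the event time.
positions = pd.Series(df.index, index=df.index)

# Gap to the previous event of the same type; NaN marks a first occurrence.
gaps = positions.groupby(df['col1']).diff()

print(gaps.dropna().tolist())
# [4.0, 3.0, 1.0, 4.0, 7.0, 1.0]
```

Here `groupby` is used for a per-group transformation rather than an aggregation, so each row keeps its own result.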

If you're just interested in the indexes of each value in the column, you could also use the groupby method:

df = pd.DataFrame(['a', 'b', 'a', 'b', 'c', 'c', 'd'], columns=['col1'])
df.groupby('col1').apply(lambda g: g.index.tolist())

col1
a    [0, 2]
b    [1, 3]
c    [4, 5]
d       [6]
dtype: object
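As a possible alternative to `apply`, the `groups` attribute of a groupby object already maps each value to the index labels of its rows. A short sketch on the same data (the name `index_lists` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'a', 'b', 'c', 'c', 'd'], columns=['col1'])

# .groups maps each distinct value to an Index of the matching row labels.
groups = df.groupby('col1').groups

# Convert each Index to a plain list for easier downstream use.
index_lists = {value: idx.tolist() for value, idx in groups.items()}

print(index_lists)
# {'a': [0, 2], 'b': [1, 3], 'c': [4, 5], 'd': [6]}
```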
Jan Zeiseweis