How can I get the number of the row and the column in a dataframe that contains a certain value using Pandas? For example, I have the following dataframe:
For example, i need to know the row and column of "Smith" (row 1, column LastName)
How can I get the number of the row and the column in a dataframe that contains a certain value using Pandas? For example, I have the following dataframe:
For example, i need to know the row and column of "Smith" (row 1, column LastName)
Maybe this is a solution or a first step to a solution.
If you filter for the value you are looking for all items which are not the value you want are replaced with NaN
. Now you can drop all columns where all values are NaN
. This leaves a DataFrame with your item and the indices. Then you can ask for index and name.
import numpy as np
import pandas as pd
df = pd.DataFrame({'LastName':['a', 'Smith', 'b'], 'other':[1,2,3]})
value = df[df=='Smith'].dropna(axis=0, how='all').dropna(axis=1, how='all')
print(value.index.values)
print(value.columns.values)
But I think this can be improved.
Here's a one liner that efficiently gets the row and column of a value:
df = pd.DataFrame({"ClientID": [34, 67, 53], "LastName": ["Johnson", "Smith", "Brows"] })
result = next(x[1:] for x in ((v, i, j) for i, row_tup in enumerate(df.itertuples(index=False)) for j, v in zip(df.columns, row_tup)) if x[0] == "Smith")
print(result)
(1, "LastName")
# This is a generator that unpacks the dataframe and gets the value, row number (i) and column name (j) for every value in the dataframe
item_generator = ((v, i, j) for i, row_tup in enumerate(df.itertuples(index=False)) for j, v in zip(df.columns, row_tup))
# This iterates through the generator until it finds a match
# It outputs just the row and column number by leaving off the first item in the tuple
next(x[1:] for x in item_generator if x[0] == "Smith")
Props to this this answer for the second half of the solution
Just to add another possible solution to the bucket. If you really need to your search your whole DataFrame, you may consider using numpy.where
, such as:
import numpy as np
value = 'Smith'
rows, cols = np.where(df.values == value)
where_are_you = [(df.index[row], df.columns[col]) for row, col in zip(rows, cols)]
So, if your DataFrame is like
ClientID First Name LastName
0 34 Mr Smith
1 67 Keanu Reeves
2 53 Master Yoda
3 99 Smith Smith
4 100 Harry Potter
The code output will be:
[(0, 'LastName'), (3, 'First Name'), (3, 'LastName')]
Edit: Just to satisfy everybody's curiosity, here it is a benchmark of all answers
The code is written below. I removed the print
statements to be fair, because they would make codes really slow for bigger dataframes.
val = 0
def setup(n=10):
return pd.DataFrame(np.random.randint(-100, 100, (n, 3)))
def nested_for(df):
index = df.index # Allows to get the row index
columns = df.columns # Allows to get the column name
value_to_be_checked = val
for i in index[df.isin([value_to_be_checked]).any(axis=1)].to_list():
for j, e in enumerate(df.iloc[i]):
if e == value_to_be_checked:
_ = "(row {}, column {})".format(i, columns[j])
def df_twin_dropna(df):
value = df[df == val].dropna(axis=0, how='all').dropna(axis=1, how='all')
return value.index.values, value.columns.values
def numpy_where(df):
rows, cols = np.where(df.values == val)
return [(df.index[row], df.columns[col]) for row, col in zip(rows, cols)]
def one_line_generator(df):
return [x[1:] for x in ((v, i, j) for i, row_tup in enumerate(df.itertuples(index=False))
for j, v in zip(df.columns, row_tup)) if x[0] == "Smith"]
I tried to simplify the code and make it more readable. This is my attempt:
df = pd.DataFrame({'points': [25, 12, 15, 14, 19],
'assists': [5, 7, 7, 9, 12],
'rebounds': [11, 8, 10, 6, 6]})
index = df.index # Allows to get the row index
columns = df.columns # Allows to get the column name
value_to_be_checked = 6
for i in index[df.isin([value_to_be_checked]).any(axis=1)].to_list():
for j, e in enumerate(df.iloc[i]):
if e == value_to_be_checked:
print("(row {}, column {})".format(i, column[j])
You can do this by looping though all the columns and finding the matching rows. This will give you a list of all the cells that matches your criteria:
Method 1(without comprehension):
import pandas as pd
# assume this df and that we are looking for 'abc'
df = pd.DataFrame({
'clientid': [34, 67, 53],
'lastname': ['Johnson', 'Smith', 'Brows']
})
Searchval = 'Smith'
l1 = []
#loop though all the columns
for col in df.columns:
#finding the matching rows
for i in range(len(df[col][df[col].eq(Searchval)].index)):
#appending the output to the list
l1.append((df[col][df[col].eq(Searchval)].index[i], col))
print(l1)
Method 2 (With comprehension):
import pandas as pd
df = pd.DataFrame({
'clientid': [34, 67, 53],
'lastname': ['Johnson', 'Smith', 'Brows']
})
#Value to search
Searchval = 'Smith'
#using list comprehension to find the rows in each column which matches the criteria
#and saving it in a list in case we get multiple matches
l = [(df[col][df[col].eq(Searchval)].index[i], col) for col in df.columns
for i in range(len(df[col][df[col].eq(Searchval)].index))]
print(l)
Thanks for submitting your request. This is something you can find with a Google search. Please make some attempt to find answers before asking a new question.
You can find simple and excellent dataframe examples that include column and row selection here: https://studymachinelearning.com/python-pandas-dataframe/
You can also see the official documentation here: https://pandas.pydata.org/pandas-docs/stable/
Select a column by column name:
df['col']
select a row by index:
df.loc['b']