I'm relatively new to Python. I am trying to use df.loc to filter my dataframe on one column: I want to return the rows where this column equals any one of a number of strings.

Using df.loc to filter on a column works perfectly when the column is matched against a single value/string, as shown below:

import pandas as pd

df_original = pd.read_csv('example.csv')

columnalias = df_original['colname']

dataframe1 = df_original.loc[columnalias == "value"]

This loads each row from df_original where the value in 'colname' is equal to "value" into a new dataframe (dataframe1).
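For anyone who wants to run the working case without my CSV, here is a minimal sketch. The in-memory frame is a made-up stand-in for example.csv, and the column "n" and all row values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for example.csv; names and values are invented.
df_original = pd.DataFrame({
    "colname": ["value", "other", "value", "another"],
    "n": [1, 2, 3, 4],
})

columnalias = df_original["colname"]

# Comparing a Series to a scalar gives a boolean Series, one entry per row.
mask = columnalias == "value"

# .loc keeps only the rows where the mask is True.
dataframe1 = df_original.loc[mask]
print(dataframe1)
```

As far as I understand it, the part inside the brackets is just a Series of True/False values, and df.loc uses it to pick rows.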

My problem comes about when I need the value in 'colname' to match a large number of values.

For instance, let's say I want to return the rows in df_original where the values in 'colname' are equal to value1, value2, ... value10000.

This...

values = ["value1", "value2", ... "value10000"]

dataframe2 = df_original.loc[columnalias == for x in values]

doesn't work. Nor does

dataframe3 = df_original.loc[columnalias == "value1" or "value2"]

or any similar solutions.

The error messages point me towards a.any() or a.all(), but applying these to my variables has produced similar errors. I am at a loss as to what to try next (the df.loc documentation seems very sparse for some reason), so I've created a Stack Overflow account to ask this question. Hopefully this information is enough for someone to help out.
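Here is a minimal reproduction of the second attempt, in case it helps. The three-row frame is a hypothetical stand-in for my real data. My assumption is that `or` forces Python to reduce the whole boolean Series to a single True/False, which is where the error comes from.

```python
import pandas as pd

# Hypothetical stand-in for the real data.
df_original = pd.DataFrame({"colname": ["value1", "value2", "value3"]})
columnalias = df_original["colname"]

error_message = None
try:
    # This parses as (columnalias == "value1") or "value2", so Python
    # needs the truth value of the entire boolean Series.
    dataframe3 = df_original.loc[columnalias == "value1" or "value2"]
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

On my machine this prints the message about the truth value of a Series being ambiguous, which is the one that mentions a.any() and a.all().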

Nick ODell