I am using Python to clean address data and standardize abbreviations so that it can be compared against other address data. I now have two pandas DataFrames. I would like to compare each row in the first one, named df, against a list of addresses built from a second DataFrame of similar structure, second_df. If the address from df is on the list, I would like to create a column to note this, perhaps as a boolean, but ideally as the string 'found'. I tried isin and it did not work.

For example, suppose my data is loaded as shown below. I would like to compare each row in df['concat'] against the entire list address_list, built from second_df, to see whether that address appears in it.

import pandas as pd

read = pd.read_excel('fullfilepath.xlsx')
second_df = pd.read_excel('anotherfilepath.xlsx')

df = read[['column1','column2', 'concat']]
# Named address_list rather than list to avoid shadowing the built-in list type.
address_list = second_df['concat'].tolist()
jonrsharpe
Brandon
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). “Show me how to solve this coding problem” is not a Stack Overflow issue. We expect you to make an honest attempt, and *then* ask a *specific* question about your algorithm or technique. Stack Overflow is not intended to replace existing documentation and tutorials. – Prune Mar 18 '21 at 00:21
  • You have not provided sample data, as we do not have access to your file space. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our ideas against your data and desired output. – Prune Mar 18 '21 at 00:22
  • You list the logic steps you need to carry out, but you haven't attempted to code those steps: doing so is your task before posting. ["Can Someone Help Me?" is not a valid SO question](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question). This suggests a collection of needs that are too broad for Stack Overflow. – Prune Mar 18 '21 at 00:23
  • Thanks for the 3 separate responses. I'm reevaluating asking this question. I cannot post the actual code that I have used so far as it's sensitive in nature. I'm currently creating sample code. – Brandon Mar 18 '21 at 00:28

1 Answer

EDIT: Based on tdy's comment, I added the value for the False case, which my original np.where call was missing.

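Try something like this:

import numpy as np

# Label each row by whether its address also appears in second_df["concat"].
df["isFound"] = np.where(df['concat'].isin(second_df["concat"]), "found", "notfound")

This should give you the column you need.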
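Since the question says isin did not work, one possible cause (an assumption, as no sample data was posted) is that the two columns differ in case or whitespace, because isin only matches exact values. Here is a minimal sketch with made-up addresses standing in for the two Excel files; the normalize helper is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the two Excel files.
df = pd.DataFrame({"concat": ["12 Main St", " 45 oak ave ", "9 Elm Rd"]})
second_df = pd.DataFrame({"concat": ["45 Oak Ave", "12 MAIN ST", "77 Pine Ln"]})


def normalize(addresses: pd.Series) -> pd.Series:
    """Trim, collapse repeated whitespace, and upper-case each address."""
    return (
        addresses.str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.upper()
    )


# isin compares exact values, so normalize both sides before matching.
matches = normalize(df["concat"]).isin(normalize(second_df["concat"]))
df["isFound"] = np.where(matches, "found", "notfound")
print(df)
```

If your real addresses are already cleaned identically in both frames, the normalize step is unnecessary and the one-liner above is enough.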

codemaster
  • `np.where` doesn't let you specify the `True` case without the `False` case -- needs to be something like `..., "found", "not found")` – tdy Mar 18 '21 at 02:38
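To illustrate the point in the comment above, here is a small sketch of how np.where behaves with one, two, and three arguments. The mask values are invented for the example.

```python
import numpy as np

mask = np.array([True, False, True])

# Three-argument form: takes the second argument where mask is True,
# otherwise the third.
labels = np.where(mask, "found", "not found")

# One-argument form: returns a tuple of index arrays for the True positions.
(true_positions,) = np.where(mask)

# Two-argument form (a True value without a False value) is rejected.
try:
    np.where(mask, "found")
    two_arg_error = None
except ValueError as exc:
    two_arg_error = exc

print(labels, true_positions, two_arg_error)
```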