How to compare two columns of strings in Python?

Question

My CSV file contains 20 columns, and I need to keep only the rows whose addresses are relevant to my study. To do that, I compare the column containing all addresses to a column containing only the specific addresses.

I am getting a KeyError saying that the index selected_city does not exist:

import csv
import os
import pandas as pd
data_new = pd.read_csv('file1.csv', encoding="ISO-8859-1")
print(data_new)
for i in rows:
    if str(data.loc['selected_city'] == data.loc['Charge_Point_City'])
print(data.Volume,data.Charge_Point_City)

Hello, welcome to the site! Be careful, the code you posted is not valid Python: the if statement should end with a colon, and the line after it should be indented twice. – Benjamin Audren, Oct 08 '18 at 14:41
Please share a few rows of example data so we can see exactly what you are trying to achieve. If you need help with this, see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jpp, Oct 08 '18 at 17:28

score 0 · Answer 1 · answered Oct 08 '18 at 14:41


Consider using the pandas Series method .isin().

For example:

s = pd.Series(['a','b','c', 'b','c','a','b'])

So now s looks like:

0    a
1    b
2    c
3    b
4    c
5    a
6    b
dtype: object

Say you only want to keep the rows where s is in a smaller series:

smol = pd.Series(['a','b'])
s[s.isin(smol)]

Output:

0    a
1    b
3    b
5    a
6    b
dtype: object

For your specific use case, you probably want:

data = data[data['selected_city'].isin(data['Charge_Point_City'])]
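A minimal sketch of this kind of filter on made-up data, assuming (as the question describes) that Charge_Point_City holds every address and selected_city holds a shorter list padded with missing values. The city names and Volume numbers are invented for illustration. Note that the line above keeps rows whose selected_city value occurs in Charge_Point_City; if the goal is instead to keep rows whose Charge_Point_City is one of the selected cities, the two columns swap roles, as sketched here:

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
data = pd.DataFrame({
    "Charge_Point_City": ["Leeds", "York", "Leeds", "Hull", "Bath"],
    "Volume": [10, 20, 30, 40, 50],
    "selected_city": ["Leeds", "Bath", None, None, None],
})

# Drop the padding so missing values never count as a match.
wanted = data["selected_city"].dropna()

# Keep only rows whose city appears among the selected cities.
filtered = data[data["Charge_Point_City"].isin(wanted)]
print(filtered[["Charge_Point_City", "Volume"]])
```

With these values, only the Leeds and Bath rows survive the filter.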

answered Oct 08 '18 at 14:41

Jake Morris

This should accomplish OP's goal of "compare the column containing all addresses to a column containing only specific address", assuming OP wants to retain only these rows. Obviously they could simply want to mark the rows where this is true, and in that case don't need to apply the filter to their original data source and can simply set `data['indicator'] = data['selected_city'].isin(data['Charge_Point_City'])`. – Jake Morris Oct 08 '18 at 14:45
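
A small sketch of the indicator variant described in this comment, using invented values (only the column names come from the question). It also shows that .isin() tests membership against the whole other column, which is not the same as comparing the two columns row by row:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
data = pd.DataFrame({
    "selected_city": ["Leeds", "Bath", "Oslo"],
    "Charge_Point_City": ["Leeds", "York", "Bath"],
})

# Row-by-row equality: True only where both columns match on the same row.
same_row = data["selected_city"] == data["Charge_Point_City"]

# Membership: True where selected_city occurs anywhere in Charge_Point_City.
data["indicator"] = data["selected_city"].isin(data["Charge_Point_City"])
print(data)
```

Here "Bath" is flagged by .isin() even though it sits on a different row in Charge_Point_City, while the row-by-row comparison only matches "Leeds".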

The KeyError they referenced indicated that `'selected_city'` isn't a valid column name in the `data` DataFrame. You have a nice one-line solution for somebody else's problem. – Hans Musgrave Oct 08 '18 at 15:08
Then someone else will be happy to find it. Feel free to add an alternate answer. – Jake Morris Oct 08 '18 at 17:22
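
One way to investigate the KeyError itself, sketched on an in-memory stand-in for file1.csv. The header with stray spaces is an assumption made for illustration, not something the question confirms. The idea is to print the labels pandas actually parsed, normalise them, and select columns with data[...] rather than data.loc[...], because .loc with a single label looks that label up in the row index:

```python
import io
import pandas as pd

# Stand-in for file1.csv; the padded header names are a made-up example.
csv_text = (
    "Charge_Point_City, selected_city ,Volume\n"
    "Leeds,Leeds,10\n"
    "York,Bath,20\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Inspect the exact column labels pandas parsed from the header.
print(list(data.columns))

# .loc['name'] searches the row index, so a column name raises KeyError.
try:
    data.loc["selected_city"]
except KeyError as err:
    print("KeyError:", err)

# Strip stray whitespace from the headers, then select the column by name.
data.columns = data.columns.str.strip()
print(data["selected_city"].tolist())
```

If the printed labels do not include selected_city at all, the column is missing from the file and has to be built or loaded separately before any comparison.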
