
When writing loops, I often need to subset a pandas DataFrame by row position and column name at the same time, but I'm only aware of subsetting with .iloc or .loc separately. My background is in R, which may be why this keeps coming up for me.

To give an example, suppose I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Place':['Tiverton, RI','Newport, RI','Boston, MA','Hartford, CT','Bridgeport, CT','Providence, RI'], 'Year': [2019,2007,2019,2018,2000,2003]})

For every row that has ", RI" in the Place column, I want to put the string "yes" in a new column named "RI". I can do this with .iloc as follows:

import re

import numpy as np

df['RI'] = np.nan

for i in range(0, len(df)):
    # Column 0 is 'Place' and column 2 is 'RI', both addressed by position
    if re.search(r'\, RI', df.iloc[i,0]):
        df.iloc[i,2] = 'yes'

However, this doesn't seem like best practice to me: as my code or data changes, the column positions may shift, and then I'm working on the wrong column. In R, I would loop through with df[i, 'Place'] and assign to df[i, 'RI']. Is there any similar functionality in Python/pandas? Should I be writing these loops in a different way that avoids this situation altogether?
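For reference, here is a minimal sketch of what the R-style access described above might look like in pandas. It assumes the default integer index (so row labels and row positions coincide), and the column names RI_pos and RI_vec are made up here only so the three variants can sit side by side. It is one possible approach, not necessarily the recommended one.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['Tiverton, RI', 'Newport, RI', 'Boston, MA',
              'Hartford, CT', 'Bridgeport, CT', 'Providence, RI'],
    'Year': [2019, 2007, 2019, 2018, 2000, 2003],
})


def empty_object_column(frame):
    # Hypothetical helper: an all-None object column that can hold strings.
    return pd.Series([None] * len(frame), index=frame.index, dtype=object)


# Variant 1: .loc takes a row label and a column name together.
# With the default index, the row labels are 0..len(df)-1.
df['RI'] = empty_object_column(df)
for i in df.index:
    if ', RI' in df.loc[i, 'Place']:
        df.loc[i, 'RI'] = 'yes'

# Variant 2: keep positional rows with .iloc, but look the column
# positions up by name so they follow the columns if the layout changes.
df['RI_pos'] = empty_object_column(df)
place_pos = df.columns.get_loc('Place')
ri_pos = df.columns.get_loc('RI_pos')
for i in range(len(df)):
    if ', RI' in df.iloc[i, place_pos]:
        df.iloc[i, ri_pos] = 'yes'

# Variant 3: no loop. Build a boolean mask and assign through .loc.
mask = df['Place'].str.contains(', RI', regex=False)
df.loc[mask, 'RI_vec'] = 'yes'
```

Variants 1 and 2 mirror the loop in the question, while variant 3 avoids the explicit loop entirely, which is usually the more common pandas style for this kind of conditional assignment.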

Thanks for any clarification you can provide.

Jaycee