
I am trying to make a tricky modification. I have a DataFrame with 100k rows, and I have generated strings that I stored in a new DataFrame.

At the end I have the following:

df[df['Col1'] == value1]:
        Col1           Col2

6200    value1         string1
6201    value1         string2
6202    value1         string3


stringdf:

         Col2
0        goodstring1
1        goodstring2

Ideally, stringdf would have the same length as the subset of df for a particular value of Col1.

I would like to replace as many rows in df as stringdf can supply. In this example, that means changing 2 rows.

I would get:

df[df['Col1'] == value1]:
        Col1           Col2

6200    value1         goodstring1
6201    value1         goodstring2
6202    value1         string3

My approach was:

for i in range(0,len(stringdf)):
     df['Col2'][df['Col1'] == value1].iloc[i] = stringdf['Col2'].iloc[i]

but this runs without changing the DataFrame df.

Any suggestions, explanations or advice? I would like the processing to be very fast.

I also tried the methods described in "How to replace part of dataframe in pandas".

Thank you for your help!

hdatas

1 Answer

Give stringdf the index labels of the filtered rows of df (only as many as stringdf has rows), then call update on the original DataFrame.

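import pandas as pd

df = pd.DataFrame(
    {'Col1': ['value1'] * 3,
     'Col2': ['string1', 'string2', 'string3']},
    index=[6200, 6201, 6203])

stringdf = pd.DataFrame({'Col2': ['goodstring1', 'goodstring2']})

# Labels of the first len(stringdf) rows where Col1 matches.
idx = df[df['Col1'] == 'value1'].index[:len(stringdf)]
# Align stringdf to those labels; update() then overwrites df in place.
df.update(stringdf.set_index(idx))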

>>> df
        Col1         Col2
6200  value1  goodstring1
6201  value1  goodstring2
6203  value1      string3
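
As for why the loop in the question has no effect: df['Col2'][mask] most likely returns a temporary copy, so the .iloc[i] assignment writes to that copy and df is never touched. A minimal sketch of an alternative, assuming the index labels of df are unique, is a single .loc assignment. The extra 'value2' row and the names matching and n are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Col1': ['value1'] * 3 + ['value2'],
     'Col2': ['string1', 'string2', 'string3', 'other']},
    index=[6200, 6201, 6203, 6204])

stringdf = pd.DataFrame({'Col2': ['goodstring1', 'goodstring2']})

# Index labels of the matching rows, in their original order.
matching = df.index[df['Col1'] == 'value1']

# Replace only as many rows as both sides can supply.
n = min(len(matching), len(stringdf))

# One .loc call writes into df itself rather than into a temporary copy.
# to_numpy() drops stringdf's own 0..n-1 index so values are assigned by position.
df.loc[matching[:n], 'Col2'] = stringdf['Col2'].to_numpy()[:n]

print(df)
```

Taking the minimum of the two lengths also covers the case where stringdf has more rows than the filtered subset, which set_index(idx) would reject with a length mismatch.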
Alexander