I have two Excel files, file1.xlsx and file2.xlsx. The data in file1 was entered manually and contains only numerical values; the data in file2 is processed data. The processing removes strings and whitespace so that the values can be converted to numbers. For instance, "2.123 456 ab" is converted to "2.123456": "ab" is removed, and so is the whitespace.

Now I am iterating over the dataframe using "for index, row in df.iterrows():" and, based on some if-conditions, changing values in the dataframe with row["column2"]=0 inside the for-loop.

For the file1 dataframe, the changes made inside the for-loop are visible outside it, but for file2 the changes made inside the for-loop are not reflected outside it.

Code used for removing spaces (the replacement must be r'\1\2'; with r'\1' alone the digit after the space is dropped):

df["column1"] = df["column1"].apply(lambda x: re.sub(r'(\d)\s+(\d)', r'\1\2', x))

Code used for removing the string:

df["column1"] = df["column1"].replace({"ab":""}, regex =True).astype(float)
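
As a rough sketch of the cleaning described above, the two steps can also be written with vectorized string methods. The sample values below are hypothetical stand-ins for the real column, and note one difference from the snippets above: this version removes all whitespace, not only whitespace between digits.

```python
import pandas as pd

# Hypothetical sample values standing in for the raw "column1" strings.
df = pd.DataFrame({"column1": ["2.123 456 ab", "100", " 7 5ab"]})

cleaned = (
    df["column1"]
    .str.replace(r"\s+", "", regex=True)  # drop every run of whitespace
    .str.replace("ab", "", regex=False)   # drop the literal "ab"
)

# Convert the cleaned strings to floats, as in the original snippet.
df["column1"] = cleaned.astype(float)
print(df["column1"].tolist())
```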

Is there anything related to strings that prevents the updates from being reflected for file2?

Can someone please help?

Thanks in advance.

Code:
import pandas as pd
import re

file = "file2.xlsx"
df = pd.read_excel(file)
df["column2"] = 1
# Remove whitespace between digits, keeping both digits (\1\2)
df["column1"] = df["column1"].apply(lambda x: re.sub(r'(\d)\s+(\d)', r'\1\2', x))
# Remove "ab" and convert the column to float
df["column1"] = df["column1"].replace({"ab": ""}, regex=True).astype(float)
for index, row in df.iterrows():
    if row["column1"] == 100:
        row["column2"] = 0  # this assignment is not visible in df after the loop
print(df)
df.to_excel("output.xlsx", index=False)
pjrockzzz
  • What do you mean by "updates are not reflected in file2"? Could you please post your full code, including this for loop? – Iguananaut Mar 25 '21 at 09:25
  • In the iteration, row["column2"] = 0 is the change that I have made, and when I print the dataframe after the for-loop, the value in column2 for that particular row is still "1" and not "0" – pjrockzzz Mar 25 '21 at 09:28
  • Please post your full code (at least the parts that are relevant). – Iguananaut Mar 25 '21 at 09:30
  • If the problem is with modifying column data during `iterrows` perhaps this can help: [Updating value in iterrow for pandas](https://stackoverflow.com/questions/25478528/updating-value-in-iterrow-for-pandas) – Iguananaut Mar 25 '21 at 09:34
  • @Iguananaut I have added the code – pjrockzzz Mar 25 '21 at 09:52
  • @Iguananaut The link has helped. Thank you – pjrockzzz Mar 25 '21 at 10:15
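
For later readers, here is a minimal sketch of the approach the linked question points to: write through the DataFrame itself rather than through the `row` object. The pandas documentation notes that `iterrows` may return a copy rather than a view depending on the dtypes, which is a plausible (but not confirmed here) reason the all-numeric file1 behaved differently from file2, where column1 is float and column2 is integer. The sample frame and the extra column name `column2_vectorized` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes (float column1, int column2),
# mimicking the processed file2 data.
df = pd.DataFrame({"column1": [100.0, 2.123456, 100.0]})
df["column2"] = 1

# Option 1: loop over the index labels and assign through df.at,
# so the write lands in the DataFrame and not in a temporary row.
for index in df.index:
    if df.at[index, "column1"] == 100:
        df.at[index, "column2"] = 0

# Option 2: no loop at all; a boolean mask selects the rows to update.
df["column2_vectorized"] = 1
df.loc[df["column1"] == 100, "column2_vectorized"] = 0

print(df)
```

Both options give the same result; the masked assignment is usually the more idiomatic choice in pandas and avoids row-by-row iteration.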
