I have two excel files file1.xlsx and file2.xlsx. The data present in file1 is manually entered data (has only numerical data) and the data present is file2 is processed data. Processing done are removing strings and removing white spaces, to convert into numerical. For an instance, "2.123 456 ab" is converted to "2.123456". Here "ab" is removed and "white spaces" are also removed.
Now, I am iterating over the dataframe using " for index, row in df.iterrows( ):" and based on some "if-conditions", making some changes in the dataframe using row["column2"]=0 in the for-loop.
The changes made in file1 dataframe is happening outside the for-loop but for the file2, the changes made in the for-loop are not getting reflected outside the for-loop.
Code used for removing spaces :
df["column1"]= df["column1"].apply(lambda x: re.sub(r'(\d)\s+(\d)' , r'\1', x))
or
df["column1"]= df["column1"].apply(lambda x: re.sub(r'(\d)\s+(\d)' , r'\1\2', x))
code used for removing string:
df["column1"] = df["column1"].replace({"ab":""}, regex =True).astype(float)
Is there anything related to strings, because of which updates are not reflected in file2 ?
Can someone please help
Advance thankyou
Code:
import pandas as pd
import re
file = "file2.xlsx"
df = pd.read_excel(file)
df["column2"]=1
df["column1"] = df["column1"].apply(lambda x: re.sub(r'(\d)\s+(\d)', r'\1', x))
df["column1"] = df["column1"].replace({"ab":""}, regex=True).astype(float)
for index, row in df.iterrows():
if(row["column1"]==100):
row["column2"]=0
print(df)
df.to_excel("output.xlsx", index =False)