I'm working with Python 2.7 and pandas (version 0.18.1) DataFrames. I need to modify one column of a DataFrame based on several other columns in the same DataFrame.
My DataFrame is named data, and a sample of it looks like this:
+---+---+----+----+---+---------+---+----+----+---+----------+
| a | b | c  | d  | e | f       | g | h  | i  | j | discount |
+---+---+----+----+---+---------+---+----+----+---+----------+
| 0 |   |    |    |   | 65497.6 |   |    |    |   | 0        |
| 0 |   |    |    |   | 73882.8 |   |    |    |   | 0        |
| 0 |   |    |    |   | 88588   |   | 22 |    |   | 0        |
| 0 |   |    |    |   | 106480  |   | 20 | 10 |   | 0        |
| 0 |   |    |    |   | 52500   |   |    |    |   | 0        |
| 0 |   | 20 | 10 |   | 22997.5 |   |    |    |   | 0        |
|   |   |    |    |   |         |   |    |    |   | 0        |
| 0 |   |    | 20 |   | 0       |   |    |    |   | 0        |
| 0 |   |    |    |   | 10520   |   |    |    |   | 0        |
+---+---+----+----+---+---------+---+----+----+---+----------+
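To make the sample reproducible, the table above can be built roughly as follows. This is a sketch that assumes the blank cells are NaN (missing values); the real data may encode them differently.

```python
import numpy as np
import pandas as pd

nan = np.nan  # assumption: blank cells in the table are missing values

data = pd.DataFrame(
    {
        'a': [0, 0, 0, 0, 0, 0, nan, 0, 0],
        'b': [nan] * 9,
        'c': [nan, nan, nan, nan, nan, 20, nan, nan, nan],
        'd': [nan, nan, nan, nan, nan, 10, nan, 20, nan],
        'e': [nan] * 9,
        'f': [65497.6, 73882.8, 88588, 106480, 52500, 22997.5, nan, 0, 10520],
        'g': [nan] * 9,
        'h': [nan, nan, 22, 20, nan, nan, nan, nan, nan],
        'i': [nan, nan, nan, 10, nan, nan, nan, nan, nan],
        'j': [nan] * 9,
        'discount': [0] * 9,
    },
    # pass the column order explicitly so it matches the table
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'discount'],
)
```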
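My current code is:

    columns1 = ['a', 'b', 'c', 'd', 'e']
    columns2 = ['f', 'g', 'h', 'i', 'j']
    data['discount'] = 0
    for i, row in data.iterrows():
        a = 0
        b = 0
        # first positive value among columns1, scanning left to right
        for col1 in columns1:
            value = row[col1]
            if value > 0:
                a = value
                break
        # first positive value among columns2, scanning left to right
        for col2 in columns2:
            value = row[col2]
            if value > 0:
                b = value
                break
        # set the discount only when both groups have a positive value
        if a != 0 and b != 0:
            data.loc[i, 'discount'] = abs(a - b)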
Done this way, it takes a lot of time and memory on a large dataset. I have 700 MB of data; processing it uses more than 120 GB of RAM, and after approximately 10 hours the process fails with a MemoryError.
According to https://stackoverflow.com/a/24871316, I should not iterate over the rows like this. How can I write this code more efficiently?
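A rough sketch of the same rule without the row loop might look like the following. It assumes blank cells are NaN and that the columns are numeric; first_positive is a helper name made up for this sketch, the three-row frame is a cut-down version of the sample, and none of this has been run on the full dataset.

```python
import numpy as np
import pandas as pd

def first_positive(frame):
    """Per row, return the first value > 0 scanning columns left to right (0 if none)."""
    values = frame.values.astype(float)
    with np.errstate(invalid='ignore'):
        positive = values > 0                # NaN compares as False
    first_idx = positive.argmax(axis=1)      # index of first True per row (0 if no True)
    picked = values[np.arange(len(values)), first_idx]
    return np.where(positive.any(axis=1), picked, 0.0)

columns1 = ['a', 'b', 'c', 'd', 'e']
columns2 = ['f', 'g', 'h', 'i', 'j']

nan = np.nan
# hypothetical three-row subset of the sample data
data = pd.DataFrame({
    'a': [0, 0, 0],       'b': [nan, nan, nan], 'c': [20, nan, nan],
    'd': [10, 20, nan],   'e': [nan, nan, nan],
    'f': [22997.5, 0, 106480], 'g': [nan, nan, nan], 'h': [nan, nan, 20],
    'i': [nan, nan, 10],  'j': [nan, nan, nan],
})

a = first_positive(data[columns1])
b = first_positive(data[columns2])
# discount is |a - b| only where both groups produced a positive value
data['discount'] = np.where((a != 0) & (b != 0), np.abs(a - b), 0)
```

Here only the first row gets a non-zero discount (20 from column c against 22997.5 from column f), which matches what the loop does for that row.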
Please let me know the reason for downvoting my question, so that I can learn from it.