for i in range(1,len(df_raw)):
    if df_raw.loc[i-1, 'A']!= 0 & df_raw.loc[i, 'A']== 0 & df_raw.loc[i+1, 'A']== 0:
        df_raw.loc[i,'B'] = df_raw.loc[i+5,'B']

Hi all, I'm trying to run the code above on my data. As long as the data has around 100,000-150,000 rows, the code runs, but for bigger data it just keeps running with no output. Can you please help me with a better way of writing this code for bigger data sizes?

user2390182
  • Please explain the logic your code is trying to implement, so it'll be easier for people to solve it in a more efficient way. Also, providing a sample dataframe (even with 5 rows) will help us understand your columns and your logic. – OmerM25 Jun 28 '21 at 08:14

2 Answers


I think the method you're missing which efficiently performs this kind of logic is shift. Here's my proposal:

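df_raw = df_raw.sort_index()  # Optional, if index is not sorted
# Flag rows where A is zero, then look one row back and one row ahead.
df_raw['A_is_zero'] = df_raw['A'] == 0
# fill_value keeps the shifted columns boolean, so ~ and & stay logical operations.
df_raw['prev_A_is_zero'] = df_raw['A_is_zero'].shift(1, fill_value=True)
df_raw['next_A_is_zero'] = df_raw['A_is_zero'].shift(-1, fill_value=False)
B_to_change = df_raw['A_is_zero'] & df_raw['next_A_is_zero'] & ~df_raw['prev_A_is_zero']
# Copy B from five rows further down into the flagged rows.
df_raw.loc[B_to_change, 'B'] = df_raw['B'].shift(-5).loc[B_to_change]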

Since you didn't provide a sample dataframe, I haven't tested this, so I can't guarantee it will work, but it should convey the main idea. For instance, if B_to_change is True in any of the four rows before the last, you'll get NaN in 'B' there, because shift(-5) has no value that far down. Also, you're using .loc with integers, and I don't know whether your index is a range. If it is, my first line (sort_index) is unnecessary. If it isn't and you meant to use iloc (see this link about the loc / iloc difference), my first line should be removed because it would not give the expected result.
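As a quick check of the idea above, here is a minimal sketch on a made-up frame. The data and the helper names loop_version and shift_version are assumptions for illustration, not taken from the question. It assumes a default range index, writes the loop condition with `and`, and stops the loop five rows before the end so that i+5 always exists.

```python
import pandas as pd

# Hypothetical sample data; the real df_raw was never posted.
df_raw = pd.DataFrame({
    'A': [3, 0, 0, 0, 5, 0, 0, 2, 0, 1, 1, 1, 1],
    'B': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
})

def loop_version(df):
    """Row-by-row reference implementation (slow on large frames)."""
    out = df.copy()
    # Stop 5 rows before the end so that label i+5 is always valid.
    for i in range(1, len(out) - 5):
        if out.loc[i-1, 'A'] != 0 and out.loc[i, 'A'] == 0 and out.loc[i+1, 'A'] == 0:
            out.loc[i, 'B'] = out.loc[i+5, 'B']
    return out

def shift_version(df):
    """Same rule expressed with shifted boolean masks."""
    out = df.copy()
    a_is_zero = out['A'] == 0
    prev_is_zero = a_is_zero.shift(1, fill_value=True)
    next_is_zero = a_is_zero.shift(-1, fill_value=False)
    mask = a_is_zero & next_is_zero & ~prev_is_zero
    out.loc[mask, 'B'] = out['B'].shift(-5)[mask]
    return out

looped = loop_version(df_raw)
shifted = shift_version(df_raw)
# Compare the two results element by element.
print((looped['B'] == shifted['B']).all())
```

On this sample no row near the end is flagged, so the NaN edge case does not arise; with real data it would need separate handling.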


EDIT:

my requirements have some iterative, conditional, sequential operations, e.g.:

for i in range(1, len(df_raw)):
    if df_raw.loc[i, 'B'] != 0:
        df_raw.loc[i, 'A'] = df_raw.loc[i-1, 'A']

In this case (which you should have specified in your question), you can use forward filling as follows:

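B_is_zero = df_raw['B'] == 0
# Keep A only where B is zero; every other row becomes NaN...
df_raw['new_A'] = df_raw['A'].where(B_is_zero)
# ...and is then filled with the last kept value above it.
# (.ffill() replaces fillna(method='ffill'), which newer pandas deprecates.)
df_raw['A'] = df_raw['new_A'].ffill()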

Once again, be careful with the edge case where 'B' is nonzero on the first row: there is no earlier value to fill from, so 'A' ends up as NaN there.
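To make the equivalence between the loop and forward filling concrete, here is a small sketch on invented data. The values and the names loop_version and ffill_version are illustrative assumptions. The first row has B equal to zero, so the edge case mentioned above does not come up.

```python
import pandas as pd

# Hypothetical sample data, not from the question.
df_raw = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'B': [0, 7, 7, 0, 7, 0],
})

def loop_version(df):
    """Sequential version: each row may depend on the row just updated."""
    out = df.copy()
    for i in range(1, len(out)):
        if out.loc[i, 'B'] != 0:
            out.loc[i, 'A'] = out.loc[i-1, 'A']
    return out

def ffill_version(df):
    """Vectorized version: blank out A where B is nonzero, then forward fill."""
    out = df.copy()
    b_is_zero = out['B'] == 0
    out['A'] = out['A'].where(b_is_zero).ffill()
    return out

print(loop_version(df_raw)['A'].tolist())
print(ffill_version(df_raw)['A'].tolist())
```

Both calls print the same list here, because carrying the previous value forward row by row is exactly what a forward fill does.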

TLouf
  • Thanks for the reply, but my requirements include some iterative, conditional, sequential operations which are not possible using the "shift" method. For example: ` for i in range(1,len(df_raw)):` `if df_raw.loc[i, 'B'] != 0:` `df_raw.loc[i,'A'] = df_raw.loc[i-1,'A'] ` – tausif shams Jun 29 '21 at 10:23
  • @tausifshams note that this can still be vectorized. TLouf's updated `ffill` code is probably simplest, and it will be much faster than looping. – tdy Jul 21 '21 at 08:11

Your code may simply be taking a long time to run because of the large number of steps it has to take (more than 150,000). There are a few things I would recommend:

  1. See if you need to run the code for every one of the elements in your array. If not, skipping the unneeded ones will dramatically improve performance.
  2. Check top/task manager/system monitor (depending on your operating system) and see if you've run out of RAM.
  3. Replace the bitwise and (&) with the more idiomatic, short-circuiting and. Note that & binds more tightly than != and ==, so the original condition does not evaluate the three comparisons it appears to.
  4. Profile your code
  5. Add a progress bar:
    At the command line: pip install tqdm
    In your code:
from tqdm import tqdm

for i in tqdm(range(1,len(df_raw))):
    if df_raw.loc[i-1, 'A'] != 0 and df_raw.loc[i, 'A'] == 0 and df_raw.loc[i+1, 'A']== 0:
        df_raw.loc[i,'B'] = df_raw.loc[i+5,'B']
  6. Consider multiprocessing. If you can split the code up into discrete segments, you can parallelize it on a multi-core system. This can be difficult to do correctly, so I would start with the above steps. If you decide to go this route and need help, edit your question with a more complete code sample.
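The difference between & and and in step 3 can be checked on plain integers. This is a small sketch with made-up values (prev_a, cur_a and next_a are hypothetical stand-ins for the three cells read from column 'A'), chosen so that only the first of the three intended tests passes.

```python
# Hypothetical values: prev is nonzero, but cur and next are NOT zero,
# so the intended condition should be False.
prev_a, cur_a, next_a = 5, 3, 7

# `&` binds tighter than `!=` and `==`, so Python parses this as the chain
# prev_a != (0 & cur_a) == (0 & next_a) == 0, which reduces to prev_a != 0.
with_bitwise = prev_a != 0 & cur_a == 0 & next_a == 0

# `and` joins three separate comparisons and stops at the first False one.
with_logical = prev_a != 0 and cur_a == 0 and next_a == 0

print(with_bitwise, with_logical)
```

So switching to and is not only a speed-up: it also changes which rows satisfy the condition, which is worth keeping in mind when comparing results before and after the change.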
Ezra
  • Thanks a lot for the reply. Just the small change from '&' to 'and' has improved the speed a lot. Also, tqdm helps in visualising the progress. – tausif shams Jun 28 '21 at 12:40