

I have 2 questions:

  1. I have a dataset that contains some duplicate IDs, but some of those rows have different actions, so they can't be removed. For each ID I want to do some math and store the final value to work with later. I already have the indices of the duplicates, but the code below doesn't work properly and gives NaN.

  2. How can I write this nested loop using pandas? It takes too much time to run. I've already tried iterrows(), but it didn't help.

    l_list = []
    for i in range(len(idx)):
        for j in range(len(idx[i])):
            if df.at[j,'action'] == 0:
                a = df.rank[idx[i]]*50
                b = df.study_list[idx[i]].str.strip('[]').str.split(',').str.len()
                l_list.append(a + b)

Henry Ecker
Elahe
  • Please post an example from your input dataframe and the expected output. – navneethc Jun 13 '21 at 14:43
  • @navneethc I made an example and added an image. For example, for ID = aaa, if its action is 0, I want its rank * 50 plus the number of items in its study_list (which is 2). Then I want to do the same for the other rows with ID = aaa and action = 0, and finally have a single value for this ID to work with later. I want to do this for all the IDs and have their assigned values. – Elahe Jun 13 '21 at 14:59
  • In the future, please use the recommendations given in https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples to post questions about Pandas. – navneethc Jun 13 '21 at 15:25
  • @navneethc thanks a lot. I'm sorry about that, I'm new to this community and didn't know the rules exactly. Thank you so much for helping. – Elahe Jun 13 '21 at 15:36

2 Answers


I don't know what the variable idx (or the rest of your setup) looks like. I think your code is wrong; try this code:

l_list = []
for i in range(len(idx)):
    for j in range(len(idx[i])):
        if df.at[j,'action'] == 0:
            a = df.rank[idx[i]]*50
            b = df.study_list[idx[i]].str.strip('[]').str.split(',').str.len()
            l_list.append(a + b)
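
The loop as posted still has a few problems: df.rank is the DataFrame.rank method rather than the rank column, df.at[j, 'action'] looks up the row labelled j (the loop counter) instead of the label stored in idx[i], and a and b are built from every row in idx[i] instead of the current one. Below is a sketch with those fixed. It assumes idx is a list holding one list of row labels per ID (built here with groupby, since the question doesn't show how idx is made), and n_items is a hypothetical helper that counts items whether a study_list cell is a real list or a string such as "[a, b]".

```python
import pandas as pd

# Hypothetical data matching the example in the other answer.
df = pd.DataFrame({
    "ID": ["aaa", "bbb", "aaa"],
    "rank": [24, 6, 14],
    "action": [0, 1, 0],
    "study_list": [["a", "b"], [1, 2, 3], [1, 2, 3, 4]],
})

# Assumption: idx holds one list of row labels per ID.
idx = [list(labels) for labels in df.groupby("ID").groups.values()]


def n_items(value):
    """Hypothetical helper: count items in a real list or a string like "[a, b]"."""
    if isinstance(value, str):
        inner = value.strip("[]").strip()
        return len(inner.split(",")) if inner else 0
    return len(value)


l_list = []
for labels in idx:
    for label in labels:
        # Look the row up by its label from idx, not by its position in the list.
        if df.at[label, "action"] == 0:
            # df["rank"] / df.at[..., "rank"] is the column; df.rank is a method.
            value = int(df.at[label, "rank"]) * 50 + n_items(df.at[label, "study_list"])
            l_list.append(value)

print(l_list)
```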

Based on my understanding of what you've provided, see if this works:

In [15]: df
Out[15]:
    ID  rank  action    study_list
0  aaa    24       0        [a, b]
1  bbb     6       1     [1, 2, 3]
2  aaa    14       0  [1, 2, 3, 4]

In [16]: def do_thing(row):
    ...:     if row['ID'] == 'aaa' and row['action'] == 0:
    ...:         return row['rank'] * 50 + len(row['study_list'])
    ...:     else:
    ...:         return 100 * row['rank']
    ...:

In [17]: df['new_value'] = df.apply(do_thing, axis=1)

In [18]: df
Out[18]:
    ID  rank  action    study_list  new_value
0  aaa    24       0        [a, b]       1202
1  bbb     6       1     [1, 2, 3]        600
2  aaa    14       0  [1, 2, 3, 4]        704
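
To end up with a single value per ID, as described in the comments, the new_value column can then be aggregated with groupby. This sketch assumes the rows with action == 0 should be summed per ID; the question doesn't say how they should be combined, so sum could be swapped for another aggregation.

```python
import pandas as pd

# Same data as the session above, including the computed new_value column.
df = pd.DataFrame({
    "ID": ["aaa", "bbb", "aaa"],
    "rank": [24, 6, 14],
    "action": [0, 1, 0],
    "study_list": [["a", "b"], [1, 2, 3], [1, 2, 3, 4]],
    "new_value": [1202, 600, 704],
})

# Keep only the action == 0 rows, then sum new_value per ID (summing is an assumption).
per_id = df[df["action"] == 0].groupby("ID")["new_value"].sum()
print(per_id)
```

IDs with no action == 0 rows (bbb here) do not appear in per_id.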

NOTE: I have made many simplifications because your post doesn't include a reproducible example. Read this thread (https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) to see how best to ask questions about Pandas. I also can't guarantee speed, as you haven't said how large the dataset is.
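
On speed: apply(axis=1) still calls a Python function once per row, so on a large frame the same rule may run faster written as whole-column operations. This is a sketch that assumes study_list holds real Python lists, as in the session above; Series.str.len() returns the length of each list. If the column holds strings such as "[a, b]" instead, the item count has to be computed differently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["aaa", "bbb", "aaa"],
    "rank": [24, 6, 14],
    "action": [0, 1, 0],
    "study_list": [["a", "b"], [1, 2, 3], [1, 2, 3, 4]],
})

# Series.str.len() also works on list elements, giving the number of items.
n_items = df["study_list"].str.len()

# Same condition as do_thing, evaluated for all rows at once.
is_target = (df["ID"] == "aaa") & (df["action"] == 0)

# rank * 50 + n_items where the condition holds, otherwise 100 * rank.
df["new_value"] = np.where(is_target, df["rank"] * 50 + n_items, df["rank"] * 100)
print(df)
```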

navneethc