
I need to create two loops (they must be separate):

LOOP 1) for each fruit:

  1. keep rows if that fruit is True
  2. remove rows with duplicate dates (either row can be deleted)
  3. save the result of the above as a dataframe for each fruit

LOOP 2) for each dataframe created, plot that fruit's score ({fruit}_score) against date.

Sample data:

    concat  apple_score  banana_score        date  apple  banana
1    apple        0.400         0.400  2010-02-12   True   False
2   banana        0.530         0.300  2010-01-12  False    True
3     kiwi        0.532         0.200  2010-03-03  False   False
4     bana        0.634         0.100  2010-03-03  False    True
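To make the question reproducible, the sample table can be rebuilt as a DataFrame. This is a minimal sketch, assuming pandas, that `apple` and `banana` are real boolean columns (not strings), and that the dates are YYYY-MM-DD strings worth parsing with `pd.to_datetime`; the table alone does not confirm any of these.

```python
import pandas as pd

# Rebuild the sample data; the index starts at 1 to match the table.
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "apple_score": [0.400, 0.530, 0.532, 0.634],
        "banana_score": [0.400, 0.300, 0.200, 0.100],
        # Assumption: dates are YYYY-MM-DD; parsing them lets a line plot order them in time.
        "date": pd.to_datetime(["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"]),
        # Assumption: the fruit flags are booleans, so df[df["apple"]] works as a row filter.
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)

print(df)
```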

I tried:

fruits = ['apple',  'banana',   'orange']
for fruit in fruits:
    selected_rows = df[df[ fruit ] == True ]
    df_f'{fruit}' = selected_rows.drop_duplicates(subset='date')

for fruit in fruits:
    df_f'{fruit}'.plot(x="date", y=(f'{fruit}_score'), kind="line")
asked by arv (edited by vvvvv)
  • Are you trying to programmatically define the name of a variable? You're expecting to get a variable called df_apple, for example? – Youyoun Jul 24 '20 at 09:07
  • You could use a dict instead of getting a variable name based on the for loop: https://stackoverflow.com/a/11553769/1735729 – Stergios Jul 24 '20 at 09:09
  • Not variables, but I was hoping to generate two dataframes labelled df_apple and df_banana (in this example). – arv Jul 24 '20 at 09:09
  • Try `isin` and drop the duplicates: `df[df['concat'].isin(fruits)].drop_duplicates(subset=['date'], keep='first')` – Umar.H Jul 24 '20 at 09:11
  • Use a dict then: `fruits_df = {}`, and in your for loop use `fruits_df[fruit] = ...` – Youyoun Jul 24 '20 at 09:11
  • Also, don't use for loops in pandas; they should be a last resort when you can't use any other method. – Umar.H Jul 24 '20 at 09:12
  • @Manakin I don't think that will work, because "bana" is in concat while the banana column is True. Also, he wants to drop dates duplicated within the same fruit; yours drops dates duplicated across all fruits. He's not looping over the dataframe, but over the fruits. – Youyoun Jul 24 '20 at 09:13
  • @Youyoun you can subset on more than one column; just add `fruits` to `.drop_duplicates`. Nothing complex here, and no need to iterate over the list either. – Umar.H Jul 24 '20 at 09:17
  • @Manakin How would you create `df_apple` and `df_banana` without looping over the `fruits` list? – Jack Fleeting Jul 24 '20 at 17:10
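Youyoun's first objection to the `isin` suggestion can be checked directly on the sample data. This is a minimal sketch, assuming pandas and boolean `apple`/`banana` columns as shown in the table; it compares filtering on the text in `concat` with filtering on the boolean flag.

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity).
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "date": ["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"],
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)
fruits = ["apple", "banana"]

# Filtering on the text in `concat` misses row 4 ("bana"), although its banana flag is True.
by_name = df[df["concat"].isin(fruits)].drop_duplicates(subset=["date"], keep="first")
print(by_name.index.tolist())  # [1, 2]

# Filtering on the boolean banana column keeps row 4.
by_flag = df[df["banana"]].drop_duplicates(subset="date")
print(by_flag.index.tolist())  # [2, 4]
```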

1 Answer

You should do something along the lines suggested by @youyoun:

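# df is the sample dataframe from the question
dfs = {}
fruits = ['apple', 'banana']
for fruit in fruits:
    # keep the rows where this fruit's column is True, then drop repeated dates
    selected_rows = df[df[fruit]].drop_duplicates(subset='date')
    # store each result under a key such as 'df_apple' instead of a variable name
    dfs[f'df_{fruit}'] = selected_rows

for name, frame in dfs.items():
    print(name)
    print(frame)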

Output:

df_apple
  concat  apple_score  banana_score        date  apple  banana
1  apple          0.4           0.4  2010-02-12   True   False
df_banana
   concat  apple_score  banana_score        date  apple  banana
2  banana        0.530           0.3  2010-01-12  False    True
4    bana        0.634           0.1  2010-03-03  False    True
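The answer above stops after loop 1. A possible sketch of loop 2, reusing the same dict, is below. It assumes matplotlib is installed and that the dates are parsed with `pd.to_datetime` (the question shows them as plain text); `axes` is a hypothetical name used only to keep a handle on each plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to see the plots
import pandas as pd

# Sample data from the question, with the dates parsed (an assumption).
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "apple_score": [0.400, 0.530, 0.532, 0.634],
        "banana_score": [0.400, 0.300, 0.200, 0.100],
        "date": pd.to_datetime(["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"]),
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)
fruits = ["apple", "banana"]

# Loop 1, as in the answer: one filtered, de-duplicated frame per fruit.
dfs = {}
for fruit in fruits:
    dfs[f"df_{fruit}"] = df[df[fruit]].drop_duplicates(subset="date")

# Loop 2: plot each fruit's score against date, one figure per frame.
axes = {}  # hypothetical: keeps each Axes so the plots can be inspected or saved later
for fruit in fruits:
    frame = dfs[f"df_{fruit}"].sort_values("date")  # a line plot should have x in order
    axes[fruit] = frame.plot(x="date", y=f"{fruit}_score", kind="line", title=fruit)
```

Sorting by date before plotting matters here because the sample rows are not in chronological order; without it, the line would be drawn in row order.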
answered by Jack Fleeting
  • Even simpler, you could do `dfs = {fruit: data for fruit, data in df.groupby('fruit')}` or something along those lines. – Umar.H Jul 25 '20 at 13:47
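Umar.H's `groupby('fruit')` assumes a `fruit` column, which the sample data does not have. One way to create one, sketched here under the assumption that the flag columns are boolean, is to reshape `apple`/`banana` into long form with `melt` (a swapped-in technique, not the commenter's exact code) and then let `groupby` do the per-fruit split.

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity).
df = pd.DataFrame(
    {
        "concat": ["apple", "banana", "kiwi", "bana"],
        "apple_score": [0.400, 0.530, 0.532, 0.634],
        "banana_score": [0.400, 0.300, 0.200, 0.100],
        "date": ["2010-02-12", "2010-01-12", "2010-03-03", "2010-03-03"],
        "apple": [True, False, False, False],
        "banana": [False, True, False, True],
    },
    index=[1, 2, 3, 4],
)
fruits = ["apple", "banana"]

# Turn the boolean fruit columns into one "fruit" column plus a "flag" column.
long = df.melt(
    id_vars=["concat", "apple_score", "banana_score", "date"],
    value_vars=fruits,
    var_name="fruit",
    value_name="flag",
)
long = long[long["flag"]]  # keep only the rows where that fruit is True

# groupby yields (fruit, sub-frame) pairs; de-duplicate dates within each fruit.
dfs = {
    f"df_{fruit}": group.drop_duplicates(subset="date")
    for fruit, group in long.groupby("fruit")
}
print(dfs["df_banana"][["concat", "date"]])
```

Unlike the answer's loop, these frames come from the melted data, so they carry the melted frame's new integer index and a single `fruit`/`flag` pair instead of the original `apple`/`banana` columns.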