I have my code set up as follows:

for index_outer, row_outer in df_outer.iterrows():

    # In the actual implementation df_devices would be different for every
    # df_outer row, but a fixed example is used here for each user.

    for index_inner, row_inner in df_devices.iterrows():

        device_id = row_inner['devices']
        found = check_dates_containment(row_inner['start_date'], row_inner['end_date'], row_outer['start_date'], row_outer['end_date'])

        if found:
            print("All user data can be grabbed just from device_id: " + device_id, "\n")
            print(device_id)
            data = df_backend.loc[df_backend['devices'] == device_id, 'data'].iloc[0]
            print("Single device data is: ", data)
            break
        else:
            print("Go to next inner loop iteration")

        # Last inner row reached without a match (relies on the default 0..n-1 index).
        if index_inner == len(df_devices) - 1:
            print("Must grab data from all device_ids \n")
            data_list = []
            for _, row_device in df_devices.iterrows():
                data = df_backend.loc[df_backend['devices'] == row_device['devices'], 'data'].iloc[0]
                data_list.append(data)

            final_data = data_list  # this is what will be returned
            print("All device data is: ", final_data)

Here are examples of the dataframes to use:

from datetime import datetime
import pandas as pd

df_outer = pd.DataFrame({"user": ['sally', 'mark', 'carmen'], "start_date": [datetime(2021,10,17,1,0,0), datetime(2021,10,14,1,0,0), datetime(2021,10,22,1,0,0)], "end_date": [datetime(2021,10,19,1,0,0), datetime(2021,10,22,1,0,0), datetime(2021,10,25,1,0,0)]})

df_devices = pd.DataFrame({"devices": ['a', 'b', 'c'], "start_date": [datetime(2021,10,20,1,0,0), datetime(2021,10,19,1,0,0), datetime(2021,10,16,1,0,0)], "end_date": [datetime(2021,10,24,1,0,0), datetime(2021,10,25,1,0,0), datetime(2021,10,28,1,0,0)]})

df_backend = pd.DataFrame({"devices": ['a', 'b', 'c'], "data": [[1, 25, 6, 8, 56, 8], [4, 565, 75, 76, 34, 46], [45, 65, 75, 324, 75, 23]]})

Below is the function being called:

def check_dates_containment(start_date_inner: datetime, end_date_inner: datetime, start_date_outer: datetime, end_date_outer: datetime) -> bool:
    
    if start_date_inner < start_date_outer and end_date_inner > end_date_outer:
        print("Outer dates are contained in inner dates \n")
        return True
    else:
        print("Outer dates are not contained in inner dates \n")
        return False

df_devices lists all devices. If the criterion is met, i.e. found is True because the dates in a df_outer row are contained in the date range of one of the devices in df_devices, the data is taken from just that single device. If the inner loop finishes without the criterion being met, I must loop through the whole inner dataframe again to extract data from all the devices.
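One possible way to express the logic described above without the second pass over the inner loop, as a minimal sketch: build a boolean mask over df_devices and branch on whether any row matches. The helper name `select_device_ids` is hypothetical (it is not part of the code above), the sample frames repeat the ones from this question, and the strict `<` / `>` comparison of `check_dates_containment` is kept as is.

```python
from datetime import datetime

import pandas as pd


def select_device_ids(df_devices: pd.DataFrame, start: datetime, end: datetime) -> list:
    """Hypothetical helper: return the first device whose date range strictly
    contains [start, end], or every device if no single device does."""
    contains = (df_devices["start_date"] < start) & (df_devices["end_date"] > end)
    if contains.any():
        # Same device the nested loop would stop at: the first match in row order.
        return [df_devices.loc[contains, "devices"].iloc[0]]
    return df_devices["devices"].tolist()


df_outer = pd.DataFrame({
    "user": ["sally", "mark", "carmen"],
    "start_date": [datetime(2021, 10, 17, 1), datetime(2021, 10, 14, 1), datetime(2021, 10, 22, 1)],
    "end_date": [datetime(2021, 10, 19, 1), datetime(2021, 10, 22, 1), datetime(2021, 10, 25, 1)],
})
df_devices = pd.DataFrame({
    "devices": ["a", "b", "c"],
    "start_date": [datetime(2021, 10, 20, 1), datetime(2021, 10, 19, 1), datetime(2021, 10, 16, 1)],
    "end_date": [datetime(2021, 10, 24, 1), datetime(2021, 10, 25, 1), datetime(2021, 10, 28, 1)],
})

# itertuples avoids building a Series per row, unlike iterrows.
for outer in df_outer.itertuples(index=False):
    print(outer.user, select_device_ids(df_devices, outer.start_date, outer.end_date))
```

With the sample data this selects device c for sally and carmen and falls back to all three devices for mark. In the real setting df_devices would be rebuilt per user before calling the helper.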

Important note: in the real implementation df_devices would be unique for every user. I'm just using a fixed example here, but in practice df_devices would be updated right after the first for loop, once per user.

Another thing to note is that the data = ... lookup against df_backend would in reality be a function call to my cloud services to extract the needed data; df_backend just emulates this.
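Because each lookup would be a remote call, it may help to isolate it behind a single function and cache results per device, so that a device fetched for one user is not fetched again for another. This is only a sketch under assumptions: `fetch_device_data` is a hypothetical stand-in for the real cloud call, emulated here with the df_backend frame from this question, and caching is only appropriate if the device data does not change during the run.

```python
from functools import lru_cache

import pandas as pd

df_backend = pd.DataFrame({
    "devices": ["a", "b", "c"],
    "data": [[1, 25, 6, 8, 56, 8], [4, 565, 75, 76, 34, 46], [45, 65, 75, 324, 75, 23]],
})


@lru_cache(maxsize=None)
def fetch_device_data(device_id: str) -> tuple:
    """Hypothetical stand-in for the cloud call, emulated with df_backend."""
    data = df_backend.loc[df_backend["devices"] == device_id, "data"].iloc[0]
    # Return a tuple so callers cannot mutate the cached value.
    return tuple(data)


single = fetch_device_data("c")                               # one lookup
all_data = [fetch_device_data(d) for d in ["a", "b", "c"]]    # "c" is served from the cache
print(single)
print(fetch_device_data.cache_info())
```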

The implementation above works fine; however, I am not too fond of it and feel that it's a bit messy and/or inefficient.

Performance also matters: in reality df_outer is a huge dataframe with thousands of rows, so the quicker the execution the better.

I was wondering if there is a cleaner way to implement something like this?

Ossz
  • Can you provide a [concrete, copy-paste-able sample input and expected output?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – ddejohn Feb 18 '22 at 00:07
  • Using loops on dataframes is usually a pretty big red flag. Providing a concrete example of what you have, and what you want to do often provides better information than a block of code that, while familiar to you, is completely unfamiliar to us. – ddejohn Feb 18 '22 at 00:09
  • show what function calls like `get_data` return – gold_cy Feb 18 '22 at 00:21
  • @ddejohn I have updated the above so that it's reproducible as suggested – Ossz Feb 18 '22 at 02:17
  • @gold_cy I have updated the above so that it's reproducible and has function calls as suggested – Ossz Feb 18 '22 at 02:19
