1

I have 2 pandas dataframes, df_pe and df_merged. Both the dataframes have several rows, as well as several columns. Now, there are some specific things I would like to accomplish using these dataframes:

  1. In df_merged, there is a column named ST, which contains timestamps of various events in format eg. (2017-08-27 00:00:00). In df_pe, there are 2 columns Ton and Toff which contain the time when an event started and when and event ended. Eg. (Ton value for a random row: 2018-08-17 01:20:00 while Toff value 2018-08-17 02:30:00).
  2. Secondly, there is a column in df_pe, namely EC. I have another dataframe called df_uniqueal, which also has a column called EC. What I would like to do is:

    a. For all rows in df_merged, whenever the ST value is within the duration of Ton and Toff in the df_pe, create 2 new columns in df_merged: EC and ED. Append/Put the value of the EC from data frame df_pe into this new column, while, put the value of the dataframe df_uniqueal into the new column ED (which is eventually a mapped version of the EC in df_pe for obtaining ED in df_uniqueal). If none of the conditions matches/there are NaNs (missing values) left after this procedure, put the string "NF" into df_merged's new ED column, while put the integer 0 into the df_merged's new EC column.

I have explored SO and SE, but have not found anything substantial. Any help in this regard is highly appreciated.

This is my attempt at using for loops in Python for iterating over the dataframes for accomplishing the first condition but it runs forever (never ending) and I don't think this is the best possible way to accomplish this.

for i in range(len(df_merged)):
    for j in range(len(df_pe)):
        if df_pe.TOn[j] < df_merged.ST[i] < df_pe.TOff[j]:
            df_merged.EC[i] = df_pe.EC[j]
            df_merged.ED[i] = df_uniqueal.ED[df_processed.EC[j]]
        else:

            df_merged.EC[i] = 0
            df_merged.ED[i] = "NF"

EDIT Please refer image for expected output and baby example of dataframes.Expected Output and Baby Example

The relevant columns are in bold (note the column numbers may differ, but the column names are same in this sample example).

Alan Williams
  • 509
  • 1
  • 8
  • 26
JChat
  • 784
  • 2
  • 13
  • 33
  • 3
    This is hard to read and to understand. Include an example dataframe and an example expected output. We read data better than big blob of text. – Erfan Apr 11 '19 at 11:05
  • Thanks for the suggestion @Erfan. I have now added the expected output and baby example. Cheers! – JChat Apr 11 '19 at 11:25
  • We cannot copy and paste pictures. Could you reproduce this data so we can copy them? Two things you can do: 1. `print()` them in your console and copy paste the output here or 2. see here how you can make dataframes in `pandas`. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Erfan Apr 11 '19 at 11:28
  • I am aware of how to create dataframes @Erfan. But the problem is that I need to work around this baby example which I mentioned here, as I cannot share the exact data.... I hope you understand the problem. I just want the suggestions as to the appropriate functions/methods in Python which can help with this, not the exact code, as I can do that myself please. – JChat Apr 11 '19 at 11:33
  • Do you expect only one row for the values of df_merged['ST'] to be inside df_pe['Ton'] and df_pe['Toff']? – Cr1064 Apr 12 '19 at 15:36
  • @Cr1064 Sorry, no. I want to do this for all values in the dataframes, not just for one row. Thanks – JChat Apr 12 '19 at 15:41
  • @JChat Maybe I should rephrase, for a given row in df_merged['ST'] is there a chance that it lies between multiple rows in df_pe['Ton'] + df_pe['Toff']? – Cr1064 Apr 12 '19 at 15:44
  • @Cr1064 you can think of df_merged as a large dataframe. It is possible that multiple rows of df_merged lie in the time intervals specified by Ton and Toff of df_pe but its also a possibility that none of the rows are in the interval/only a single row lies in the interval. But in that case I need to put NF and 0 as mentioned in the question.. thanks – JChat Apr 12 '19 at 17:46

1 Answers1

0

If I have understood the question correctly hopefully this will get you started.

for i,val in df_merged['ST'].items():
    bool_idx = (df_pe['TOn']<val)&(val<df_pe['Toff'])
    if df_pe[bool_idx]['EC'].empty:
       df_merged.loc[i,'EC']=0
       df_merged.loc[i,'ED']="NF"
    else:
       value_from_df_pe = df_pe[bool_idx]['EC']
       df_merged.loc[i,'EC']= value_from_df_pe
       value_from_df_uniqueal = df_uniqueal[df_uniqueal['EC']==value_from_df_pe]['EC']
       df_merged.loc[i,'ED']= value_from_df_uniqueal

Please note I have note tested this code on any data.

Cr1064
  • 409
  • 5
  • 15