0

So I have two different dataframes df1(len 16) and df2(len 55) and I want to bring a value from df2 but only if certain conditions are met, 3 to be exactly. I´ve tried np.where() and pd.merge() but I can´t get the value that I want. My dataframes are:

df1 = pd.DataFrame({'Data1':['X','X','X','X','Y','Y','Y','Y','Z','Z','Z','Z'],'Data2':[0,10,20,30,40,50,60,70,80,90,100,110]})
df2 = pd.DataFrame({'Data1':['X','X','X','X','Y','Y','Y','Y','Z','Z','Z','Z',...],'Data2':[0,11,21,31,41,51,61,71,81,91,101,111,...],
      'Data3':[10,20,30,40,50,60,70,80,90,100,110,...],'Data4':['A','B','C','D','A','B','C','D','A','B','C','D',...])

I want to make a multiple comparison, df1[Data1] == df2[Data1] & (df1[Data2] >= df2[Data2] & df1[Data2] <= df2[Data3]). If row value of Data1 in df1 equals row value of Data1 in df2 and row value of Data 2 in df1 is between row values of Data2 and Data3 in df2 bring me the row value of Data4 in df2. I´ve tried this:

df1[Data3] = np.where((df1[Data1] == df2[Data1]) & (df1[Data2] >= df2[Data2] & df1[Data2] <= df2[Data3]),df2[Data4],"NA")

But I get ValueError: Can only compare identically-labeled Series objects, because of the difference in length of my series.

I want to avoid using foor loops with pandas, but I don´t see other way of doing it, any suggestions?

Enrique93
  • 11
  • 2
  • The data you shared is invalid structure for a dataframe. An important step to a successful question is posting valid, copy/pasteable data. In `df2` the columns have unequal lengths. Your strings need quotes around them, you need to remove the `...` and you need to prepend `pd.DataFrame()` to the beginning. – David Erickson Jul 01 '21 at 17:22
  • how do you know which rows to compare if the dataframes have different lengths? If the the dfs have different lengths, then you have to have a unique key of one or multiple rows to merge on. – David Erickson Jul 01 '21 at 17:27
  • Hello @David Erickson, thanks for your observations, this is my first question here so I don´t know how it works well yet. The lengths of my columns in df2 are equal, I posted at the beggining of my question that the lenght of my df2 was of 55, and I didn´t put quotes because I was just making reference to data, not to strings or numbers. – Enrique93 Jul 01 '21 at 17:33
  • @DavidErickson that´s my question, how can I compare with multiple conditions and bring the value in two different frames with different lengths – Enrique93 Jul 01 '21 at 17:40
  • your second dataframe is not valid. The Data4 column has more values than the data3 column. Please test your test data and run the code to make sure the data is valid. – David Erickson Jul 01 '21 at 17:45
  • fundamentally there is no base to compare to when you have dataframes of different lengths. You should post valid INPUT data, and also post what the example OUTPUT should look like. Please see: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples on how to ask a good pandas quesiton. – David Erickson Jul 01 '21 at 17:47
  • @DavidErickson I have changed values in Data4 column. I can´t post my real values or output because of compliance is non public information. I put also that I get a Value Error: Can only compare identically-labeled Series objects, when I try the line of the code that i posted before – Enrique93 Jul 01 '21 at 17:55
  • the columns still have unequal lengths. The best thing to do is test this first prior to editing you question. For example, this is a valid dataframe: `df2 = pd.DataFrame({'Data1':['X','X','X','X','Y','Y','Y','Y','Z','Z','Z','Z'],'Data2':[0,11,21,31,41,51,61,71,81,91,101,111], 'Data3':[10,20,30,40,50,60,70,80,90,100,110,120],'Data4':['A','B','C','D','A','B','C','D','A','B','C','D',]})`. You should verify that the code works in your jupyter notebook first. – David Erickson Jul 01 '21 at 18:08

0 Answers0