
I have two columns from different dataframes with different lengths (60 and 14). I want to compare each item of the 60-item column with all items of the 14-item column, then put each result in another column next to the two items that were compared. I have the comparison results in a list of lists: [outer_list for col_60 [inner_list for result(1,1)], [inner_list for result(1,2)], ..., [inner_list for result(60,14)]]. My problem is how to form a DataFrame in this format (col_60 = 60 rows, col_14 = 14*60 rows, col_result = 14*60 rows). NOTE: the columns' items are lists.

col_60   col_14   col_result
         1        result_of(1,1)
         2        result_of(1,2)
         3        ..
  1      4
         ..
         ..       ..
         ..       ..
         13       result_of(1,13)
         14       result_of(1,14)
____________________________________________
         1        result_of(2,1)
         2        result_of(2,2)
         3        ..
  2      4
         ..
         ..       ..
         ..       ..
         13       result_of(2,13)
         14       result_of(2,14)
____________________________________________
         1        result_of(3,1)
         2        result_of(3,2)
         3        ..
  3      4
         ..
         ..       ..
         ..       ..
         13       result_of(3,13)
         14       result_of(3,14)
____________________________________________
         ..
         ..
         ..

I tried the accepted answer to this question, but it stacks the result column without the related columns, and the approach in this question results in NaN.
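To make the nested-list layout concrete, here is a small sketch with toy sizes (3 items instead of 60, 2 instead of 14). The names `col_60`, `col_14` and `compare` are hypothetical stand-ins, since the real comparison function is not shown. It assumes `results[i][j]` holds the comparison of `col_60[i]` with `col_14[j]`, and that each result is itself a list.

```python
# Hypothetical toy inputs with list-valued items, as in the question.
col_60 = [[1, 2], [3, 4], [5, 6]]  # stands in for the 60-item column
col_14 = [[1, 2], [3, 0]]          # stands in for the 14-item column


def compare(a, b):
    """Hypothetical comparison: element-wise equality of two lists."""
    return [x == y for x, y in zip(a, b)]


# Outer list: one entry per col_60 item; inner list: one result per col_14 item.
results = [[compare(a, b) for b in col_14] for a in col_60]

print(len(results), len(results[0]))  # 3 2
print(results[1][1])                  # [True, False]
```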

nucsit026

You can solve this using hierarchical indexing. Here is an example of how it works for the first two groups of 14 combinations.

import pandas as pd

# Put all the results in one flat list instead of a list of lists.
# Placeholder strings stand in for the real comparison results here.
results = [f"result({i},{j})" for i in range(1, 3) for j in range(1, 15)]

# Two index levels: the col_60 item ('1' or '2', each repeated 14 times)
# and the col_14 item (1..14, repeated once per col_60 item).
data = pd.Series(results, index=[['1'] * 14 + ['2'] * 14,
                                 list(range(1, 15)) * 2])

print(data)



1  1      result(1,1)
   2      result(1,2)
   3      result(1,3)
   4      result(1,4)
   5      result(1,5)
   6      result(1,6)
   7      result(1,7)
   8      result(1,8)
   9      result(1,9)
   10    result(1,10)
   11    result(1,11)
   12    result(1,12)
   13    result(1,13)
   14    result(1,14)
2  1      result(2,1)
   2      result(2,2)
   3      result(2,3)
   4      result(2,4)
   5      result(2,5)
   6      result(2,6)
   7      result(2,7)
   8      result(2,8)
   9      result(2,9)
   10    result(2,10)
   11    result(2,11)
   12    result(2,12)
   13    result(2,13)
   14    result(2,14)
dtype: object
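Since the question asks for a DataFrame, one option (a sketch, not part of the original answer) is to turn the MultiIndex Series into three ordinary columns with `reset_index`. The column names `col_60`, `col_14` and `col_result` are taken from the question's layout, and the placeholder strings stand in for real results.

```python
import pandas as pd

# Rebuild the two-group example Series with a named two-level index.
results = [f"result({i},{j})" for i in range(1, 3) for j in range(1, 15)]
index = pd.MultiIndex.from_product(
    [["1", "2"], range(1, 15)], names=["col_60", "col_14"]
)
data = pd.Series(results, index=index, name="col_result")

# reset_index moves both index levels into ordinary columns,
# giving a DataFrame with col_60, col_14 and col_result.
df = data.reset_index()
print(df.head(3))
```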

If you put all your 60*14 results in one flat list, here is the code to build the matching index levels:

# First level: each col_60 position 1..60 (as a string), repeated 14 times.
# range(1, 61) is needed to include 60.
first_index_raw = [[str(i)] * 14 for i in range(1, 61)]
first_index_final = [x for sub in first_index_raw for x in sub]
# ['1', '1', ..., '1', '2', '2', ..., '2', ..., '60', '60', ..., '60']

# Second level: 1..14, repeated once per col_60 item (60 times)
second_index = list(range(1, 15)) * 60
# [1, 2, ..., 14, 1, 2, ..., 14, ..., 1, 2, ..., 14]

data = pd.Series(results, index=[first_index_final, second_index])
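The two hand-built index lists can also be produced with `pd.MultiIndex.from_product`, which generates every (col_60 position, col_14 position) pair with the first level varying slowest. This sketch also flattens the list of lists with `itertools.chain.from_iterable` instead of building the flat list by hand; `nested_results` is a hypothetical placeholder for the real 60 × 14 comparison results, and the positions are integers here rather than strings.

```python
from itertools import chain

import pandas as pd

# Hypothetical placeholder: nested_results[i][j] compares item i+1 of col_60
# with item j+1 of col_14 (60 outer lists of 14 results each).
nested_results = [[f"result({i},{j})" for j in range(1, 15)] for i in range(1, 61)]

# Flatten row by row, so the order matches the index below.
results = list(chain.from_iterable(nested_results))

# Every (col_60 position, col_14 position) pair, first level varying slowest.
index = pd.MultiIndex.from_product(
    [range(1, 61), range(1, 15)], names=["col_60", "col_14"]
)
data = pd.Series(results, index=index)
```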

It's true that you get a Series rather than a DataFrame, but I hope it helps!

  • Thanks for the reply, but I need to form a DataFrame and get the values without specifying them directly, as in the questions I mentioned, because I am actually working on a huge dataset and have to work on the DataFrame columns later. – nucsit026 Dec 29 '19 at 07:07
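One way to address the comment, sketched under assumptions: build the DataFrame from the source columns themselves instead of hand-typed labels. A cross merge (`how="cross"`, available in pandas 1.2 and later) pairs every row of the first frame with every row of the second while keeping the left frame's order, so the row-major flattened results line up row by row. The toy frames `df_a`/`df_b` and the element-wise comparison are hypothetical, and it assumes `results[i][j]` compares row `i` of the first column with row `j` of the second. List-valued items are fine here because they are only carried along as object columns, never used as join keys or index labels (lists are unhashable).

```python
from itertools import chain

import pandas as pd

# Hypothetical toy frames standing in for the 60-row and 14-row columns.
df_a = pd.DataFrame({"col_60": [[1, 2], [3, 4], [5, 6]]})
df_b = pd.DataFrame({"col_14": [[1, 2], [3, 0]]})

# Hypothetical nested results: results[i][j] compares df_a row i with df_b row j.
results = [
    [[x == y for x, y in zip(a, b)] for b in df_b["col_14"]]
    for a in df_a["col_60"]
]

# Cartesian product: each col_60 row followed by all col_14 rows, in order.
df = df_a.merge(df_b, how="cross")

# Row-major flattening matches the cross-merge order; dtype=object keeps
# each result list as a single cell.
df["col_result"] = pd.Series(
    list(chain.from_iterable(results)), index=df.index, dtype=object
)
print(df)
```

The same table can also be built with a plain loop that collects `(a, b, results[i][j])` tuples and passes them to `pd.DataFrame(rows, columns=["col_60", "col_14", "col_result"])`; the merge version just avoids writing the pairing by hand.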