How to form pandas dataframe from columns with different length and condition?

Question

I have two columns from different dataframes with different lengths (60 and 14). I want to compare each item of the 60-row column with every item of the 14-row column, then put the results in another column next to the two columns being compared. I already have the comparison results in a list of lists: [outer_list for col_60 [inner_list for result(1,1)], [inner_list for result(1,2)], ..., [inner_list for result(60,14)]]. My problem is how to form a dataframe in the format below (col_60 = 60 rows, col_14 = 14*60 rows, col_result = 14*60 rows). NOTE: the items in the columns are lists.

col_60     col_14        col_result
              1          result_of(1,1)
              2          result_of(1,2)
              3             ..
  1           4
              ..
              ..            ..
              ..            ..
              13        result_of(1,13)
              14        result_of(1,14)
____________________________________________
              1          result_of(2,1)
              2          result_of(2,2)
              3             ..
  2           4
              ..
              ..            ..
              ..            ..
              13        result_of(2,13)
              14        result_of(2,14)
____________________________________________
              1          result_of(3,1)
              2          result_of(3,2)
              3             ..
  3           4
              ..
              ..            ..
              ..            ..
              13        result_of(3,13)
              14        result_of(3,14)
____________________________________________
              ..
              ..
              ..

I tried the accepted answer in this question, but it stacks the result column without the related columns, and the approach in this question results in NaN.

Enrique Ortiz Casillas · Answer 1 · 2019-12-28T19:41:59.403

You can solve this using hierarchical indexing. Here is an example of how it works for the first two items of col_60, each paired with all 14 items of col_14.

import pandas as pd

# Placeholder results for the first two items of col_60 (2 * 14 = 28 entries).
# Put all the results in just one flat list instead of a list of lists.
results = [f"result({i},{j})" for i in (1, 2) for j in range(1, 15)]

data = pd.Series(results, index=[
    ['1'] * 14 + ['2'] * 14,   # outer level: 14 ones, then 14 twos
    list(range(1, 15)) * 2,    # inner level: 1..14, repeated twice
])

print(data)



1  1      result(1,1)
   2      result(1,2)
   3      result(1,3)
   4      result(1,4)
   5      result(1,5)
   6      result(1,6)
   7      result(1,7)
   8      result(1,8)
   9      result(1,9)
   10    result(1,10)
   11    result(1,11)
   12    result(1,12)
   13    result(1,13)
   14    result(1,14)
2  1      result(2,1)
   2      result(2,2)
   3      result(2,3)
   4      result(2,4)
   5      result(2,5)
   6      result(2,6)
   7      result(2,7)
   8      result(2,8)
   9      result(2,9)
   10    result(2,10)
   11    result(2,11)
   12    result(2,12)
   13    result(2,13)
   14    result(2,14)
dtype: object

If you put all your 60*14 results in one flat list, here is the code to build the two index levels:

# Outer level: each label '1'..'60' repeated 14 times.
# range(1, 61) is needed to include 60; range(1, 60) would stop at 59.
first_index_raw = [[str(i)] * 14 for i in range(1, 61)]
first_index_final = [label for group in first_index_raw for label in group]
# A long list that looks like this:
# ['1', '1', ..., '1', '2', '2', ..., '2', ..., '60', '60', ..., '60']

# Inner level: 1..14, repeated 60 times.
second_index = list(range(1, 15)) * 60
# [1, 2, ..., 14, 1, 2, ..., 14, ..., 1, 2, ..., 14]

data = pd.Series(results, index=[first_index_final, second_index])
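As a possible shortcut, the two index levels can also be generated with `pd.MultiIndex.from_product` instead of building the repeated lists by hand. This is only a sketch: `nested_results` is a hypothetical stand-in for the real list of lists, and it assumes the outer list follows col_60 order and each inner list follows col_14 order.

```python
import pandas as pd

# Hypothetical stand-in for the real comparison output: 60 inner lists,
# each holding the 14 results for one item of col_60.
nested_results = [[f"result({i},{j})" for j in range(1, 15)]
                  for i in range(1, 61)]

# Flatten to one list of 60 * 14 = 840 entries, keeping the original order.
flat_results = [r for inner in nested_results for r in inner]

# from_product generates every (col_60, col_14) pair, with the second
# level cycling fastest, which matches the order of flat_results.
index = pd.MultiIndex.from_product(
    [range(1, 61), range(1, 15)], names=["col_60", "col_14"]
)

series = pd.Series(flat_results, index=index, name="col_result")
print(series.loc[(2, 14)])
```

The level names `col_60` and `col_14` are taken from the question; the integer labels 1..60 and 1..14 are placeholders for the actual column values.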

It's true that you get a series and not a data frame, but I hope it helps!
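If a data frame with ordinary columns is needed, one option is to call `reset_index` on a series like the one above, which moves the index levels into columns. The sketch below uses a small made-up example (3 items by 2 items) rather than the real 60 by 14 data, and the column names are assumptions based on the question.

```python
import pandas as pd

# Small hypothetical example: 3 items in col_60, 2 items in col_14.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], [1, 2]], names=["col_60", "col_14"]
)
# Iterating a MultiIndex yields (col_60, col_14) tuples.
results = [f"result({i},{j})" for i, j in index]
data = pd.Series(results, index=index, name="col_result")

# reset_index turns both index levels into regular columns, so the result
# is a DataFrame with columns col_60, col_14 and col_result.
df = data.reset_index()
print(df)
```

Each col_60 value is then repeated on every row of its group, so the columns can be filtered or merged like any other data frame columns.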

Thanks for the reply, but I need to form a df and get the values without referring to it directly as in the mentioned questions, because I am in fact working on a huge dataset and have to work on the df columns later. - nucsit026, Dec 29 '19 at 07:07
