1

I need to reorgannise my Data again. I have two Dataframes:df1 and df2 and i want df3

    df1      df2         df3
   1 2 3      1 2 3     a 1 b 4 c 7 
 1 a b c    1 1 2 3     d 1 e 4 f 7 
 2 d e f    2 4 5 6     g 1 h 4 i 7
 3 g h i    3 7 8 9     a 2 b 5 c 8
                        d 2 e 5 f 8
                        g 2 h 5 i 8
                        a 3 b 6 c 9
                        d 3 e 6 f 9
                        g 3 h 6 i 9 

So the Logic of df3 should be like matrix multiplikation. I did try to modify and Answer given to my yesterdays Question which was:

    a,b = df.shape
m = int(a/2)
x = pd.DataFrame(df.iloc[:m,np.r_[m:b]].to_numpy()).T
y = pd.DataFrame(df.iloc[m:,:m].to_numpy())

out = (pd.concat((y,x),axis=1).sort_index(axis=1)
         .set_axis(df.columns,axis=1,inplace=False))
                             

I did implement this code sucessfully, but the modification by myself does not get me the wanted result. Any Recomendation is welcome and appreciated. Please Consider the following Information:

  1. 3x3 is exemplary , so the given Dataframe Shapes may Vary by size and Shape
  2. Both given Dataframes will always have same Shape and Size
  3. Yesterdays Question -> Goodanswer -> Wrong Question my Bad Reorganize elements of Pandas Dataframe by row and column to new dataframe
  4. I know theres a Way to get automatically a result matrix from matrix multiplication in pandas. i already have this implemented and used in my Script.
  5. Using only Numbers in df1 and only letters in df2 is to make it easier to understand what try to do. in words the needed Datastructure could be described as:

I Hope this big Post, could Clarify my Problem. As i am new here i also appreciate critics on the way i ask for Help and improve my Communication on Stack-Overflow.

A small UPDATE: I found this Pandas concatenate alternating columns Thread which is quite close to what i want to do. As I have to work in a couple of minutes i cant give it a try. But for alle people, who want to contribute to my Question, this might make it easier and give some Inspriation.

I will work on this tomorrow and keep you guys updated.

kind Regards and thanks in Advance,

Hans Peter

Hans Peter
  • 99
  • 8

1 Answers1

1

Managed to get this, using itertools:

prod = itertools.product(df2.T.values, df1.values)
pd.DataFrame.from_records(itertools.chain.from_iterable(zip(lets, nums)) for nums, lets in prod)
# output
   0  1  2  3  4  5
0  a  1  b  4  c  7
1  d  1  e  4  f  7
2  g  1  h  4  i  7
3  a  2  b  5  c  8
4  d  2  e  5  f  8
5  g  2  h  5  i  8
6  a  3  b  6  c  9
7  d  3  e  6  f  9
8  g  3  h  6  i  9

Explanation: I take the .values of df1 and the transpose of df2 and use itertools.product(). (I've used tolist() here so that it's clearer; but it actually results in np arrays.) Which produces this:

list(itertools.product(df2.T.values.tolist(), df1.values.tolist()))
[([1, 4, 7], ['a', 'b', 'c']),
 ([1, 4, 7], ['d', 'e', 'f']),
 ([1, 4, 7], ['g', 'h', 'i']),
 ([2, 5, 8], ['a', 'b', 'c']),
 ([2, 5, 8], ['d', 'e', 'f']),
 ([2, 5, 8], ['g', 'h', 'i']),
 ([3, 6, 9], ['a', 'b', 'c']),
 ([3, 6, 9], ['d', 'e', 'f']),
 ([3, 6, 9], ['g', 'h', 'i'])]

I've taken the numbers (df2) first and letters (df1) second so that the order of the product matches what you want. Otherwise the first column would be 3 a's, 3 d's and 3 g's - but you actually was 3 1's, 3 2's and 3 3's.

zip'ing that would give me a list of tuples for each letter and number.

for nums, lets in prod:
    print(list(zip(lets, nums)))  # letters before numbers
# output:
[('a', 1), ('b', 4), ('c', 7)]
[('d', 1), ('e', 4), ('f', 7)]
[('g', 1), ('h', 4), ('i', 7)]
[('a', 2), ('b', 5), ('c', 8)]
...

Then *zip expands out each tuple and itertools.chain.from_iterable() puts them together in a list. Could also have done itertools.chain(*zip(lets, nums)) for ....

Next, pd.DataFrame.from_records() creates a dataframe row with each list; and generator expression creates an iterable of lists which form the df.

aneroid
  • 12,983
  • 3
  • 36
  • 66
  • 1
    Thanks for your Code and the Explanation. It's nice not only copy and paste, but also to understand and improve my knowledge by our Efford. I will edit my Post because both Dataframes might containg letters and numbers but i thaught using letters and numbers might easen up the understanding of what iam trying to do. So again ( i am still improving on how to ask a good question ) i did a small mistake here. – Hans Peter May 01 '21 at 07:51
  • 1
    Yeah, the letters and numbers split is okay to illustrate what you're trying. In my solution, replace that with the variables `left` and `right` or `df1_terms` and `df2_terms` so that it's not related to letters or numbers - just which thing comes on the left or the right pairs in the concatenation. Your question was fine, well asked, detailed and included the code you tried. Only change I would suggest is to put `df1` and `df2` below each other instead of side by side, so that it can be copied and re-created by those attempting to solve. Or add the code to create them. – aneroid May 01 '21 at 11:03
  • Thanks for the additional Explanation and some more insight into the dynamice behind your Solution. Toay I am going to implement your Soultion into my Code and your Answer should be accepted by 22:00 CET. Later on i will edit my Question as you suggested, thanks for the hint. – Hans Peter May 02 '21 at 09:00