-1

given an empty dataframe with assigned column names :

colnames = ('ACCT', 'CTAT', 'AAAT', 'ATCG')*3
df = pd.DataFrame(columns=colnames)

I want to loop over dataframes which have the below structure: (giving 2 for demostration)

sample_df = pd.DataFrame()
sample_df['tetran'] = colnames
sample_df['Frequency'] = (423, 512, 25, 123,632,124,614,73,14,75,311,155)
conids = ("cl1_42", "cl1_41", "cl2_31")
rep_conids = [val for val in conids for _ in range(4)]
sample_df['contig_id'] = rep_conids


sample_df_2 = pd.DataFrame()
sample_df_2['tetran'] = colnames
sample_df_2['Frequency'] = (724, 132, 4, 102,423,402,616,734,153,751,31,55)
conids_2 = ("se1_51", "se1_21", "se2_53")
rep_conids_2 = [val for val in conids_2 for _ in range(4)]
sample_df_2['contig_id'] = rep_conids_2

The objective is:

  1. Add each 'Frequency' value from the 'sample_df's to the corresponding 'tetraN' value of the 'df' and add a new column to be the sample_df['contig_id']

There are multiple 'sample_df' dataframes , so this is the idea of the desired output:

index ACCT CTAT AAAT ATCG
cl1_42 423 512 25 123
cl1_41 632 124 614 73
cl2_31 14 75 311 155
se1_51 724 132 4 102
se1_21 423 402 616 734
se2_53 153 751 31 55

I know how to do this in R but I need this to be done in python so I cannot add here what I tried due it is in R.

Thanks for your time :)

Valentin
  • 399
  • 2
  • 10
  • You could [try using loc to get the desired columns](https://stackoverflow.com/questions/44095387/insert-data-from-one-dataframe-into-another-by-index) and check out [How to add a new column to an existing dataframe?](https://stackoverflow.com/q/12555323/16653700) – Alias Cartellano Jun 15 '23 at 21:33

1 Answers1

1

First, concat your dataframes then pivot them:

out = (pd.concat([sample_df, sample_df_2])
         .pivot(index='contig_id', columns='tetran', values='Frequency'))
print(out)

# Output
tetran     AAAT  ACCT  ATCG  CTAT
contig_id                        
cl1_41      614   632    73   124
cl1_42       25   423   123   512
cl2_31      311    14   155    75
se1_21      616   423   734   402
se1_51        4   724   102   132
se2_53       31   153    55   751

If you don't want the data to be sorted, use pivot_table:

out = (pd.concat([sample_df, sample_df_2])
         .pivot_table(index='contig_id', columns='tetran', values='Frequency', sort=False))
print(out)

# Output
tetran     ACCT  CTAT  AAAT  ATCG
contig_id                        
cl1_42      423   512    25   123
cl1_41      632   124   614    73
cl2_31       14    75   311   155
se1_51      724   132     4   102
se1_21      423   402   616   734
se2_53      153   751    31    55

Useful link: How can I pivot a dataframe?

Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Thanks so much, yes it worked, thought it to be a more difficult thing, I don't get why I got downvotes for the question.. anyways , thanks :) – Valentin Jun 16 '23 at 17:08