1

I'm not sure whether this can easily be done in Excel but ... I have a file:

ID    Disease    Gene1    Gene2    Gene3    Gene4    Gene5
1     T2DM       ABCD     ABCDE    ABFJ     ASKL     ASLL
2     T2DM       ASKFA    BSDAW    KWAS     JKS 
3     LDL        GJS      ZYRLA

I wish to keep the first two columns and transform the columns Gene1 through Gene5 into rows, like this:

ID   Disease    Gene
1    T2DM       ABCD    
1    T2DM       ABCDE
1    T2DM       ABFJ
1    T2DM       ASKL
1    T2DM       ASLL
2    T2DM       ASKFA
2    T2DM       BSDAW
2    T2DM       KWAS
2    T2DM       JKS 
3    LDL        GJS
3    LDL        ZYRLA

As I have 1000+ IDs, is there an easy way to do this?

pnuts
  • 58,317
  • 11
  • 87
  • 139
Turtle
  • 163
  • 1
  • 10

1 Answers1

1

I think easy. Assuming ID is in A1, please insert a new column between Disease and Gene1 and populate it from C2 down to suit with*:

=A2&B2  

Select ColumnsC:H and apply the 'flattening' process detailed here.

Having created the Table, delete the Column Column and in its place insert two new columns with, in B2:

=LEFT(A2)  

and in C2:

=MID(A2,2,5)  

Select and Copy the middle two columns of the Table and Paste Special..., Values over the top. Delete ColumnA and filter the Value column to select and delete rows blank for that column. Revert Table to a Range if desired, with or without removing formatting, and reinstate column labels.

* PS You will need to adjust for the length of strings that make up the IDs - maybe prepend zeros when concatenating to make all ID's of the same length. Maybe:

=TEXT(A2,"0000")&B2  

and then strip these off later with:

=LEFT(A2,4)  

and extract the rest with:

=MID(A2,5,5)
Community
  • 1
  • 1
pnuts
  • 58,317
  • 11
  • 87
  • 139