I have a data frame whose head(mydataframe) output is shown below. The name column holds miRNA names and the target column holds ENST ids; the same ENST id appears on several rows, once for each miRNA that targets it.

I want to reshape this so that each column is a miRNA, each row is an ENST id (occurring only once), and each cell is 1 or 0 according to whether that miRNA is present for that ENST.

               name          target  chrom     start       end strand
2928 hsa-miR-576-5p ENST00000324219      2 101875410 101875431      +
2929 hsa-miR-483-5p ENST00000324219      2 101876861 101876882      +
3047  hsa-miR-302c* ENST00000264258      2 100989915 100989939      +
3048 hsa-miR-767-3p ENST00000264258      2 100990020 100990039      +
3049   hsa-miR-216a ENST00000264258      2 100989887 100989906      +
3050 hsa-miR-409-3p ENST00000264258      2 100990172 100990194      +
Alperen Taciroglu
  • I don't understand, you want to extract the miR according to what in target column? You want to enter a specific target and be given the miR name? – Matias Andina Jul 28 '15 at 12:53
  • Also, dput your data for a reproducible example. I don't think that is going to be necessary this time, but for next time. – Matias Andina Jul 28 '15 at 12:55
  • Thanks for the response. The name column holds miRNA names, the target column holds ENST ids. I want to tweak this dataframe so that each column is a miRNA, each row is an ENST id, and the cells point out occurrences of each miRNA for each ENST id. ENST ids are duplicated in the original dataframe, as shown. I want to remove all duplicates of ENSTs, because one row per specific ENST should suffice to show how many different miRNAs correspond to it. – Alperen Taciroglu Jul 28 '15 at 13:05
  • Ok, it wasn't clear. If you put your miR names as columns and delete your duplicate ENSTs, you'll have empty values that are going to be more difficult to deal with. What exactly is your purpose? I don't mean to be rude, but please provide an expected output. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Matias Andina Jul 28 '15 at 13:18
  • I want to obtain a dataframe that shows which miRNAs are binding to which ENSTs. For example: miRNA number 100 is binding to ENST number 15, miRNA number 99 is binding to ENST number 23. So my expected output is a dataframe which holds miRNA names in columns and ENST names in rows. Obviously, each row name and column name is unique. The cells of the dataframe are either 1s or 0s: 1 if the miRNA is binding to the ENST, 0 if it is not. Hope this is clear, let me know if not. Thanks – Alperen Taciroglu Jul 28 '15 at 13:29

1 Answer

You can just use table(), which cross-tabulates the two columns. Assuming your data frame is called data, try:

table(data[, c("target", "name")])

Output:

                hsa-miR-216a hsa-miR-302c* hsa-miR-409-3p hsa-miR-483-5p hsa-miR-576-5p hsa-miR-767-3p
ENST00000264258            1             1              1              0              0              1
ENST00000324219            0             0              0              1              1              0

You can store it in a dataframe by doing:

res <- as.data.frame.matrix(table(data[, c("target", "name")]))
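
For a self-contained check, here is a minimal sketch that rebuilds the six rows from the question (only the name and target columns) and runs the same steps. One thing to keep in mind: table() returns counts, so if a miRNA/ENST pair could appear more than once in the full data (an assumption, since it does not happen in the sample), some cells would exceed 1. The last line caps every cell at 1 to get a strict presence/absence matrix.

```r
# Sample rows copied from the question; `data` is the name assumed above.
data <- data.frame(
  name   = c("hsa-miR-576-5p", "hsa-miR-483-5p", "hsa-miR-302c*",
             "hsa-miR-767-3p", "hsa-miR-216a", "hsa-miR-409-3p"),
  target = c("ENST00000324219", "ENST00000324219", "ENST00000264258",
             "ENST00000264258", "ENST00000264258", "ENST00000264258"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per ENST id, one column per miRNA, cells are counts.
counts <- table(data[, c("target", "name")])

# Convert to a plain data frame; the miRNA names stay as column names.
res <- as.data.frame.matrix(counts)

# Assumption: repeated miRNA/ENST pairs may exist in the full data.
# Cap each cell at 1 so the result is a 0/1 presence/absence matrix.
res[] <- lapply(res, function(x) as.integer(x > 0))

print(res)
```

Each ENST id now appears exactly once as a row name, and a single row can be looked up with res["ENST00000264258", ].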
NicE