
I searched Google for a solution for hours; I sincerely apologize if it's a simple one-liner and I missed it. I basically want to group together identical values that appear in different columns on every row here:

Sample data as per Maurits' suggestion

       event_1 event_2 event_3 event_4 event_5 event_6 event_7 event_8 event_9 event_10
seq_1      200     211     114     117     118     146                                 
seq_2      200     211     114     117     118     146                                 
seq_3      200     243     211     101     114     117     118     146                 
seq_4      200     211     114     117     118     146                                 
seq_5      200     243     211     101     114     117     118     146                      

Expected output:

       Column_211 Column_101
seq_1           1          0
seq_2           1          0
seq_3           1          1
seq_4           1          0
seq_5           1          1
Efe
  • Not clear. Why are there only three rows in your expected output? Posting a screenshot of your data is not useful. Please see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to give a reproducible minimal example, *including sample data and expected output*. Also, please see [here](https://stackoverflow.com/help/how-to-ask) on how to ask a good question on SO. – Maurits Evers Dec 07 '17 at 23:34
  • @MauritsEvers Hi Maurits, that was just to show what I want the new column to look like. – Efe Dec 07 '17 at 23:36
  • Please take a minute or two to go through the links I gave in my first comment; then come back and revise your question; you need to provide sample data (use `dput`), clearly explain what you'd like to do, and show your expected output. – Maurits Evers Dec 07 '17 at 23:38
  • @MauritsEvers Thank you so much for your guidance, re-organized the data, just learned a new function. – Efe Dec 07 '17 at 23:41
  • It's still not clear what you want to do. How do you get the values in `Column_211` and `Column_114`? Why only five rows? Why only values `211` and `114`? Do you look for identical values only in one `event` column or across all columns? What about other values, e.g. `118` is repeated in `event_5`? – Maurits Evers Dec 07 '17 at 23:41
  • Ok, I'm not sure about your question now. Why can't it be just for `211` and `114`? I don't need the other ID numbers. I'm interested in keeping all the rows, checking each column to see whether `211` or `114` exists, and having new columns that record it. Was I able to clarify it this time? – Efe Dec 07 '17 at 23:47
  • So is your expected output a `dataframe` with two columns, `Column_211` containing a vector with all `211` entries *across the entire source `dataframe`*, and `Column_114` containing a vector with all `114` entries *across the entire source `dataframe`*? – Maurits Evers Dec 07 '17 at 23:51
  • Yes, I think I managed to explain myself this time. `211` and `114` exist in every row, so I changed `114` to `101`, which exists in only 2 rows. Would this make more sense now? – Efe Dec 07 '17 at 23:57
  • You say "yes" to my comment but then continue to say something completely different to what I said. Do you want to **count the number of occurrences per row** of the values `211` and `101`? Or do you simply want to record whether the numbers are present (`1`) or not (`0`)? Either way, that's very different from what I said and understood. – Maurits Evers Dec 08 '17 at 00:01
  • No, I do not want to count the number of occurrences; I want each new column to have 0 or 1 depending on whether that number exists in any of the 10 existing columns. – Efe Dec 08 '17 at 00:03
  • Please take a look at my answer below. – Maurits Evers Dec 08 '17 at 00:07

1 Answer


Is this what you're after?

Explanation: `apply(df, 1, ...)` processes `df` row by row, and `values %in% x` flags, for each entry of `values`, whether it occurs anywhere in that row. The resulting logical vector (like `c(TRUE, FALSE)`) is then converted to a numeric vector (like `c(1, 0)`). Because `apply` returns one column per row of `df`, we transpose the result with `t(...)` and then set the column names to match your expected output.

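 # Values to look for (df is defined under "Sample data" below)
 values <- c(211, 101)
 # For each row, flag (1/0) whether each value occurs anywhere in that row
 df.new <- t(apply(df, 1, function(x) as.numeric(values %in% x)))
 colnames(df.new) <- paste0("Column_", values)
 df.new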
 #      Column_211 Column_101
 #seq_1          1          0
 #seq_2          1          0
 #seq_3          1          1
 #seq_4          1          0
 #seq_5          1          1
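
To see what happens inside `apply`, the per-row test can be tried on a single row first. This is only a sketch: the vectors `x` and `y` below are typed in by hand from rows `seq_3` and `seq_1` of the sample data, with `NA` standing in for the empty cells (an assumption about how they are read in).

```r
values <- c(211, 101)

# Row seq_3 of the sample data, typed by hand; NA marks the empty cells (assumed)
x <- c(200, 243, 211, 101, 114, 117, 118, 146, NA, NA)
# Row seq_1, for comparison
y <- c(200, 211, 114, 117, 118, 146, NA, NA, NA, NA)

# %in% returns one logical per element of `values`
in_x <- values %in% x   # TRUE TRUE
in_y <- values %in% y   # TRUE FALSE

# as.numeric() turns the logical flags into 1/0
flags_y <- as.numeric(in_y)   # 1 0
flags_y
```

Note that `%in%` only records presence: a value that occurred twice in a row would still give `1`, which matches the 0/1 output asked for in the comments.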

Sample data

 df <- read.delim(text =
     "  event_1 event_2 event_3 event_4 event_5 event_6 event_7 event_8 event_9 event_10
 seq_1      200     211     114     117     118     146
 seq_2      200     211     114     117     118     146
 seq_3      200     243     211     101     114     117     118     146
 seq_4      200     211     114     117     118     146
 seq_5      200     243     211     101     114     117     118     146",
 header = TRUE, row.names = 1, sep = "")
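
If pasting whitespace-separated text is awkward, the same sample data could also be built directly. This is a sketch under assumptions: the empty cells are taken to be `NA`, and the helper `pad` is a hypothetical name introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper: pad a vector with NA up to length n
pad <- function(x, n = 10) c(x, rep(NA, n - length(x)))

short <- c(200, 211, 114, 117, 118, 146)
long  <- c(200, 243, 211, 101, 114, 117, 118, 146)

# One padded row per sequence, in the order seq_1 ... seq_5
m <- rbind(pad(short), pad(short), pad(long), pad(short), pad(long))
dimnames(m) <- list(paste0("seq_", 1:5), paste0("event_", 1:10))
df <- as.data.frame(m)

# Same flagging step as in the answer
values <- c(211, 101)
df.new <- t(apply(df, 1, function(x) as.numeric(values %in% x)))
colnames(df.new) <- paste0("Column_", values)
df.new
```

Calling `dput(df)` on a data frame built this way prints a representation that can be pasted straight into a question.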
Maurits Evers