Group values in different columns to one column

Question

I've searched Google for a solution for hours, so I sincerely apologize if it's a simple one-line fix and I missed it. Basically, I want to group together identical values that appear in different columns on every row here:

Sample data as per Maurits' suggestion

       event_1 event_2 event_3 event_4 event_5 event_6 event_7 event_8 event_9 event_10
seq_1      200     211     114     117     118     146                                 
seq_2      200     211     114     117     118     146                                 
seq_3      200     243     211     101     114     117     118     146                 
seq_4      200     211     114     117     118     146                                 
seq_5      200     243     211     101     114     117     118     146

Expected output, like this:

      Column_211 Column_101
seq_1          1          0
seq_2          1          0
seq_3          1          1
seq_4          1          0
seq_5          1          1

Not clear. Why are there only three rows in your expected output? Posting a screenshot of your data is not useful. Please see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to give a reproducible minimal example, *including sample data and expected output*. Also, please see [here](https://stackoverflow.com/help/how-to-ask) on how to ask a good question on SO. — Maurits Evers, Dec 07 '17 at 23:34
@MauritsEvers Hi Maurits, just to explain how I want new column to look like. — Efe, Dec 07 '17 at 23:36
Please take a minute or two to go through the links I gave in my first comment, then come back and revise your question. You need to provide sample data (use `dput`), clearly explain what you'd like to do, and show your expected output. — Maurits Evers, Dec 07 '17 at 23:38
@MauritsEvers Thank you so much for your guidance. I've reorganized the data and just learned a new function. — Efe, Dec 07 '17 at 23:41
It's still not clear what you want to do. How do you get the values in `Column_211` and `Column_114`? Why only five rows? Why only values `211` and `114`? Do you look for identical values only in one `event` column or across all columns? What about other values, e.g. `118` is repeated in `event_5`? — Maurits Evers, Dec 07 '17 at 23:41
OK, I'm not sure I understand your question now. Why can't it be just for `211` and `114`? I don't need the other ID numbers. I want to keep all the rows, check every column to see whether `211` or `114` exists, and add new columns for them. Was I able to clarify it this time? — Efe, Dec 07 '17 at 23:47
So is your expected output a `dataframe` with two columns, `Column_211` containing a vector with all `211` entries *across the entire source `dataframe`*, and `Column_114` containing a vector with all `114` entries *across the entire source `dataframe`*? — Maurits Evers, Dec 07 '17 at 23:51
Yes, I think I managed to explain myself this time. `211` and `114` exist in every row, so I changed `114` to `101`, which exists in only 2 rows. Does this make more sense now? — Efe, Dec 07 '17 at 23:57
You say "yes" to my comment but then continue to say something completely different to what I said. Do you want to **count the number of occurrences per row** of the values `211` and `101`? Or do you simply want to record whether the numbers are present (`1`) or not (`0`)? Either way, that's very different from what I said and understood. — Maurits Evers, Dec 08 '17 at 00:01
No, I do not want to count the number of occurrences. I want each new column to contain 0 or 1, depending on whether that number exists in any of the 10 existing columns. — Efe, Dec 08 '17 at 00:03

Maurits Evers · Accepted Answer · 2017-12-08T00:19:07.990

Is this what you're after?

Explanation: We use `apply(df, 1, ...)` to process `df` row by row, then use `%in%` to flag, for each entry of `values`, whether it is present in that row. The resulting logical vector (like `c(TRUE, FALSE)`) is then converted to a numeric vector (like `c(1, 0)`) with `as.numeric`. Because `apply` with `MARGIN = 1` returns each row's result as a column, we transpose the resulting matrix using `t(...)`, and finally give it column names in accordance with your expected output.

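 values <- c(211, 101)
 # For each row x, as.numeric(values %in% x) gives 1 if a value occurs in the row, else 0.
 # apply() returns one column per row, so t() flips the result back to one row per seq_*.
 df.new <- t(apply(df, 1, function(x) as.numeric(values %in% x)))
 colnames(df.new) <- paste0("Column_", values)
 df.new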
 #      Column_211 Column_101
 #seq_1          1          0
 #seq_2          1          0
 #seq_3          1          1
 #seq_4          1          0
 #seq_5          1          1
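To see what each step contributes, here is a minimal sketch that runs the same building blocks on a small hypothetical matrix `m` (three rows, truncated to four event columns; not part of the original answer). It shows that `%in%` yields one flag per target value, and that `apply()` with `MARGIN = 1` stacks each row's result as a column, which is why the answer wraps it in `t()`.

```r
values <- c(211, 101)

# Hypothetical three-row matrix standing in for df; NA marks an unused event slot
m <- rbind(seq_1 = c(200, 211, 114, NA),
           seq_2 = c(200, 211, 114, NA),
           seq_3 = c(200, 243, 211, 101))

# One row: %in% returns one TRUE/FALSE per target value, in the order of `values`
values %in% m["seq_1", ]              # TRUE FALSE
as.numeric(values %in% m["seq_1", ])  # 1 0

# apply() over rows (MARGIN = 1) stores each row's result as a column,
# so `raw` has one row per target value and one column per sequence
raw <- apply(m, 1, function(x) as.numeric(values %in% x))
dim(raw)                              # 2 3

# Transposing restores one row per sequence; then name the columns
flags <- t(raw)
colnames(flags) <- paste0("Column_", values)
flags
```

The row names `seq_1` to `seq_3` survive the round trip: `apply()` uses them as column names of `raw`, and `t()` turns them back into row names.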

Sample data

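 # The extra "seq" header field names the row-name column, so row.names = 1 uses it
 # and the event_* names stay aligned with the values (without it, event_1 would be
 # consumed as the row names). read.delim() defaults to fill = TRUE, so short rows
 # are padded with NA.
 df <- read.delim(text =
     "seq event_1 event_2 event_3 event_4 event_5 event_6 event_7 event_8 event_9 event_10
 seq_1      200     211     114     117     118     146
 seq_2      200     211     114     117     118     146
 seq_3      200     243     211     101     114     117     118     146
 seq_4      200     211     114     117     118     146
 seq_5      200     243     211     101     114     117     118     146",
 header = TRUE, row.names = 1, sep = "")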
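As an alternative sketch (a different, vectorized technique, not the accepted answer's `apply()` approach), you can compare the whole data frame against each target value with `==` and summarize per row with `rowSums()`. This also makes the distinction from the comments explicit: `rowSums(...)` on its own *counts* occurrences per row, while `> 0` reduces that count to the 0/1 presence flag that was asked for. The data frame is re-created directly here with `NA` padding, and the helper name `flag_values` is hypothetical.

```r
# Re-create the sample data; NA fills event slots a sequence does not use
short <- c(200, 211, 114, 117, 118, 146, NA, NA, NA, NA)
long  <- c(200, 243, 211, 101, 114, 117, 118, 146, NA, NA)
df <- as.data.frame(rbind(seq_1 = short, seq_2 = short, seq_3 = long,
                          seq_4 = short, seq_5 = long))
names(df) <- paste0("event_", 1:10)

# Hypothetical helper: one 0/1 column per target value
flag_values <- function(df, values) {
  out <- sapply(values, function(v) {
    # df == v is a logical matrix; na.rm = TRUE skips the NA padding.
    # rowSums() counts matches per row; > 0 turns the count into presence.
    as.numeric(rowSums(df == v, na.rm = TRUE) > 0)
  })
  # sapply() simplifies to a plain vector when df has a single row,
  # so rebuild the matrix shape explicitly before adding dimnames
  matrix(out, nrow = nrow(df),
         dimnames = list(rownames(df), paste0("Column_", values)))
}

res <- flag_values(df, c(211, 101))
res
```

This loops over the (usually few) target values instead of over every row, which can matter once the data has many sequences; for five rows either approach is fine.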

That's it! Thank you very much. I will now play around with it and try to understand the logic. — Efe, Dec 08 '17 at 00:16
No worries @Efe. Glad to help out. I've added some explanations that might help. — Maurits Evers, Dec 08 '17 at 00:19
