I am fairly new to R, and I am working with a large data set. I made an example of my problem below (the data set is tab-delimited). Basically, I want to collapse all rows by their ID number so that all of the attributes for an ID are contained in one cell instead of being spread across many rows.
The actual data set I am working with is genomic in nature, with the "ID" being the "gene name" and the "attribute" being the "pathway" that the gene is associated with. My data set is ~5,000,000 rows long.
I have tried messing around with cbind and rbind, but they only bind columns or rows together and do not group rows by ID, so they do not seem to be what I need.
My data set currently looks something like this:
ID Attributes
1 apple
1 banana
1 orange
1 pineapple
2 apple
2 banana
2 orange
3 apple
3 banana
3 pineapple
And I want it to look like this:
ID Attributes
1 apple,banana,orange,pineapple
2 apple,banana,orange
3 apple,banana,pineapple
If you have another way besides using R, that would work as well. Thank you for your help.
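For reference, here is roughly what a non-R route could look like in Python, using only the standard library. This is a minimal sketch, not a tested solution for the real data: it assumes the file is tab-delimited with a header row, that the grouped attributes for all ~5,000,000 rows fit in memory, and the function names `collapse_by_id` and `write_collapsed` are hypothetical names made up for illustration.

```python
import csv
import io
from collections import defaultdict


def collapse_by_id(lines, delimiter="\t"):
    """Group the second column by the first, keeping first-seen order."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)  # assumes the first line is the header row
    groups = defaultdict(list)  # dicts keep insertion order in Python 3.7+
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        groups[row[0]].append(row[1])
    return header, groups


def write_collapsed(header, groups, out, delimiter="\t"):
    """Write one line per ID with its attributes joined by commas."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for key, attrs in groups.items():
        writer.writerow([key, ",".join(attrs)])


# Small in-memory demo using the example data from this question.
sample = (
    "ID\tAttributes\n"
    "1\tapple\n1\tbanana\n1\torange\n1\tpineapple\n"
    "2\tapple\n2\tbanana\n2\torange\n"
    "3\tapple\n3\tbanana\n3\tpineapple\n"
)
header, groups = collapse_by_id(io.StringIO(sample))
out = io.StringIO()
write_collapsed(header, groups, out)
print(out.getvalue())
```

For a real file, the same two functions could be given file objects opened with `open(path, newline="")` in place of the `io.StringIO` objects; the input and output paths would be whatever the actual files are called.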