
I have a data frame of proteins with their localization that looks like this:

Protein_loc <- data.frame(
Pro_ID = c("Palid", "Tars", "Palid", "Eef2", "Actn1", "Tars"),
Loc = c("Actin cyto", "Actin cyto", "Axon", "Aggresome", "Cell junc", "Cell junc"))

I would like to group it by location and concatenate the protein IDs, giving a data frame that looks like this:

Subcell_loc <- data.frame(
Loc = c("Actin cyto", "Axon", "Aggresome", "Cell junc"),
Pro_ID = c("Palid, Tars", "Palid", "Eef2", "Actn1, Tars"))

I can do this in Excel rather easily with the concatenate function, but I can't find a way to do this in R.

Any help would be much appreciated, thanks.

Zheyuan Li
jtov
  • I tried paste, merge, melt and cast and R pivot tables. Getting counts per location is no problem. But getting proteins per location in one cell has been difficult. Thanks. – jtov Jun 02 '16 at 19:53

1 Answer


Welcome to using R. It looks like you just don't know which function you should use. We can use aggregate:

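Subcell_loc <- aggregate(Pro_ID ~ Loc, Protein_loc, paste, collapse = ", ")

Output is

         Loc      Pro_ID
1 Actin cyto Palid, Tars
2  Aggresome        Eef2
3       Axon       Palid
4  Cell junc Actn1, Tars

This applies paste(, collapse = ", ") to the Pro_ID values within each Loc group. Note that the argument is collapse, not sep: sep goes between the separate arguments passed to paste(), while collapse joins the elements of a single vector into one string. With sep = ", " the printed result can look the same, but each cell would still hold a vector of separate strings rather than one concatenated string. You can learn more about aggregate from ?aggregate. Pro_ID ~ Loc is a formula, which here reads as "Pro_ID grouped by Loc". You can learn more from ?formula. Formulas are particularly useful for building statistical models.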
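To check that the grouped result really holds one string per location, here is a minimal, self-contained sketch. It rebuilds the question's data and uses collapse = ", " so that paste() joins all values in a group into a single string. The stringsAsFactors = FALSE argument is an assumption added here so that the columns are plain character vectors on older R versions as well.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE is an assumption, added for older R versions.
Protein_loc <- data.frame(
  Pro_ID = c("Palid", "Tars", "Palid", "Eef2", "Actn1", "Tars"),
  Loc = c("Actin cyto", "Actin cyto", "Axon", "Aggresome", "Cell junc", "Cell junc"),
  stringsAsFactors = FALSE)

# For each Loc group, join the Pro_ID values into one comma-separated string.
Subcell_loc <- aggregate(Pro_ID ~ Loc, data = Protein_loc,
                         FUN = paste, collapse = ", ")

# Inspect the structure: Pro_ID should be a character column, one string per row.
str(Subcell_loc)

# Look up the proteins for one location.
Subcell_loc$Pro_ID[Subcell_loc$Loc == "Cell junc"]  # "Actn1, Tars"
```

The rows come back sorted by Loc, so the order differs from the hand-written Subcell_loc in the question, but the content per location is the same.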

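The function paste() is used for concatenating strings. Try:

paste("abc", "def", sep = ", ")

This concatenates the string "abc" with the string "def", with the separator ", " in between. You can also try sep = " * ". Keep in mind that sep only goes between separate arguments; to join the elements of one vector into a single string, use the collapse argument instead.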
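The difference between sep and collapse is easy to see on a small vector. The sketch below uses a made-up vector named ids (a hypothetical name, not from the question) to compare the two arguments, along with toString(), a base R shortcut that joins a vector with ", ".

```r
# A hypothetical vector holding the proteins of one location.
ids <- c("Palid", "Tars")

# sep goes between separate arguments: two strings in, one string out.
two_args <- paste("abc", "def", sep = ", ")   # "abc, def"

# With a single vector argument, sep has nothing to separate,
# so the result is still a vector of length 2.
with_sep <- paste(ids, sep = ", ")

# collapse joins the elements of the vector into one string.
with_collapse <- paste(ids, collapse = ", ")  # "Palid, Tars"

# toString() is a base R shortcut for the same comma-separated join.
with_toString <- toString(ids)                # "Palid, Tars"

print(two_args)
print(length(with_sep))
print(with_collapse)
print(with_toString)
```

This is why a grouped concatenation needs collapse: each group arrives at paste() as one vector, not as separate arguments.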

Zheyuan Li