
I am an R beginner and I'm currently facing a problem I can't conceptualize yet. I have looked at several related posts but have not found a specific answer, except here:
Aggregating rows with same Ids and retaining only unique entries in R

but my problem is a bit different.

Here's the structure of the initial data frame I want to use:

sta_RHP_metho (3528 rows, 4 columns). The variables are:
- "code.sandre", which is the ID I'll use
- "CodeOpera", a unique ID which is related to "code.sandre"
- "Methode.de.peche", a character vector
- "year"

That data frame has as many rows as unique "CodeOpera" values (3528). There are several "CodeOpera" per ID ("code.sandre"), and there are 180 distinct "code.sandre" values.

What I want to get is a data frame with a unique row per "code.sandre" and the "Methode.de.peche" character value for each year.

I almost got that by running the following code:

# melt()/dcast() as provided by reshape2 (assumed)
library(reshape2)
x2 <- melt(sta_RHP_metho, c("code.sandre", "CodeOpera", "year"), "Methode.de.peche")
x3 <- as.data.frame(dcast(x2, code.sandre + CodeOpera ~ year))

But I still get as many rows as unique "CodeOpera" values (3528), and, as I said, I don't know how to get a unique row per ID.
Note that there can be several "Methode.de.peche" values in the same year, so I may need to concatenate the returned values in some cases.
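A minimal base R sketch of that reshape, assuming only the four columns described above: leave CodeOpera out of the grouping (it is what keeps one row per operation) and let tapply() collapse the distinct methods for each code.sandre/year pair. The toy data and the helper name collapse_methods are hypothetical, and this is one possible approach rather than a fix to the melt/dcast code itself.

```r
# Hypothetical toy data with the same four columns as sta_RHP_metho
sta_RHP_metho <- data.frame(
  code.sandre      = c("A", "A", "A", "A", "B", "B"),
  CodeOpera        = c(1, 2, 3, 4, 5, 6),
  Methode.de.peche = c("a", "b", "a", "a", "b", "b"),
  year             = c(2010, 2010, 2011, 2011, 2010, 2012),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct methods seen in one cell with ","
collapse_methods <- function(m) paste(sort(unique(m)), collapse = ",")

# Rows = code.sandre, columns = year; combinations never sampled stay NA
wide <- tapply(sta_RHP_metho$Methode.de.peche,
               list(sta_RHP_metho$code.sandre, sta_RHP_metho$year),
               collapse_methods)

# Turn the character matrix into a data frame with code.sandre as a column
result <- data.frame(code.sandre = rownames(wide), wide,
                     check.names = FALSE, stringsAsFactors = FALSE)
rownames(result) <- NULL
result
```

tapply() already returns a character matrix (one row per code.sandre, one column per year), so the last step is only needed if a data frame is preferred.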

I hope my explanation is clear.

Comments will be greatly appreciated ;)

Cheers.

Tristan


Thank you @ANG. Here's a minimal reproducible example:

1/ The data frame I got after my melt/dcast operation:

code_sandre <- c("A","A","A","B","B","C","D")
year1 <- c("a",NA,"a","b",NA,"c","b")
year2 <- c("a","b",NA,"b","b","c","b")
year3 <- c("a","b",NA,NA,NA,"c","b")
x <- data.frame(v1 = code_sandre, v2 = year1, v3 = year2, v4 = year3)

2/ The data frame I want to get:

code_sandre<-c("A","B","C","D")
year1<-c("a","b","c","b")
year2<-c("a,b","b","c","b")
year3<-c("a,b",NA,"c","b")
result<-data.frame(code_sandre,year1,year2,year3)
ChrisF
    Hello Tristan and welcome to StackOverflow (SO). Could you provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – nghauran Oct 21 '17 at 18:46

1 Answer


I don't know if I understood you correctly, but it looks like you just want unique code.sandre values regardless of CodeOpera. Do you get the expected result with this (check the result before using melt())?

library(data.table)
library(dplyr)
setDT(sta_RHP_metho)
# delete column "CodeOpera" (data.table modifies the table by reference)
sta_RHP_metho[, CodeOpera := NULL]
# keep only unique rows
sta_RHP_metho2 <- distinct(sta_RHP_metho)
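For comparison, the same de-duplication can be sketched in base R without data.table or dplyr. The toy rows below are hypothetical, but they illustrate the limitation: removing CodeOpera only merges operations that share a site, a year and a method, so a site can still have several rows in the same year.

```r
# Hypothetical toy data with the same columns as sta_RHP_metho:
# site "A" has three operations in 2010, two of them with method "a"
sta_RHP_metho <- data.frame(
  code.sandre      = c("A", "A", "A", "B"),
  CodeOpera        = 1:4,
  Methode.de.peche = c("a", "a", "b", "b"),
  year             = c(2010, 2010, 2010, 2010),
  stringsAsFactors = FALSE
)

# Drop the per-operation ID, then keep one row per distinct
# (code.sandre, Methode.de.peche, year) combination
no_opera <- sta_RHP_metho[, setdiff(names(sta_RHP_metho), "CodeOpera")]
sta_RHP_metho2 <- unique(no_opera)
sta_RHP_metho2
```

Site "A" still appears twice for 2010 (methods "a" and "b"), which is why a separate collapsing step per year is still needed afterwards.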

OR

Here is what I was able to achieve with your example data:

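code_sandre<-c("A","A","A","B","B","C","D")
year1<-c("a",NA,"a","b",NA,"c","b")
year2<-c("a","b",NA,"b","b","c","b")
year3<-c("a","b",NA,NA,NA,"c","b")
x<-data.frame(code_sandre = code_sandre,
              year1 = year1,
              year2 = year2,
              year3 = year3,
              stringsAsFactors = FALSE)
library(dplyr)
# one row per code_sandre; in each year column, paste together
# the distinct non-missing values (toString() separates them with ", ")
x2 <- x %>%
        group_by(code_sandre) %>%
        summarise_at(.vars = vars(year1, year2, year3),
                     .funs = function(x) toString(unique(x[!is.na(x)])))
x2
x3 <- as.data.frame(x2)
# groups with no value in a given year give "", so turn those back into NA
x3[x3 == ""] <- NA
x3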

This should be very close to your expected output; the only visible difference is that toString() separates multiple values with ", " instead of ",".
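If you'd rather not depend on dplyr, roughly the same collapse can be sketched in base R with aggregate(). This is an alternative, not the code above: the helper collapse_or_na is a hypothetical name, it returns NA directly instead of "", and it uses "," as the separator to match the expected output in the question.

```r
# Example data from the question (columns named as in the answer above)
x <- data.frame(
  code_sandre = c("A", "A", "A", "B", "B", "C", "D"),
  year1 = c("a", NA, "a", "b", NA, "c", "b"),
  year2 = c("a", "b", NA, "b", "b", "c", "b"),
  year3 = c("a", "b", NA, NA, NA, "c", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the distinct non-missing values with ",",
# or return NA when a group has no value at all for that year
collapse_or_na <- function(v) {
  v <- unique(v[!is.na(v)])
  if (length(v) == 0) NA_character_ else paste(v, collapse = ",")
}

# aggregate() applies the helper to each year column within each code_sandre
res <- aggregate(x[c("year1", "year2", "year3")],
                 by = list(code_sandre = x$code_sandre),
                 FUN = collapse_or_na)
res
```

Unlike the formula interface of aggregate(), the data frame method used here does not drop rows containing NA, so every year column is summarised from all of its rows.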

nghauran
  • Try converting it first with `sta_RHP_metho2 <- as.data.frame(sta_RHP_metho2)` and then use `melt()`. – nghauran Oct 21 '17 at 21:44
  • That is almost what I want, but I can't do the melt from your `sta_RHP_metho2 <- distinct(sta_RHP_metho)`, and that is not the data frame structure I want in the output. Effectively, I want a unique row per `"code.sandre"` and a unique column per `"year"` observed in my initial data frame. For each intersection of a row and a column there are 3 possibilities: no value (no sampling that year), 1 value (a single `"Methode.de.peche"`, even if there are several `"CodeOpera"` that year), or several values (several `"Methode.de.peche"`). Maybe a data frame is not appropriate in my case? A matrix? – T.Bourgeois Oct 21 '17 at 21:49
  • Sorry for my messy comments; I will improve my syntax. – T.Bourgeois Oct 21 '17 at 21:49