
I have a data.frame like data. In the column named value, the same values appear more than once (in more than one row). I would like to match the rows that share a value in order to find their ids. In other words, I would like the result to show that ids "P1", "P3" and "P4" have the same value, which equals 24.7386760, and that ids "P2" and "P6" have the same value, which equals 21.9178082.

I have used the duplicated function to spot the duplicated values and then the filter function to keep the rows with an exact value. I have tried this code:

id <- c("P1", "P2", "P3", "P4", "P5", "P6")
value <- c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082)
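# note: cbind() builds a character matrix, so value is no longer numeric in data;
# data.frame(id, value) would keep it numeric
data <- as.data.frame(cbind(id,value))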

duplicates <- data$value[duplicated(data$value) | duplicated(data$value, fromLast=TRUE)]
View(duplicates)

library(dplyr)
# inside filter(), columns can be referenced by name without data$
cat1 <- filter(data, value == 24.7386760)
cat2 <- filter(data, value == 21.9178082)

This works for a handful of distinct values, but it does not scale to my real data, which has many of them.

Any ideas on this? Thank you
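For reference, a minimal base R sketch that generalizes the per-value filter above to every repeated value, so that no value has to be typed by hand. It assumes a numeric value column (built with data.frame() rather than cbind()); the names dup_values and groups are illustrative, not from the question.

```r
# Example data from the question, built with data.frame() so value stays numeric
data <- data.frame(
  id = c("P1", "P2", "P3", "P4", "P5", "P6"),
  value = c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082),
  stringsAsFactors = FALSE
)

# Each value that occurs more than once, listed once
dup_values <- unique(data$value[duplicated(data$value)])

# For every repeated value, collect the ids of the rows that carry it
groups <- lapply(dup_values, function(v) data$id[data$value == v])
names(groups) <- dup_values
groups
```

The exact == comparison is safe here because every value is taken from the same column; values produced by arithmetic may need rounding before they compare equal.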

Rea Kalampaliki

2 Answers


Are you looking to group like values?

split(data, data$value)

$`20.744186`
  id     value
5 P5 20.744186

$`21.9178082`
  id      value
2 P2 21.9178082
6 P6 21.9178082

$`24.738676`
  id     value
1 P1 24.738676
3 P3 24.738676
4 P4 24.738676
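Because split() also returns the groups whose value occurs only once, one way to keep only the repeated values (a sketch using base R's Filter()) is to drop every group that has a single row:

```r
# Example data from the question, with a numeric value column
data <- data.frame(
  id = c("P1", "P2", "P3", "P4", "P5", "P6"),
  value = c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082),
  stringsAsFactors = FALSE
)

groups <- split(data, data$value)

# Keep only the groups with more than one row, i.e. the repeated values
dup_groups <- Filter(function(d) nrow(d) > 1, groups)
dup_groups

# Just the ids for each repeated value, as a named list
ids_by_value <- lapply(dup_groups, `[[`, "id")
ids_by_value
```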

Or maybe you prefer this output:

aggregate(id ~ value, data, paste)

       value         id
1  20.744186         P5
2 21.9178082     P2, P6
3  24.738676 P1, P3, P4

aggregate, keeping only the values that appear more than once (this uses the duplicates vector from the question):

aggregate(id ~ value, data[data$value %in% duplicates,], paste)

       value         id
1 21.9178082     P2, P6
2  24.738676 P1, P3, P4
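A self-contained sketch of the same idea. It builds the duplicate mask inline instead of relying on the duplicates object, and it passes collapse = ", " through to paste() so that id comes back as a plain character column rather than a list (the variable names keep and res are illustrative):

```r
# Example data from the question, with a numeric value column
data <- data.frame(
  id = c("P1", "P2", "P3", "P4", "P5", "P6"),
  value = c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082),
  stringsAsFactors = FALSE
)

# TRUE for every row whose value occurs more than once (first and later occurrences)
keep <- duplicated(data$value) | duplicated(data$value, fromLast = TRUE)

# One row per repeated value; collapse joins the ids into a single string
res <- aggregate(id ~ value, data[keep, ], paste, collapse = ", ")
res
str(res)
```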
Daniel O
  • Thank you for your interest. I don't want to exclude the duplicates; I want to find the ids that have the same value. – Rea Kalampaliki Jul 08 '20 at 17:44
  • @ReaKalampaliki I think you should explain your problem more clearly, because the solution I would propose is really similar to Daniel O's. – Rémi Coulaud Jul 08 '20 at 17:46
  • The `split` function can work for me, but it does not exclude the unique values. – Rea Kalampaliki Jul 08 '20 at 17:49
  • The `aggregate` function seems to work even better. It would be better still if the first row of the results, which holds a unique value, did not appear. – Rea Kalampaliki Jul 08 '20 at 17:52
  • @ReaKalampaliki I've edited in a version that uses the duplicates object you created. – Daniel O Jul 08 '20 at 17:54
  • @DanielO Just for learning: how can I put the results table in a data.frame where the id column is a character column? – Rea Kalampaliki Jul 08 '20 at 18:07
  • The results table is already a `data.frame`. If we store it as `x <- aggregate(...` and then check `str(x)`, we will see that it is indeed a data.frame and that the `id` column is of type character. – Daniel O Jul 08 '20 at 18:15
  • @DanielO Isn't the `id` column a list? – Rea Kalampaliki Jul 08 '20 at 18:19
  • I see. If you want, you could collapse the lists into a single character string for each row: `aggregate(id ~ value, data[data$value %in% duplicates,], paste, collapse=", ")` – Daniel O Jul 08 '20 at 18:24

A tidyverse solution, without the need to identify the duplicates separately:


library(dplyr)
library(stringr)

data %>% 
  group_by(value) %>%
  summarise(ids = paste(id, collapse = ", ")) %>% 
  filter(str_detect(ids, ","))

#> # A tibble: 2 x 2
#>   value      ids       
#>   <chr>      <chr>     
#> 1 21.9178082 P2, P6    
#> 2 24.738676  P1, P3, P4

Created on 2020-07-08 by the reprex package (v0.3.0)
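A variant of the same pipeline, sketched on the assumption that filtering on group size is preferable: dplyr's n() counts the rows in each group, so filter(n() > 1) keeps the repeated values without searching the pasted ids for a comma (a check that would misfire if an id itself contained a comma). Only dplyr is needed here.

```r
library(dplyr)

# Example data from the question, with a numeric value column
data <- data.frame(
  id = c("P1", "P2", "P3", "P4", "P5", "P6"),
  value = c(24.7386760, 21.9178082, 24.7386760, 24.7386760, 20.7441860, 21.9178082),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(value) %>%
  filter(n() > 1) %>%                           # keep only values seen more than once
  summarise(ids = paste(id, collapse = ", "))   # one row per repeated value

result
```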

Peter
  • Thank you very much. Should I install the `tidyverse` package? – Rea Kalampaliki Jul 08 '20 at 18:11
  • No, you only need the packages loaded in the answer. They just happen to belong to the `tidyverse` ecosystem, as opposed to base R or data.table, which are the other common general approaches to R programming. – Peter Jul 08 '20 at 18:14