R finding duplicate values

Question

I have a table called Data with 10 columns and among them there are columns named: suitcases, columns, rows, objects.

In the suitcases column, the values go from 1 to 10.

In the columns column, the values go from 1 to 20

In the rows column, the values go from 1 to 20

The numbering of values in the objects column starts from 1 for each suitcas.

I tried the following method (a similar one appeared on the forum with a different question):

duplicates <- function(data, var)
{
  library(tidyverse)
  data |> 
    add_count(!sym(var)) |> 
    filter(n == 2) |> 
    select(-n)
}


for (x in suitcases) {
  duplicates(Data, objects)  
}

I want to get a new table in which there are only such rows in which the values for the objects column occur exactly twice and not more, taking into account the resetting of the numbering in the suitcases column and the values in the columns: columns and rows. Due to the re-numbering, repetitions may appear in subsequent suitcases (despite the same values in columns: columns and rows)

Unfortunately, I have no idea how to take into account the resetting numbering. Therefore, I am asking the forum for help and indulgence, if the question is not well-formed, I am new here.

Example_data

Example_output

structure(list(rows = c(6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 5L, 
    5L, 5L, 5L, 6L, 6L), columns = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 6L, 6L, 6L, 6L, 6L, 3L, 3L), time.min = c(5L, 0L, 5L, 0L, 
    0L, 5L, 5L, 5L, 0L, 2L, 5L, 0L, 2L, 10L, 10L), status = c(38L, 
    66L, 57L, 38L, 57L, 20L, 20L, 3L, 58L, 58L, 14L, 14L, 5L, 5L, 
    27L), postion = c(38L, 17L, 6L, 7L, 31L, 31L, 32L, 21L, 2L, 67L, 
    1L, 31L, 6L, 35L, 37L), x = c(58L, 14L, 14L, 14L, 68L, 12L, 27L, 
    448L, 981L, 860L, 147L, 417L, 884L, 417L, 884L), y = c(216L, 
    212L, 483L, 520L, 234L, 515L, 521L, 795L, 93L, 668L, 75L, 787L, 
    310L, 827L, 144L), z = c(38L, 66L, 57L, 38L, 57L, 20L, 20L, 1L, 
    7L, 6L, 981L, 147L, 781L, 417L, 884L), suitcases = c(3L, 3L, 
    3L, 2L, 7L, 7L, 7L, 7L, 5L, 1L, 4L, 3L, 3L, 10L, 10L), objects = c(6L, 
    1L, 6L, 22L, 5L, 14L, 27L, 14L, 1L, 14L, 1L, 26L, 5L, 4L, 4L)), class = "data.frame", row.names = c(NA, 
    -15L))

Please provide some sample data from your `Data` table, you can use `dput(Data)` output if it's not too big, or select relevant columns and pick few rows to make it more compact. Desired output wouldn't hurt either. — margusl, Jun 26 '22 at 09:22
Assuming your table is named `Data`, run `dput(Data)` in RStudio and include resulting `structure(list ( ... ` to your question as a code block, this allows others to work with your table. Also - https://stackoverflow.com/a/5963610/646761 — margusl, Jun 26 '22 at 10:02
Is it correctly added? I only give sample data because the whole table is big. — , Jun 26 '22 at 10:15

score 1 · Accepted Answer · answered Jun 26 '22 at 10:41

You can approach this though grouping and filtering, just note that your expected output is bit unclear about group order (or resetting the numbering, as you put it), meaning that different order can provide the same result on provided sample, but on your real dataset you might expect something else :

library(dplyr)
Data %>% 
  group_by(rows,columns,suitcases,objects) %>%
  filter (n() == 2) %>%
  ungroup()

Result:

#> # A tibble: 4 × 10
#>    rows columns time.min status postion     x     y     z suitcases objects
#>   <int>   <int>    <int>  <int>   <int> <int> <int> <int>     <int>   <int>
#> 1     6       3        5     38      38    58   216    38         3       6
#> 2     6       3        5     57       6    14   483    57         3       6
#> 3     6       3       10      5      35   417   827   417        10       4
#> 4     6       3       10     27      37   884   144   884        10       4

Thank you very much for your answer. As for the group order, the umbering of values in the objects column starts from 1 for eeach suitcase. — , Jun 26 '22 at 10:51
And the values in the columns: columns and rows can be repeated in the subsequent suitcases — , Jun 26 '22 at 11:05
@user19418711 , by group order I meant parameter order of `group_by(),` with that sample data I can also use `group_by(objects,suitcases,rows)`and still get the same result, with your actual dataset the group order might make a difference. — margusl, Jun 26 '22 at 11:20

R finding duplicate values

1 Answers1