I want to check whether a row in a data.frame is a duplicate of an existing row. As already pointed out here, one way might be to use the `duplicated` function. However, when I use the function I get the following error:

Error: argument 'incomparables != FALSE' is not used (yet)

In a rather old mailing-list post somebody pointed out that this is actually a bug in R (more information over here). My data.frame looks like this:

# three integer columns and one floating-point (numeric) column
data.frame(val1 = integer(), val2 = integer(), val3 = integer(), val4 = numeric())

I'm wondering what the issue actually is since there seems to be no "NA" value in my data.frame, as

?duplicated

points out. This is maybe a very stupid question, but I'm quite new to R and would be glad for any tips regarding this issue!

Thanks in advance, Michael

P.S.: I've provided an example as suggested

table <- NULL;

foo <- function(n, d, nh, v){
  newEntry <- data.frame(node_i=n, node_j=nh, dst=d, phi=v);

  if(length(table != 0)){
    if(!duplicated(table, newEntry)){
      add(n, nh, d, v);
    }else{
      print("it is a duplicate!")    
    }
  }else{
    add(n, nh, d, v);
  }
}

add <- function(n, d, nh, v){
  rbind(table, data.frame(node_i=n, node_j=nh, dst=d, phi=v)) ->> table;
}

bar <- function(){
  foo(23,42,5,4.0);
  print(table);
  foo(22,42,5,4.0);  
  print(table);
  foo(23,42,5,4.0);
  print(table);
}

However, this does not seem to be a problem with duplicated() at all: I get the same error if I try to add another row. *sigh*

  • Can you give a sample of your data using `dput` and show the code that you have tried thus far? – Justin Jul 27 '12 at 17:17
  • As a side comment, I would personally not build the table by referring to a global table, using `->>`. Using such a global variable makes it harder to follow the flow of the program. In addition, if this code is integrated with other code there might be side effects, for example if other code also tries to change the table. Also growing such a table using rbind could potentially be veeeery slow. – Paul Hiemstra Jul 28 '12 at 07:26
  • Well, the code will not be used in another R program. However, I'm wondering which data structure you would recommend (with respect to the speed issues)? –  Jul 28 '12 at 20:26
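
The two points raised in the comments above (avoiding the global `->>` assignment and avoiding row-by-row `rbind`) could be sketched roughly as follows. This is only an illustration, assuming all entries can be collected before the table is built; `entries` and `routes` are hypothetical names that do not appear in the original code.

```r
# Collect the rows in a plain list first (cheap to grow),
# instead of calling rbind() on a global table for every new row.
entries <- list(
  data.frame(node_i = 23, node_j = 5, dst = 42, phi = 4.0),
  data.frame(node_i = 22, node_j = 5, dst = 42, phi = 4.0),
  data.frame(node_i = 23, node_j = 5, dst = 42, phi = 4.0)  # repeats the first row
)

# Bind everything once, then drop rows that duplicate an earlier row.
routes <- unique(do.call(rbind, entries))
print(routes)
```

Binding once at the end avoids copying the whole table for every added row, and `unique()` on a data.frame removes the repeated rows in one step, so no per-row duplicate check is needed in this variant.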

1 Answer


If you replace the `duplicated` call with `match_df` from plyr, the error should no longer occur.

library(plyr) # for match_df
table <- NULL;

foo <- function(n, d, nh, v){
  newEntry <- data.frame(node_i=n, node_j=nh, dst=d, phi=v);

  if(length(table != 0)){
    if(nrow(plyr::match_df(table, newEntry))){
      add(n, nh, d, v);
    }else{
      print("it is a duplicate!")    
    }
  }else{
    add(n, nh, d, v);
  }
}

add <- function(n, d, nh, v){
  rbind(table, data.frame(node_i=n, node_j=nh, dst=d, phi=v)) ->> table;
}

bar <- function(){
  foo(23,42,5,4.0);
  print(table);
  foo(22,42,5,4.0);  
  print(table);
  foo(23,42,5,4.0);
  print(table);
}

Output

> bar()
  node_i node_j dst phi
1     23     42   5   4
Matching on: node_i, node_j, dst, phi
[1] "it is a duplicate!"
  node_i node_j dst phi
1     23     42   5   4
Matching on: node_i, node_j, dst, phi
[1] "it is a duplicate!"
  node_i node_j dst phi
1     23     42   5   4
> table
  node_i node_j dst phi
1     23     42   5   4
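
For what it's worth, the original error seems to come from the call itself: in `duplicated(table, newEntry)` the second positional argument is matched to `incomparables`, and the data.frame method only accepts `FALSE` there. Note also that in the run above the row with node_i = 22 is reported as a duplicate and never added, so the condition looks inverted, and `add` declares its parameters in a different order than it is called with. Below is a minimal base-R sketch of the check, assuming the table is passed in and returned rather than kept global; `is_duplicate_row`, `add_unique` and `routes` are hypothetical names, not part of the original code.

```r
# Hypothetical helper: TRUE if new_entry already occurs as a row of tbl.
is_duplicate_row <- function(tbl, new_entry) {
  if (is.null(tbl) || nrow(tbl) == 0) return(FALSE)
  combined <- rbind(tbl, new_entry)
  # duplicated() flags a row if an identical row appears earlier,
  # so only the flag of the last (newly appended) row is of interest.
  duplicated(combined)[nrow(combined)]
}

# Hypothetical helper: returns the table, extended only if the row is new.
add_unique <- function(tbl, n, nh, d, v) {
  new_entry <- data.frame(node_i = n, node_j = nh, dst = d, phi = v)
  if (is_duplicate_row(tbl, new_entry)) {
    message("it is a duplicate!")
    return(tbl)
  }
  rbind(tbl, new_entry)
}

routes <- NULL
routes <- add_unique(routes, 23, 5, 42, 4.0)
routes <- add_unique(routes, 22, 5, 42, 4.0)
routes <- add_unique(routes, 23, 5, 42, 4.0)  # already present, not added
print(routes)
```

The comparison is exact, so floating-point values in `phi` that differ only by rounding would count as different rows.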
discipulus