
I have a data frame with a column of US states that I built by merging two other data frames. Some of the states are duplicated, and I want to remove the duplicates. However, when I run `duplicated(df)`, it returns FALSE for every row. All of the duplicated names are spelled the same way. What could be causing this?

There is only one column in the data frame.

```
state
<chr>
Alabama
Alabama
Alaska
Arizona
Arizona
Arkansas
California
California
Colorado
Connecticut
```

This just returns all of the rows: `n = m[duplicated(m$state)]`
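One thing worth checking in that snippet: on a data frame, a single subscript such as `m[...]` selects columns, not rows. A minimal sketch of row-wise filtering follows, using a small hypothetical data frame `m` as a stand-in for the merged data. `duplicated(m$state)` flags the second and later occurrences of each value, and negating it with `!` keeps only the first occurrence.

```r
# Hypothetical stand-in for the merged data frame from the question
m <- data.frame(state = c("Alabama", "Alabama", "Alaska", "Arizona", "Arizona"),
                stringsAsFactors = FALSE)

# TRUE for the second and later occurrences of each state
dup <- duplicated(m$state)

# Row subsetting needs the comma; drop = FALSE keeps a one-column data frame
m_unique <- m[!dup, , drop = FALSE]
print(m_unique)
```

With exact-match values like these, `dup` is `FALSE TRUE FALSE FALSE TRUE`. If every element still comes back `FALSE` on the real data, the values are not byte-for-byte identical.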

acylam
jclabrat
  • It's very unclear exactly what your data might look like. It would be much easier to help you if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data that clearly demonstrates the problem. `duplicated()` will check rows across all columns. Are you just trying to check one of the columns? – MrFlick Nov 21 '17 at 20:37
  • There is only one column in the data frame. I edited the question. – jclabrat Nov 21 '17 at 20:47
  • Try `unique(df$state)`. – sweetmusicality Nov 21 '17 at 20:48
  • Thanks, that helped. I found there was a trailing space after the duplicates, i.e. "Alabama" versus "Alabama ". – jclabrat Nov 21 '17 at 20:51
  • You can use `trimws` to remove the white space and prevent that issue. – tbradley Nov 21 '17 at 21:08
  • Thanks, I used `trims = function (df) gsub("^\\s+|\\s+$", "", df)`. – jclabrat Nov 21 '17 at 21:18
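A sketch of the fix discussed in the comments, assuming the only difference between the apparent duplicates is leading or trailing whitespace (the sample values below are hypothetical). `nchar()` exposes the hidden characters, base R's `trimws()` strips them, and afterwards `duplicated()` and `unique()` behave as expected.

```r
# Hypothetical merged column: some entries carry a stray leading/trailing space
m <- data.frame(state = c("Alabama", "Alabama ", "Alaska", "Arizona", " Arizona"),
                stringsAsFactors = FALSE)

# Before cleaning, no element is an exact repeat, so nothing is flagged
print(duplicated(m$state))   # all FALSE

# nchar() reveals the hidden whitespace ("Alabama " has 8 characters, not 7)
print(nchar(m$state))

# Strip leading and trailing whitespace (trimws defaults to which = "both")
m$state <- trimws(m$state)

# Now the repeats are detected and can be dropped row-wise
m_unique <- m[!duplicated(m$state), , drop = FALSE]
print(unique(m$state))
```

The hand-written `gsub("^\\s+|\\s+$", "", x)` from the comments has a similar effect. `trimws()` is simply the built-in equivalent and also accepts `which = "left"` or `which = "right"`.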
