
I have a data.table with about 200 columns. Several of these columns are repeated and are exactly the same in all respects, i.e. they have the same name and the same entries.

I want to get rid of all but one of these duplicated columns.

Take for instance:

Code         AEE       AEE      Code      AEE    EPI       Code     AEPI
20/09/1991  4562.43 108.13  20/09/1991  2017698 60.16   20/09/1991  18309
23/09/1991  4578.89 108.52  23/09/1991  2017698 56.55   23/09/1991  18309
24/09/1991  4578.89 108.52  24/09/1991  2017698 58.36   24/09/1991  18309
25/09/1991  4631.04 109.76  25/09/1991  2017698 56.55   25/09/1991  18309
26/09/1991  4665.34 110.57  26/09/1991  2017698 58.36   26/09/1991  18309

As you can see, the Code column repeats every so often.

Running Data[, Code := NULL] removes only the first "Code" column and leaves the others in place.

Ideally the output would look like:

    Code       AEE   AEE     AEE     EPI    AEPI
20/09/1991  4562.43 108.13  2017698 60.16   18309
23/09/1991  4578.89 108.52  2017698 56.55   18309
24/09/1991  4578.89 108.52  2017698 58.36   18309
25/09/1991  4631.04 109.76  2017698 56.55   18309
26/09/1991  4665.34 110.57  2017698 58.36   18309

So only the first Code column remains. Thanks!

Gin_Salmon

3 Answers


Try this:

Data <- Data[, !duplicated(lapply(Data, summary))]
Tim Biegeleisen
  • Hi Tim, when I run this code it only removes rows (100 of them) and doesn't change any of the columns. Is this how the code is supposed to work? – Gin_Salmon Sep 22 '16 at 01:26
  • @Gin_Salmon I updated my answer to properly subset the data frame. – Tim Biegeleisen Sep 22 '16 at 01:51
  • I get what the code is doing now: it should be subsetting on all the FALSE values, i.e. the columns that aren't duplicates. However, when I run this, Data isn't subsetted; instead, Data just becomes a logical vector. Any thoughts? – Gin_Salmon Sep 22 '16 at 01:56
  • @Gin_Salmon I think you copied my code incorrectly. I get a data frame when I use it. You might be missing a comma. – Tim Biegeleisen Sep 22 '16 at 02:00
  • OP has a `data.table` while you used a `data.frame`, hence the confusion. This is what tags are for... And I guess you've just copy/pasted this from here: http://stackoverflow.com/questions/9818125/identifying-duplicate-columns-in-an-r-data-frame – David Arenburg Sep 22 '16 at 06:12
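
The last comment explains the logical-vector result: in a `data.table`, an expression in `j` is evaluated and returned as-is unless `with = FALSE` is passed. Below is a minimal sketch of a `data.table` variant, not the answerer's code. It assumes a two-row toy table built to mirror the question, and it swaps `summary` for `duplicated(as.list(...))`, which compares the actual values (all character columns of equal length share the same `summary`). The names `dup_content` and `Result` are made up for illustration.

```r
library(data.table)

# Hypothetical toy data with the same column layout as the question (two rows only)
Data <- data.table(
  Code = c("20/09/1991", "23/09/1991"),
  AEE  = c(4562.43, 4578.89),
  AEE  = c(108.13, 108.52),
  Code = c("20/09/1991", "23/09/1991"),
  AEE  = c(2017698, 2017698),
  EPI  = c(60.16, 56.55),
  Code = c("20/09/1991", "23/09/1991"),
  AEPI = c(18309, 18309)
)

# Flag columns whose values repeat an earlier column (names are ignored here)
dup_content <- duplicated(as.list(Data))

# with = FALSE makes data.table treat j as column positions
Result <- Data[, which(!dup_content), with = FALSE]
names(Result)
```

Note that this version looks only at the values, so two differently named columns with identical entries would also be collapsed into one.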

You can delete by the column number:

Data[, c(4,7) := NULL]   

Data
#         Code     AEE    AEE     AEE   EPI  AEPI
#1: 20/09/1991 4562.43 108.13 2017698 60.16 18309
#2: 23/09/1991 4578.89 108.52 2017698 56.55 18309
#3: 24/09/1991 4578.89 108.52 2017698 58.36 18309
#4: 25/09/1991 4631.04 109.76 2017698 56.55 18309
#5: 26/09/1991 4665.34 110.57 2017698 58.36 18309
Psidom
  • I think referring to columns by their number is discouraged. – SymbolixAU Sep 22 '16 at 01:26
  • @SymbolixAU I did not know that. Do you have any reference for support? – Psidom Sep 22 '16 at 01:28
  • [this answer](http://stackoverflow.com/a/13383995/5977215) has a reference to it, and [data.table FAQ 1.1](https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-faq.html) – SymbolixAU Sep 22 '16 at 01:32
  • @SymbolixAU I agree it's not good practice to use column numbers, as that obfuscates your intention, but the data shown here arises only under rare circumstances and legally should not happen. This is just an option to deal with the edge case. – Psidom Sep 22 '16 at 01:40
  • I agree, there is a time and a place, but thought it worth noting. – SymbolixAU Sep 22 '16 at 01:45
  • I think the problem with this answer is that it is not programmatic. OP has 200 columns from which he wants to remove duplicated columns programmatically, not by eyeballing and then removing by hand. – David Arenburg Sep 22 '16 at 06:07
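
To address the point about doing this programmatically, the column positions can be computed instead of typed by hand. This is a rough sketch under assumptions: the toy table is hypothetical, `keyed` and `drop_idx` are illustrative names, and a column counts as a duplicate only when both its name and its values match an earlier column, which is how the question defines it.

```r
library(data.table)

# Hypothetical toy data with the same column layout as the question (two rows only)
Data <- data.table(
  Code = c("20/09/1991", "23/09/1991"),
  AEE  = c(4562.43, 4578.89),
  AEE  = c(108.13, 108.52),
  Code = c("20/09/1991", "23/09/1991"),
  AEE  = c(2017698, 2017698),
  EPI  = c(60.16, 56.55),
  Code = c("20/09/1991", "23/09/1991"),
  AEPI = c(18309, 18309)
)

# Pair each column's name with its values so that both must match
keyed <- Map(list, names(Data), as.list(Data))

# Positions of columns that repeat an earlier (name, values) pair
drop_idx <- which(duplicated(keyed))

# Remove those columns by reference; skip the call when nothing is duplicated
if (length(drop_idx) > 0L) Data[, (drop_idx) := NULL]
names(Data)
```

On the toy table this computes positions 4 and 7, the same ones hard-coded in the answer, and the three AEE columns survive because their values differ.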

You could also do:

df <- df[,!duplicated(names(df))]

OR

df <- df[,unique(names(df))]
989
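
One caveat with deduplicating on names alone: in the question, AEE appears three times with different values, and the desired output keeps all three. The sketch below uses a small hypothetical table to show what a name-only filter does on a `data.table`; `by_name` is an illustrative name, and `with = FALSE` is added because the question's object is a `data.table` rather than a `data.frame`.

```r
library(data.table)

# Hypothetical example: Code is a true duplicate, the two AEE columns differ
Data <- data.table(
  Code = c("a", "b"),
  AEE  = c(1, 2),
  AEE  = c(3, 4),
  Code = c("a", "b")
)

# Deduplicating on names alone keeps only the first column for each name
by_name <- Data[, which(!duplicated(names(Data))), with = FALSE]
names(by_name)
```

Here the second AEE column is dropped even though its values are distinct, so a name-only filter fits only when repeated names always imply repeated content.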