I currently have a data set with 16,383 rows and 43 columns. It looks like this:

Response0me      ReleaseDate       date              MicrosoftWindows  PlayStation4  ………
Prison Architect 2015-10-06 0:00   2015-10-07 0:00   2015-10-06 0:00   2016-06-28 0:00
Prison Architect 2015-10-06 0:00   2015-10-08 0:00   2015-10-06 0:00   2016-06-28 0:00
Prison Architect 2015-10-06 0:00   2015-10-09 0:00   2015-10-06 0:00   2016-06-28 0:00
TIS-100          2015-07-20 0:00   2015-07-21 0:00                     2015-07-20 0:00
TIS-100          2015-07-20 0:00   2015-07-22 0:00                     2015-07-20 0:00
TIS-100          2015-07-20 0:00   2015-07-23 0:00                     2015-07-20 0:00

As you can see, each Response0me has a single ReleaseDate, MicrosoftWindows, PlayStation4, etc., but many date values. So I would like to view the data set like this:

Response0me      ReleaseDate      MicrosoftWindows  
Prison Architect 2015-10-06 0:00  2015-10-06 0:00 
TIS-100          2015-07-20 0:00                  

In short, I want to hide the uninformative data (not actually delete or drop it, just keep it from being shown on my console), collapse the repeated rows, and see only the selected data. Is there any way to do this?

J. Joe

1 Answer

You could use unique(df[, -3]). The -3 deselects the date variable (which is in the third position), leaving only the variables that do not change from row to row. unique then removes the duplicated observations. Neither step modifies df itself; the result is only printed unless you assign it. If you want to exclude more variables, you can do unique(df[, -c(3, ...)]).
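
As a rough illustration, here is a minimal sketch on a toy data frame that mimics the sample in the question. The values are copied from that sample with the times left out, and the names by_position and by_name are only illustrative. The real data has 43 columns, so the position of date should be checked before relying on -3.

```r
# Toy data mirroring the sample in the question (times omitted for brevity).
df <- data.frame(
  Response0me      = rep(c("Prison Architect", "TIS-100"), each = 3),
  ReleaseDate      = rep(c("2015-10-06", "2015-07-20"), each = 3),
  date             = c("2015-10-07", "2015-10-08", "2015-10-09",
                       "2015-07-21", "2015-07-22", "2015-07-23"),
  MicrosoftWindows = rep(c("2015-10-06", NA), each = 3),
  PlayStation4     = rep(c("2016-06-28", "2015-07-20"), each = 3),
  stringsAsFactors = FALSE
)

# Drop the third column (date) by position, then collapse repeated rows.
by_position <- unique(df[, -3])

# Same idea by name, which does not depend on the column order.
by_name <- unique(df[, setdiff(names(df), "date")])

# One row per game is left; df itself still has all six rows.
print(by_position)
```

Dropping by name is usually the safer choice if the column order might change.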

Alternatively, you could use dplyr:

library(dplyr)
df %>% select(-date) %>% distinct()

cpander
  • Thanks, that was really helpful. If you don't mind me asking, is there also a way to do this by selecting rather than dropping? In other words, can I get the result above by selecting specific rows? – J. Joe Nov 30 '18 at 03:40
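
One possible way to approach the follow-up in this comment, sketched in base R under the assumption that the three columns from the desired output are the ones to keep. The toy data and the names keep, selected and first_rows are illustrative and not from the original thread.

```r
# Toy data mirroring the sample in the question (times omitted for brevity).
df <- data.frame(
  Response0me      = rep(c("Prison Architect", "TIS-100"), each = 3),
  ReleaseDate      = rep(c("2015-10-06", "2015-07-20"), each = 3),
  date             = c("2015-10-07", "2015-10-08", "2015-10-09",
                       "2015-07-21", "2015-07-22", "2015-07-23"),
  MicrosoftWindows = rep(c("2015-10-06", NA), each = 3),
  PlayStation4     = rep(c("2016-06-28", "2015-07-20"), each = 3),
  stringsAsFactors = FALSE
)

# Columns to keep (assumed from the desired output in the question).
keep <- c("Response0me", "ReleaseDate", "MicrosoftWindows")

# Select the wanted columns by name, then collapse repeated rows.
selected <- unique(df[, keep])

# Or select rows instead: keep the first row seen for each Response0me.
first_rows <- df[!duplicated(df$Response0me), keep]

print(selected)
```

The two results agree here because the kept columns are constant within each Response0me. With dplyr, distinct(df, Response0me, ReleaseDate, MicrosoftWindows) should give the same selection in one call.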