
I am using Rfacebook to extract content from Facebook's API through R. Sometimes I get the same post back two or three times, even though it appears only once on Facebook. This is probably a problem with my crawler, but I have already extracted a lot of data and don't want to rerun the crawl, so I was thinking about cleaning the data I already have instead.

Is there a handy way to do that with dplyr?

The data I have looks like this:

Name            message           created_time                   id

Sam             Hello World       2013-03-09T19:52:22+0000       26937808
Nicky           Hello Sam         2013-03-09T19:53:16+0000       26930800
Nicky           Hello Sam         2013-03-09T19:53:16+0000       26930800
Nicky           Hello Sam         2013-03-09T19:53:16+0000       26930800
Sam             Whats Up?         2013-03-09T19:53:22+0000       26937806
Sam             Whats Up?         2013-03-09T19:53:22+0000       26937806
Florence        Hi guys!          2013-03-09T19:55:16+0000       25688232
Steff           How r u?          2013-03-09T19:59:16+0000       64552194

I would now like a new data frame in which every post appears only once: the three copies of Nicky's "Hello Sam" post should be reduced to one, and the two copies of Sam's "Whats Up?" post should also be reduced to one.

Any ideas or suggestions on how to do this in R? Facebook seems to assign unique ids to posts and comments, and the timestamps in my data are also almost unique, so either column should work for identifying duplicates. However, it is still unclear to me how best to do the transformation...

Any help with this is highly appreciated!

Thanks!

rkuebler

2 Answers

If you use dplyr, you could simply use distinct() (see also this topic).
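A minimal sketch of distinct() applied to the data from the question. The data frame is reconstructed from the table above, and storing `id` as a number (rather than a string) is an assumption. Calling distinct() with no columns drops rows that are identical in every column; passing `id` together with `.keep_all = TRUE` keeps the first row for each post id and retains all other columns.

```r
library(dplyr)

# Sample data reconstructed from the question's table (id stored as numeric: an assumption)
df1 <- data.frame(
  Name = c("Sam", "Nicky", "Nicky", "Nicky", "Sam", "Sam", "Florence", "Steff"),
  message = c("Hello World", "Hello Sam", "Hello Sam", "Hello Sam",
              "Whats Up?", "Whats Up?", "Hi guys!", "How r u?"),
  created_time = c("2013-03-09T19:52:22+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:16+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:22+0000", "2013-03-09T19:53:22+0000",
                   "2013-03-09T19:55:16+0000", "2013-03-09T19:59:16+0000"),
  id = c(26937808, 26930800, 26930800, 26930800,
         26937806, 26937806, 25688232, 64552194),
  stringsAsFactors = FALSE
)

# Keep one copy of each row that is identical across all columns
all_cols <- distinct(df1)

# Keep the first row for each post id; .keep_all = TRUE retains the other columns
by_id <- distinct(df1, id, .keep_all = TRUE)

print(by_id)
```

Deduplicating on `id` is the safer choice if repeated crawls could return the same post with slightly different field values, since a whole-row comparison would then keep both copies.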

Jasper

If the duplicates are defined by particular columns, we can use data.table's unique() with the by argument:

library(data.table)
unique(setDT(df1), by = c("Name", "message"))
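Since the question says Facebook assigns unique ids to posts, the same data.table call can deduplicate on the `id` column alone. A minimal sketch, again reconstructing the sample data from the question (storing `id` as numeric is an assumption). Note that setDT() converts `df1` to a data.table in place, by reference.

```r
library(data.table)

# Sample data reconstructed from the question's table (id stored as numeric: an assumption)
df1 <- data.frame(
  Name = c("Sam", "Nicky", "Nicky", "Nicky", "Sam", "Sam", "Florence", "Steff"),
  message = c("Hello World", "Hello Sam", "Hello Sam", "Hello Sam",
              "Whats Up?", "Whats Up?", "Hi guys!", "How r u?"),
  created_time = c("2013-03-09T19:52:22+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:16+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:22+0000", "2013-03-09T19:53:22+0000",
                   "2013-03-09T19:55:16+0000", "2013-03-09T19:59:16+0000"),
  id = c(26937808, 26930800, 26930800, 26930800,
         26937806, 26937806, 25688232, 64552194),
  stringsAsFactors = FALSE
)

# setDT() converts df1 to a data.table by reference; unique() with by = "id"
# keeps the first row for each post id
dedup_id <- unique(setDT(df1), by = "id")

print(dedup_id)
```

Keying on `created_time` instead would also work for this sample, but because timestamps are only "almost unique", two different posts made in the same second would be collapsed into one.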

Or, if a row counts as a duplicate only when every column matches, base R's unique() can be used:

unique(df1)
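A base R sketch that needs no extra packages, using the same reconstructed sample data (numeric `id` is an assumption). unique() compares whole rows, while indexing with !duplicated() on a single column keeps the first row for each `id` even if other fields differ between copies.

```r
# Sample data reconstructed from the question's table (id stored as numeric: an assumption)
df1 <- data.frame(
  Name = c("Sam", "Nicky", "Nicky", "Nicky", "Sam", "Sam", "Florence", "Steff"),
  message = c("Hello World", "Hello Sam", "Hello Sam", "Hello Sam",
              "Whats Up?", "Whats Up?", "Hi guys!", "How r u?"),
  created_time = c("2013-03-09T19:52:22+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:16+0000", "2013-03-09T19:53:16+0000",
                   "2013-03-09T19:53:22+0000", "2013-03-09T19:53:22+0000",
                   "2013-03-09T19:55:16+0000", "2013-03-09T19:59:16+0000"),
  id = c(26937808, 26930800, 26930800, 26930800,
         26937806, 26937806, 25688232, 64552194),
  stringsAsFactors = FALSE
)

# Drop rows that are identical in every column
dedup_all <- unique(df1)

# Drop any row whose id has already appeared earlier in the data frame
dedup_id <- df1[!duplicated(df1$id), ]

print(dedup_id)
```

Both results keep the original row names of the surviving rows; call `rownames(dedup_id) <- NULL` afterwards if you want them renumbered.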
akrun