So I have this dataset:
str(pcol)
'data.frame': 3130486 obs. of 20 variables:
$ body : Factor w/ 1623190 levels "","' i need to... '",..: 76837 ...
$ author : Factor w/ 18164 levels "--Kai--","--sunshine--",..: 11455 6643 8117 832 ...
$ ups : int 2 7 1 1 1 1 2 4 2 1 ...
....
Making a table shows the following:
table(pcol$author)
AuthornameX AuthornameY AuthornameZ ...
148 87 102
'table' int [1:18164(1d)] 129 5 152 67 18 25 58 319 44 204 ...
- attr(*, "dimnames")=List of 1
..$ : chr [1:18164] "--Kai--" "--sunshine--" "-0---0-" "-73-" ...
So now I want to create a new dataset containing only the authors who appear in the dataset more than 100 times.
I tried the following:
x <- subset(pcol, length(pcol$author) > 100 )
'table' int [1:2634(1d)] 129 152 319 204 157 177 198 106 144 437 ...
- attr(*, "dimnames")=List of 1
..$ : chr [1:2634] "--Kai--" "-0---0-" "-Lolrax-" "-PTM-" ...
This way I restricted the table to the authors with more than 100 entries. But now I don't know how to extract the rows for these authors from the original dataset.
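As a side note on the `subset()` attempt above: `length(pcol$author)` returns the number of rows (a single value), so its condition is either `TRUE` for every row or `FALSE` for every row and does not count anything per author. A table of authors with more than a given number of rows, like the one printed, can be obtained from `table()` plus logical indexing. A minimal sketch, assuming a small hypothetical data frame `toy` in place of `pcol` and a hypothetical `threshold` of 1 instead of 100 so the output stays short:

```r
# Hypothetical toy data standing in for pcol
toy <- data.frame(
  author = factor(c("a", "a", "b", "c", "b", "d")),
  ups    = c(2L, 7L, 1L, 1L, 1L, 4L)
)
threshold <- 1  # hypothetical; 100 in the question

# length() of a column is the number of rows: one number, not one per author
length(toy$author)

# So a condition on length() keeps either every row or none
nrow(subset(toy, length(author) > 5))    # 6 > 5: all rows
nrow(subset(toy, length(author) > 100))  # 6 > 100: no rows

# table() counts rows per author; the counts are named by author
tab <- table(toy$author)

# Logical indexing keeps only the counts above the threshold
frequent_tab <- tab[tab > threshold]
frequent_tab

# The author names are the names of the table, not its values
frequent_authors <- names(frequent_tab)
frequent_authors
```

Note that the values of such a table are counts; the author names live in `names()`, which matters when the result is later used to filter rows.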
I tried this:
> y <- subset(pcol, pcol$authors == x)
But that leaves me with an empty data frame with 0 observations.
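Two things in that `subset()` call likely explain the empty result. The column is named `author`, not `authors`, so `pcol$authors` is `NULL`, and comparing `NULL` with `==` yields a zero-length logical vector, which selects no rows. Even with the correct column name, `==` compares element by element (recycling the shorter vector), so it does not check whether each row's author is one of the frequent ones; `%in%` performs that membership test. Also, `x` holds counts, while the author names are `names(x)`. A small sketch, using a hypothetical data frame `toy` and a hypothetical vector `frequent_authors` playing the role of `names(x)`:

```r
# Hypothetical toy data standing in for pcol
toy <- data.frame(author = factor(c("a", "a", "b", "c", "b", "d")))
# Hypothetical: the author names that passed the threshold, i.e. names(x)
frequent_authors <- c("a", "b")

# Misspelled column: toy$authors does not exist, so it is NULL,
# and comparing NULL gives a zero-length logical that selects no rows
is.null(toy$authors)
nrow(subset(toy, NULL == frequent_authors))

# == recycles frequent_authors ("a", "b", "a", "b", ...) and compares
# position by position, so it is not a membership test
by_position <- toy$author == frequent_authors
by_position

# %in% asks, for each row, whether its author is in the set
membership <- toy$author %in% frequent_authors
membership
```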
So: how do I create a new dataset from the original one that contains only the authors who appear more than 100 times?
My question is similar to this one, so it is potentially a duplicate. Although that question was answered, I was not able to transfer its solution to my problem, which is why I am asking here.
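A minimal sketch of one common base-R approach (count with `table()`, keep the names above the threshold, then filter rows with `%in%`), again assuming a hypothetical data frame `toy` in place of `pcol` and a hypothetical `threshold` of 1 in place of 100:

```r
# Hypothetical toy data standing in for pcol
toy <- data.frame(
  author = factor(c("a", "a", "b", "c", "b", "d")),
  ups    = c(2L, 7L, 1L, 1L, 1L, 4L)
)
threshold <- 1  # hypothetical; 100 in the question

# 1. Count rows per author
tab <- table(toy$author)

# 2. Author names whose count exceeds the threshold
keep <- names(tab)[tab > threshold]

# 3. Keep the rows whose author is one of those names (%in%, not ==),
#    then drop the factor levels of authors that no longer occur
filtered <- droplevels(subset(toy, author %in% keep))

filtered
table(filtered$author)
```

Applied to the question's data, this would read `tab <- table(pcol$author)`, `keep <- names(tab)[tab > 100]`, `y <- droplevels(subset(pcol, author %in% keep))`. The `droplevels()` step is optional: without it, `table(y$author)` would still list all 18164 author levels, most with a count of 0. It also drops unused levels of other factor columns such as `body`.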