I am trying to create a new data frame, TopWords, from an existing one. The original data frame, data_to_export, has too many words (bios), and I would like to keep only the words (bios) that are used frequently, while also keeping the ID number associated with each word.
This is what I've come up with, but it doesn't work. R doesn't accept the if statement, but I don't know how else to do it.
TopWords<- data_to_export if freq_terms(data_to_export$bios2 > 4)
I would like to end up with the same data as data_to_export, but only the rows whose word occurs five or more times.
For example,
data_to_export (original data)
ID bios2
1 i
1 love
1 playing
1 soccer
2 i
2 am
2 a
2 teacher
2 mom
2 grandma
2 sister
3 i
3 think
3 soccer
3 is
3 the
3 best
4 soccer
4 player
5 i
5 like
5 soccer
5 i
5 could
5 play
5 soccer
5 all
5 day
New data frame (TopWords):
ID bios2
1 i
1 soccer
2 i
3 i
3 soccer
4 soccer
5 i
5 soccer
5 i
5 soccer
Any help would be greatly appreciated. Thanks!
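A minimal base R sketch of the filtering described above. It assumes "used frequently" means a word appears at least five times across the whole bios2 column, which is what the example output reflects ("i" and "soccer" each occur exactly five times). Instead of an if statement, it counts each word with table(), keeps the words that meet the threshold, and subsets the rows with a logical index built by %in%. The names min_count, word_counts, and frequent_words are illustrative only, and the sketch uses base R rather than freq_terms.

```r
# Example data from the question: one row per word, with the ID it came from.
data_to_export <- data.frame(
  ID = rep(1:5, times = c(4, 7, 6, 2, 9)),
  bios2 = c("i", "love", "playing", "soccer",
            "i", "am", "a", "teacher", "mom", "grandma", "sister",
            "i", "think", "soccer", "is", "the", "best",
            "soccer", "player",
            "i", "like", "soccer", "i", "could", "play", "soccer", "all", "day"),
  stringsAsFactors = FALSE
)

# Illustrative threshold: keep words that occur at least this many times overall.
min_count <- 5

# Count how often each word appears in the whole bios2 column.
word_counts <- table(data_to_export$bios2)

# Names of the words that meet the threshold.
frequent_words <- names(word_counts)[word_counts >= min_count]

# Keep every row (with its ID) whose word is one of the frequent words.
# Row filtering in R uses a logical vector inside [ ], not a trailing if.
TopWords <- data_to_export[data_to_export$bios2 %in% frequent_words, ]
rownames(TopWords) <- NULL

print(TopWords)
```

Because the subset keeps rows in their original order, repeated words within one ID (such as the two "i" and two "soccer" rows for ID 5) are all retained.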