Removing duplicates in a word cloud in R

Question

I am generating a word cloud of my tweets. The problem is that I am getting duplicates like the ones shown below, which are treated as separate words in my word cloud instead of one.

1) myname

2) "myname

3) myname"

My other problem is that I am also getting some symbols in the word cloud, such as ^ and ~. How can I get rid of these symbols?

@docendodiscimus's answer solved my problem, but I am now getting meaningless words in my cloud such as 'sadi24' and 'yu1', even though I removed hashtags and @-mentions. How can I get rid of them?

This is the output where I can see this happening, but there may be many other words affected by the same problem. Please share your thoughts on this.

Please note that I may have numerous similar issues, so please provide a solution that I can easily generalize to all of them.

I am providing a screenshot of other data that has the same problem.

Here I am getting words such as manager185878 and sadi24. You can see that the output contains some absurd symbols even after removing punctuation.

You could do something like `unique(gsub("[[:punct:]]", "", x))` but the rules are not clearly defined in your question — talat, Jun 22 '16 at 13:35
@docendodiscimus I updated the question. Please provide a solution to the second problem as well if you can. — learner, Jun 22 '16 at 13:40
If you'd tried docendodiscimus's suggestion, you probably would have noticed that it also solves your 2nd problem. Besides that, please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create a MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit and improve your question accordingly. Otherwise people talk past each other. — lukeA, Jun 22 '16 at 13:51
@lukeA OK, I will read your suggestions. My problem has now been resolved, but I am getting some meaningless words in my cloud even though I removed hashtags and @-mentions from my tweets. Can you please explain why this is happening? — learner, Jun 22 '16 at 13:53
*"Generalize to all others"* is a grand request, especially considering (1) you haven't provided sample data, (2) you haven't provided code, (3) it seems you didn't really fully try a suggested fix, from which I infer that (4) you haven't read the links suggested by @lukeA. Please be clear/explicit: provide a small example dataset and relevant code that are evidencing the problem. — r2evans, Jun 22 '16 at 14:12
Have you tried `grep`? Using it in concert with @docendodiscimus's comment (and a little more) might be fruitful. — r2evans, Jun 22 '16 at 14:36
If each element of your vector is a single word, and you want to completely omit words that are things like usernames (`manager185878`), then when you find something that meets your "cut" criteria, remove it completely. If you expect no numbers in your words, then this criterion might be as simple as `vec <- vec[ ! grepl("[0-9]", vec)]`. — r2evans, Jun 23 '16 at 15:04
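
Putting the suggestions from the comments together, a minimal sketch might look like the following. The `words` vector is made-up sample data, since the question does not include the real tokens, and it assumes each element is already a single word. Instead of `[[:punct:]]` it strips everything that is not a letter or digit, which is a slightly stricter variant of talat's suggestion and should also catch symbols such as `^` and `~`. It uses `table()` rather than `unique()` so that the merged variants keep a combined count for the cloud.

```r
# Hypothetical sample tokens (not the asker's real data)
words <- c("myname", "\"myname", "myname\"", "manager185878",
           "sadi24", "hello^", "~world", "hello")

# Strip every character that is not a letter or a digit,
# so "myname, myname" and myname all become myname
cleaned <- gsub("[^[:alnum:]]", "", words)

# Drop tokens that contain any digit (the rule from r2evans's comment);
# this assumes real words never contain numbers
cleaned <- cleaned[!grepl("[0-9]", cleaned)]

# Drop tokens that became empty after cleaning
cleaned <- cleaned[nzchar(cleaned)]

# Count occurrences so that former duplicates share one entry
freq <- table(cleaned)
print(freq)

# If the wordcloud package is installed, the counts could be plotted with:
# wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
```

Whether dropping every token that contains a digit is acceptable depends on the data. If some legitimate words contain numbers, the `grepl()` pattern would need to be narrowed.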
