
I am generating a word cloud of my tweets, but I am getting duplicates like the ones shown below, which are treated as separate words in my word cloud instead of one.

1) myname

2) "myname

3) myname"
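Here is a minimal sketch of the normalization step. It assumes the tweets have already been split into a character vector of words, and the `tokens` vector is made-up sample data. It follows the `gsub()`/`[[:punct:]]` idea from talat's comment, adds lower-casing as an extra assumption, and then counts frequencies with `table()` so the three variants of `myname` collapse into one entry:

```r
# Hypothetical sample of words extracted from tweets
tokens <- c("myname", "\"myname", "myname\"", "Hello")

# Lower-case first (assumption: "MyName" and "myname" should count as one word)
cleaned <- tolower(tokens)

# Strip every punctuation character, so the quoted variants become identical
cleaned <- gsub("[[:punct:]]", "", cleaned)

# Count word frequencies; a word cloud is typically drawn from counts like these
freq <- table(cleaned)
print(freq)
```

Because the counting happens after the cleaning, `myname` gets a single entry with a count of 3 instead of three separate entries.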

My other problem is that I am also getting symbols such as ^ and ~ in the word cloud. How do I get rid of these symbols?
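One way to remove symbols such as `^` and `~` is sketched below. It assumes you are working on a plain character vector rather than a `tm` corpus, and the sample tokens are hypothetical. Instead of listing the characters to delete, it keeps only letters, digits and whitespace. What character classes such as `[:punct:]` match can depend on the locale, so a whitelist like this may behave more predictably:

```r
# Hypothetical tokens containing stray symbols
tokens <- c("^hello~", "wor^ld", "~", "data")

# Whitelist: delete every character that is NOT a letter, digit or whitespace
cleaned <- gsub("[^[:alnum:][:space:]]", "", tokens)

# Drop tokens that became empty strings after cleaning
cleaned <- cleaned[nzchar(cleaned)]
print(cleaned)
```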

@docendodiscimus's answer solved my problem, but now I am getting meaningless words in my cloud, such as 'sadi24' and 'yu1', even though I removed hashtags and @ words. How can I get rid of them?
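The original code is not shown, so this sketch only illustrates one possible cause. If punctuation is stripped *before* hashtags and @-mentions are removed, `@sadi24` becomes the ordinary-looking word `sadi24`, and a later "starts with @" filter no longer catches it. The code compares the two orders on a made-up tweet:

```r
# Hypothetical tweet text
tweet <- "Thanks @sadi24 for the #rstats tips!"

# Split into whitespace-separated tokens
tokens <- strsplit(tweet, "[[:space:]]+")[[1]]

# Problematic order: stripping punctuation first turns "@sadi24" into "sadi24",
# so the "starts with @ or #" filter afterwards matches nothing
stripped_first <- gsub("[[:punct:]]", "", tokens)
bad <- stripped_first[!grepl("^[@#]", stripped_first)]

# Safer order: drop @mentions and #hashtags while their markers are still present,
# then strip punctuation from the remaining tokens
kept <- tokens[!grepl("^[@#]", tokens)]
good <- gsub("[[:punct:]]", "", kept)

print(bad)
print(good)
```

If mentions can be attached to other characters, for example `(@sadi24)`, another option is to remove them from the raw text before splitting it into words, with something like `gsub("@\\w+", "", tweet)`.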

This is the output where I can see the problem happening, but there may be many other words affected in the same way. Please share your thoughts on this.

Please note that I may have many similar issues, so please provide a solution that I can easily generalize to all of them.
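To make the cleanup easy to generalize, one option is to describe each kind of junk token as a regular expression and drop any token that matches at least one of them. A new problem then means adding a rule rather than writing new code. The rule names, patterns and sample tokens here are hypothetical examples, not a complete list:

```r
# Hypothetical token vector
tokens <- c("great", "manager185878", "sadi24", "yu1", "http://t.co/abc",
            "@someone", "#topic", "ok", "weather")

# Each rule is a regex describing a kind of token to DROP.
# To handle a new kind of junk, add another entry here.
drop_rules <- list(
  mention   = "^@",           # Twitter handles
  hashtag   = "^#",           # hashtags
  url       = "^https?://",   # links
  digits    = "[0-9]",        # any digit, e.g. sadi24, manager185878
  too_short = "^.{1,2}$"      # one- or two-character tokens
)

# TRUE for every token matched by at least one rule
is_junk <- Reduce(`|`, lapply(drop_rules, function(p) grepl(p, tokens)))

clean <- tokens[!is_junk]
print(clean)
```

The `too_short` rule will also remove genuine short words, so it is a trade-off. The opposite strategy is a whitelist such as `tokens[grepl("^[[:alpha:]]{3,}$", tokens)]`, which keeps only purely alphabetic tokens of three or more characters. It may generalize better because it does not require anticipating every kind of junk.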

I am providing a screenshot of other data that has the problem.

Here I am getting words such as manager185878 and sadi24. You can also see some odd symbols in the output, even after removing punctuation.

learner
  • You could do something like `unique(gsub("[[:punct:]]", "", x))` but the rules are not clearly defined in your question – talat Jun 22 '16 at 13:35
  • Thanks, let me try it. I used `tm_map(corpus, removePunctuation)`. – learner Jun 22 '16 at 13:38
  • @docendodiscimus I have updated the question. Please provide a solution to the second problem too, if you can. – learner Jun 22 '16 at 13:40
  • If you'd tried docendodiscimus's suggestion, you probably would have noticed that it also solves your 2nd problem. Besides that, please read (1) [how do I ask a good question](http://stackoverflow.com/help/how-to-ask), (2) [How to create a MCVE](http://stackoverflow.com/help/mcve) as well as (3) [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit and improve your question accordingly. Otherwise people talk past each other. – lukeA Jun 22 '16 at 13:51
  • @lukeA OK, I will read your suggestions. My problem has now been resolved, but I am getting some meaningless words in my cloud even though I removed hashtags and @ words from my tweets. Can you please explain why this is happening? – learner Jun 22 '16 at 13:53
  • *"Generalize to all others"* is a grand request, especially considering (1) you haven't provided sample data, (2) you haven't provided code, (3) it seems you didn't really fully try a suggested fix, from which I infer that (4) you haven't read the links suggested by @lukeA. Please be clear/explicit: provide a small example dataset and relevant code that are evidencing the problem. – r2evans Jun 22 '16 at 14:12
  • @r2evans Please check; I have added the screenshot. – learner Jun 22 '16 at 14:21
  • Have you tried `grep`? Using it in concert with @docendodiscimus's comment (and a little more) might be fruitful. – r2evans Jun 22 '16 at 14:36
  • @r2evans `grep` instead of `gsub`? – learner Jun 23 '16 at 05:54
  • If each element of your vector is a single word, and you want to completely omit words such as usernames (`manager185878`), then when you find something that meets your "cut" criterion, remove it completely. If you expect no numbers in your words, then this criterion might be as simple as `vec <- vec[ ! grepl("[0-9]", vec)]`. – r2evans Jun 23 '16 at 15:04
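On the question of `grep` versus `gsub`: `gsub()` rewrites the text inside each element, while `grep()` returns the indices of matching elements and `grepl()` returns a TRUE/FALSE per element. That makes `grepl()` suitable for removing whole elements, as in r2evans's `vec[ ! grepl("[0-9]", vec)]`. Here is a small comparison on sample words (made-up data):

```r
# Hypothetical single-word elements
tokens <- c("manager185878", "sadi24", "report")

# gsub() edits each string: the digits are deleted but the rest of the word survives
edited <- gsub("[0-9]", "", tokens)

# grepl() flags matching elements, so negating it drops whole tokens
dropped <- tokens[!grepl("[0-9]", tokens)]

print(edited)
print(dropped)
```

With `gsub()`, `sadi24` would still appear in the word cloud as `sadi`. Subsetting with `grepl()` removes it entirely.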
