Preserve content while displaying data columnwise from MongoDB

Question

1. Reading data from Twitter and then saving it in MongoDB:

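 library(twitteR)   # searchTwitter(), twListToDF()
 library(rmongodb)  # mongo.create(), mongo.bson.from.df(), mongo.insert.batch()
 data.list <- searchTwitter('#demonetization ', n=10)
 data.df <- twListToDF(data.list)
 temp <- mongo.bson.from.df(data.df)
 mongo <- mongo.create()
 # The database name must be a quoted string; the namespace is "<db>.<collection>"
 DB_Details <- paste("twitter", "filterstream", sep=".")
 mongo.insert.batch(mongo, DB_Details, temp)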

2. Reading the data back from MongoDB and saving it in the dataset variable (all columns of the collection are stored in this variable):

 library(mongolite)  # mongo() and $find() come from mongolite, not rmongodb
 mongo <- mongo(db = "twitter", collection = "filterstream", url = "mongodb://localhost")
 dataset <- mongo$find()

3. When I print the whole dataset variable there is no problem (see OUTPUT-1), but when I print a single column from dataset, the column's content (see OUTPUT-2) differs from what was shown in OUTPUT-1.

OUTPUT-1

  > dataset

    --------------------------------------------------
    | id        | text              |
    --------------------------------------------------
    | 1         | <ed><U+00A0><U+00BD><ed><U+00B8><U+0082><ed><U+00A0><U+00BD><ed><U+00B8><U+0082><ed><U+00A0><U+00BD><ed><U+00B1><U+0087>\nSome great jokes on #DeMonetization on my TL today.\n\nThank you, Modi ji. <ed><U+00A0><U+00BD><ed><U+00B1><U+0087>  |
    --------------------------------------------------
    | 2         | should be one              |
    --------------------------------------------------

OUTPUT-2

 > dataset$text

    | id        | text              |
    --------------------------------------------------
    | 1         | \xed��\xed�\u0082\xed��\xed�\u0082\xed��\xed�\u0087\nSome great jokes on #DeMonetization on my TL today.\n\nThank you, Modi ji. \xed��\xed�\u0087  |
    --------------------------------------------------
    | 2         | should be one              |
    --------------------------------------------------

4. Detecting the odd characters in OUTPUT-2 and getting rid of them is difficult. I am able to remove the special characters (tags) and obtain clean text using a regex on the content of the text column as it appears in OUTPUT-1, but the content of the text column in OUTPUT-2 is quite different, and I am not able to remove those characters.

5. Why does the content suddenly change when printing a particular column from dataset? What am I doing wrong?
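
To illustrate the kind of cleanup meant in point 4, here is a minimal sketch that strips every non-ASCII byte from a string by working on the raw bytes instead of on the printed form, so the result does not depend on how R chooses to display the value. The sample string and the helper name strip_non_ascii are made up for this example (they are not part of twitteR, rmongodb, or mongolite), and the sketch assumes that dropping all non-ASCII content, including legitimate accented text, is acceptable.

```r
# Hypothetical sample mimicking the text column: the leading bytes are a
# surrogate pair written out byte by byte, which is not valid UTF-8.
text <- "\xed\xa0\xbd\xed\xb8\x82\nSome great jokes on #DeMonetization on my TL today."

# Hypothetical helper: keep only bytes below 0x80 (plain ASCII).
# Operating on raw bytes avoids any dependence on the declared encoding
# or on how print() escapes the string.
strip_non_ascii <- function(x) {
  vapply(x, function(s) {
    bytes <- charToRaw(s)
    rawToChar(bytes[bytes < as.raw(0x80)])
  }, character(1), USE.NAMES = FALSE)
}

clean <- strip_non_ascii(text)
```

With the real data this would presumably be applied as strip_non_ascii(dataset$text); newlines and other ASCII characters are kept, so any tag-removal regex can still run afterwards.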

May be of interest. http://stackoverflow.com/questions/25468716/convert-byte-encoding-to-unicode/25531299#25531299 — hwnd, Dec 02 '16 at 20:39
