
I am trying to save data from web searches using Selenium WebDriver and C#. It all works OK, but when the text is saved, some characters seem to have the wrong encoding. Here is a snippet of my code:

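        using (System.IO.StreamWriter file = new System.IO.StreamWriter(fileName, true))
        {
            for (int i = 1; i <= 10; i++)
            {
                // XPath of the i-th Google search result
                String xpath = "/html/body/div[5]/div[2]/div/div[6]/div/div[3]/div/div[2]/div/ol/li[" + i + "]/div";

                String element = driver.FindElement(By.XPath(xpath)).Text;
                // Strings are immutable: Replace returns a new string, so keep the result
                element = element.Replace(",", " ");
                element = '"' + " " + element + " " + '"' + ",";

                // ... (the rest of the loop, including the write to file, was not shown)
            }
        }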
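The snippet removes commas and wraps each result in quotes by hand. If the aim is a CSV file that Excel splits into the right columns, a commonly used alternative is RFC 4180-style quoting: wrap each field in double quotes and double any embedded quotes, so commas can stay in the text. A minimal sketch, assuming .NET Core 3.0+ / C# 9 top-level statements; `CsvField` is a hypothetical helper and the sample strings stand in for the Selenium results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: quote one CSV field RFC 4180-style instead of deleting commas.
static string CsvField(string value)
{
    // Double any embedded quotes, then wrap the whole field in quotes;
    // commas inside the quotes are then treated as part of the field.
    return "\"" + value.Replace("\"", "\"\"") + "\"";
}

// Hypothetical sample results standing in for driver.FindElement(...).Text
var results = new List<string> { "Jupiter Asian | Jupiter, Citywire", "say \"hi\"" };
string line = string.Join(",", results.Select(CsvField));
Console.WriteLine(line);
```

This prints one CSV row with two fields; the comma inside the first field survives because it is inside quotes. Quoting does not change the encoding problem, which is a separate issue.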

Some of the odd characters found in the file look like this:

› ... ›

Any help would be much appreciated :)

Resolution:

I found the solution to this issue by passing an explicit UTF-8 encoding to the StreamWriter:

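        // Desktop folder (presumably used to build fileName elsewhere)
        string path = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        // Encoding.UTF8 makes StreamWriter write a UTF-8 byte-order mark (BOM) at the start
        // of a new file; the default encoding omits it, and Excel relies on it to detect UTF-8
        using (System.IO.StreamWriter file = new System.IO.StreamWriter(fileName, true, System.Text.Encoding.UTF8))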
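Why the fix works can be checked by looking at the raw bytes. A minimal sketch, assuming .NET Core 3.0+ / C# 9 top-level statements; the temp files and the sample text "a › b" are illustrative only. The default `StreamWriter` writes UTF-8 with no byte-order mark, while `Encoding.UTF8` adds the three-byte BOM `EF BB BF` at the start of a new file:

```csharp
using System;
using System.IO;
using System.Text;

// Default StreamWriter: UTF-8 *without* a byte-order mark (BOM).
byte[] defaultBytes = WriteAndRead(p => new StreamWriter(p, false));
// Explicit Encoding.UTF8: UTF-8 *with* a BOM (EF BB BF) at the start of the file.
byte[] utf8Bytes = WriteAndRead(p => new StreamWriter(p, false, Encoding.UTF8));

Console.WriteLine(BitConverter.ToString(defaultBytes)); // 61-20-E2-80-BA-20-62
Console.WriteLine(BitConverter.ToString(utf8Bytes));    // EF-BB-BF-61-20-E2-80-BA-20-62

// Writes "a › b" with the given writer factory and returns the file's raw bytes.
static byte[] WriteAndRead(Func<string, StreamWriter> open)
{
    string tmp = Path.GetTempFileName(); // creates an empty temporary file
    using (StreamWriter w = open(tmp))
    {
        w.Write("a \u203A b"); // U+203A is the "›" separator from the search results
    }
    byte[] bytes = File.ReadAllBytes(tmp);
    File.Delete(tmp);
    return bytes;
}
```

Note that with `append: true`, as in the resolution above, `StreamWriter` writes the BOM only when the file is empty at the start; appending to an existing file that was first created without a BOM will not add one.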
user2327528
  • OK, I have resolved this: it appears to be an issue in Excel. The solution would be to use a tab-separated file rather than CSV; this link was helpful: http://stackoverflow.com/questions/6002256/is-it-possible-to-force-excel-recognize-utf-8-csv-files-automatically – user2327528 May 18 '13 at 12:07
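The tab-separated route can be sketched as follows. This assumes, as I understand it, that Excel reliably opens a tab-delimited text file written as UTF-16 little-endian with a BOM (the layout of Excel's own "Unicode Text" save format). The file name and rows are hypothetical, and the code targets .NET Core 3.0+ / C# 9 top-level statements:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical rows standing in for the scraped search results.
string[][] rows =
{
    new[] { "Jupiter Asian", "citywire.co.uk \u203A ... \u203A Funds and managers" },
    new[] { "Second result", "example text" },
};

string path = Path.Combine(Path.GetTempPath(), "results.txt"); // hypothetical file name
// Encoding.Unicode is UTF-16 little-endian; on a new file StreamWriter writes its BOM (FF FE).
using (var writer = new StreamWriter(path, false, Encoding.Unicode))
{
    foreach (string[] row in rows)
    {
        // Tabs separate the columns, so commas in the text need no special handling
        // (this sketch assumes the values contain no tabs or line breaks).
        writer.WriteLine(string.Join("\t", row));
    }
}

byte[] head = File.ReadAllBytes(path);
Console.WriteLine($"{head[0]:X2} {head[1]:X2}"); // FF FE: the UTF-16 LE byte-order mark
```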

1 Answer


By default, StreamWriter writes UTF-8 (without a byte-order mark). That should be fine for XML files. How are you viewing the file?

If you need a different encoding, you must use a StreamWriter constructor that lets you specify one, such as StreamWriter(string path, bool append, Encoding encoding).

As I said, the default should work, so perhaps the viewer you are using expects a different encoding.
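To illustrate how a viewer expecting a different encoding produces odd characters: "›" (U+203A) is stored as the three UTF-8 bytes E2 80 BA, and a program that reads them with a single-byte code page shows three characters instead of one (under Windows-1252, typically the default for Excel on Western Windows systems, they appear as "â€º"). A minimal sketch, assuming .NET Core 3.0+ / C# 9 top-level statements; it uses Latin-1 rather than Windows-1252 only because Latin-1 is built into every .NET runtime without registering an extra encoding provider:

```csharp
using System;
using System.Text;

// The "›" separator from the search results, encoded as StreamWriter writes it by default.
byte[] utf8 = Encoding.UTF8.GetBytes("a \u203A b");
Console.WriteLine(BitConverter.ToString(utf8)); // 61-20-E2-80-BA-20-62

// A viewer that assumes a single-byte code page reads each byte as its own character.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
string misread = latin1.GetString(utf8);
Console.WriteLine(misread.Length); // 7 characters instead of 5

// Decoding with the encoding that was actually used round-trips correctly.
string correct = Encoding.UTF8.GetString(utf8);
Console.WriteLine(correct == "a \u203A b"); // True
```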

Matthew Watson
  • Thank you for your reply! The link you supplied appears to be broken. I am writing to a CSV file. I have tried using StreamReader sr = new StreamReader(element, Encoding.Unicode); but this did not work... – user2327528 May 18 '13 at 11:36
  • @user2327528 Fixed the link! How are you viewing the file to see the weird characters? And can you show more context, e.g. the surrounding text? – Matthew Watson May 18 '13 at 11:37
  • Here is part of the text I am seeing: citywire.co.uk › ... › Funds and managers › Asia Pacific Excluding Japan‎ It should be something like this: Jupiter Asian | Jupiter | Fund Fact Sheet | Citywire citywire.co.uk › ... › Funds and managers › Asia Pacific Excluding Japan‎ – user2327528 May 18 '13 at 11:41
  • Hmm, it looks like either some binary data is being written into the XML incorrectly, or something somewhere is using the wrong character encoding. Unfortunately, it's very difficult to tell from the code I can see. – Matthew Watson May 18 '13 at 12:04