19

I have a UTF-8 CSV file stored on a web server. When I download the file put it on my hard drive and I then import it into an Excel sheet with this macro (from the macro recorder) :

Sub Macro2()
Workbooks.OpenText Filename:= _
    "C:/myFile.csv", Origin _
    :=65001, StartRow:=1, DataType:=xlDelimited, TextQualifier:= _
    xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False _
    , Comma:=True, Space:=False, Other:=False
End Sub

All of the characters (vietnamese characters) are displayed correctly.

When I try the same macro but instead of giving the local address of the file ("C:/myFile.csv") I pass the URL of the file ("http://myserver.com/myFile.csv") the CSV is correctly imported into my Excel sheet but the vietnamese characters are not displayed correctly anymore.

I have also tried using the Data tab but the encoding seems be ignored by Excel:

With ActiveSheet.QueryTables.Add(Connection:= _
                "TEXT;C:/myFile.csv" _
                , Destination:=Range("$A$1"))
                .Name = "myFile.csv"
                .FieldNames = True
                .RowNumbers = False
                .FillAdjacentFormulas = False
                .PreserveFormatting = True
                .RefreshOnFileOpen = False
                .RefreshStyle = xlInsertDeleteCells
                .SavePassword = False
                .SaveData = True
                .AdjustColumnWidth = True
                .RefreshPeriod = 0
                .TextFilePromptOnRefresh = False
                .TextFilePlatform = 65001
                .TextFileStartRow = 1
                .TextFileParseType = xlDelimited
                .TextFileTextQualifier = xlTextQualifierDoubleQuote
                .TextFileConsecutiveDelimiter = False
                .TextFileTabDelimiter = True
                .TextFileSemicolonDelimiter = False
                .TextFileCommaDelimiter = False
                .TextFileSpaceDelimiter = False
                .TextFileOtherDelimiter = "~"
                .TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
                .TextFileTrailingMinusNumbers = True
                .Refresh BackgroundQuery:=False
       End With

Sample data: „; Â; ˜; Â1/4; ‰; ™,™

which Excel reads wrongly as: „; Â; ˜; Â1/4; ‰; ™,™;

user2741700
  • 881
  • 3
  • 11
  • 21
  • There is usually an encoding declaration in HTTP headers. Maybe in your case it is set to another character set and overwrite the `Origin` parameter – z̫͋ May 13 '14 at 09:04
  • Thanks, and do you know how i can change those HTTP headers ? – user2741700 May 13 '14 at 09:16
  • Depends on the server. You can easily check if that is the case with a web browser: in firefox you can activate the web console, type the URL in the address bar and look at the response header `Content-type` – z̫͋ May 13 '14 at 09:26
  • here is the content-type : text/html; charset=UTF-8 So it shouldn't be a server issue no ? thanks – user2741700 May 13 '14 at 10:10
  • No it shouldn't. Maybe Excel ignores it.. – z̫͋ May 13 '14 at 10:17
  • 1
    My testing shows that Excel gets confused when there is no Unicode byte order mark in the file. When there is one, it opens it properly from a URL, and in that case it does not matter whether the server supplies charset=UTF-8 in the headers. – GSerg Sep 01 '15 at 10:46
  • @GSerg ... I've just verified that all the files I am opening are `UTF-8`, opening those manually while recording a macro sets the `Origin:=65001`. As soon as I removed the Origin completely the files open just fine... I think this may have been an Excel's bug (either macro recorder or simply `UTF-8` has nothing to do with the Excel's `65001`... don't know but I am glad it's working after removing the Origin parameter –  Sep 03 '15 at 11:21

3 Answers3

7

If the characters are displayed correctly when you download the csv file yourself, I'd divide the process to 2 stages:

Downloading

Sub DownloadFile(ByVal url As String, ByVal local As String)

Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", url, False, "username", "password"
WinHttpReq.send

myURL = WinHttpReq.responseBody
If WinHttpReq.Status = 200 Then
    Set oStream = CreateObject("ADODB.Stream")
    oStream.Open
    oStream.Type = 1
    oStream.Write WinHttpReq.responseBody
    oStream.SaveToFile local, 2 
    oStream.Close
End If

End Sub

Loading CSV

Sub OpenCsv(ByVal csvfile As String)
Workbooks.OpenText Filename:= _ 
csvfile,Local:=True,StartRow:=1, DataType:=xlDelimited, TextQualifier:= _
xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, Semicolon:=False _
, Comma:=True, Space:=False, Other:=False
End Sub

Note That: The Local parameter is the key here,it makes VBA use your excel's local configuration (vietnamese), which is by default set to False.

Putting it all together

Sub DownloadAndLoad
  DownloadFile "http://myserver.com/myFile.csv","C:\myFile.csv"
  OpenCsv "C:\myFile.csv"
End Sub
Uri Goren
  • 13,386
  • 6
  • 58
  • 110
  • 1
    Hi Uri. Thanks for your reply. In my case, the `Local:=True` doesn't change a thing (is it cause my locals are UK? Changing my locale isn't really an option as I will be processing files of many languages...). For example, the csv file has a `™` in it, when I use the `Workbooks.Open` the character is still loaded as `„¢`. I have noticed that when I do this manually (not via macro) the csv is loaded properly. I have recorded a macro from the manual steps but the macro behaviour is different. I have read that csv files may be missing UTF-8 heading - are you familiar with headings at all? –  Sep 02 '15 at 07:15
  • 2
    `Local` should not be used or have an effect when reading UTF-8. @Meehow Yes, putting a [UTF-8 BOM](http://stackoverflow.com/q/4348802/11683) in the file [fixes it](http://stackoverflow.com/questions/23626622/vba-importing-utf-8-csv-file-from-a-web-server#comment52533489_23626622). – GSerg Sep 02 '15 at 08:14
7

I have been looking at a similar problem where we import utf-8 encoded csv files in to a worksheet. I am not pulling the data from a web server but this might help.

My solution is to read the utf-8 file to a local variable then insert it into a sheet. I tried saving the data to a temp file with ansi encoding but doing this caused all the characters to lose their accents.

Function ReadUTF8CSVToSheet(file As String)
    Dim ws As Worksheet
    Dim strText As String

    ' read utf-8 file to strText variable
   With CreateObject("ADODB.Stream")
        .Open
        .Type = 1  ' Private Const adTypeBinary = 1
        .LoadFromFile file
        .Type = 2  ' Private Const adTypeText = 2
        .Charset = "utf-8"
        strText = .ReadText(-1)  ' Private Const adReadAll = -1
    End With

    ' parse strText data to a sheet
    Set ws = Sheets.Add()
    intRow = 1
    For Each strLine In Split(strText, chr(10))
        If strLine <> "" Then
            With ws
                .Cells(intRow, 1) = strLine
                .Cells(intRow, 1).TextToColumns Destination:=Cells(intRow, 1), DataType:=xlDelimited, _
                    TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
                    Semicolon:=False, Comma:=True, Space:=False, Other:=False
            End With

            intRow = intRow + 1
        End If
    Next strLine

    ReadUTF8CSVToSheet = ws.Name

End Function

' to run
strSheetName = ReadUTF8CSVToSheet("C:\temp\utf8file.csv")
nathansclone4
  • 91
  • 1
  • 6
4

IMO, there seems to be a bug/conflict in Excel when opening UTF-8/UTF-8-BOM files using the recorded macro code, specifically when the Origin parameter is set to 65001 which is supposed be UTF-8.

I have found two workarounds to this issue:

  1. Remove the Origin parameter from the function call and see if the file loads properly Workbooks.OpenText Filename:="C:\file.csv".

    MSDN says:

    If this argument is omitted, the method uses the current setting of the File Origin option in the Text Import Wizard.

    I would think that as soon as you link the file with Excel, it should try to read the header of the file and select the correct Country Code automatically (well, assuming the header is not missing).

  2. I have tried different Country Codes and found that in my specific scenario setting Origin:=1252 (1252 - windows-1252 - ANSI Latin 1; Western European (Windows)) loads the file in Excel just fine.