4

I'm reading a rather large text file in VBA. The source file contains various text data from an international conference, and I'm rewriting extracted parts of that data in an organized format into an Excel sheet. I've placed the salient parts of the routine below. (I am a primitive programmer, and so I use primitive methods. Sorry about my bad form.)

The problem comes from the location of characters that aren't simple ASCII characters. The text files contain various entries, such as "München". When written to an Excel file, the text is badly mangled. In this case, I get "München". What's really annoying is that I can copy the text from the text file, and paste it into Excel, no problem. But there are way too many records to do this by hand. I really want the original (non ASCII) characters to be written on the spreadsheet!

I've spent a fair amount of time looking for a solution, and I found two related threads. First, there is something called "binaryStream" described in the first answer here: Excel VBA - export to UTF-8 It's not quite close enough to what I'm trying to do, and I can't figure out the parts I need.

There is also Save text file UTF-8 encoded with VBA which describes how to write UTF-8 data to a text file. Almost, but I want to write to an Excel spreadsheet!

Any suggestions, please?

Sub ParseText()
   Dim myFile As String, Author1 As String, Author2 As String
   Dim dept As String, inst As String, country As String
   Dim title As String, LineText As String
   Dim nSourceFile As Integer, NewRecord As Boolean
   Dim First13 As String, nn As Integer, Row As Integer, LineText2 As String

   myFile = Application.GetOpenFilename()
   nSourceFile = FreeFile
   Open myFile For Input As #nSourceFile
   Row = 2
   While Not EOF(nSourceFile)
      Line Input #nSourceFile, LineText ''This is a blank line
      Line Input #nSourceFile, LineText ''This has both first and last name
      Author1 = Right(LineText, Len(LineText) - 2)
      Author2 = RTrim(Left(Author1, Len(Author1) - 1))
      LineText = Author2
      nn = InStrRev(LineText, " ")
      Author1 = RTrim(Left(LineText, nn))
      Author2 = RTrim(Right(LineText, Len(LineText) - nn))
      nn = Len(Author2)
      LineText = LCase(Right(Author2, nn - 1))
      Author2 = Left(Author2, 1) & LineText
      Cells(Row, 1) = Author1
      Cells(Row, 2) = Author2

      Row = Row + 1
   Wend '' reached the end of file


End Sub
Community
  • 1
  • 1
Chuck
  • 41
  • 1
  • 3
  • 1
    Have you seen this possibility? http://www.ozgrid.com/forum/showthread.php?t=164547 – Daniel Dušek May 07 '14 at 10:54
  • 1
    You could also convert the source file to UTF-16LE (aka Unicode inside Windows, notepad can convert it) and then when reading inside the VBA code convert the string with `StrConv` and the `vbFromUnicode` flag – z̫͋ May 07 '14 at 11:47
  • i don't know if its ideal.. but I replace all non ascii charaters with the original characters using vba because even I did not find any solution for it. Ex: Replace ü with ü – neophyte May 07 '14 at 12:40
  • Thanks z, this was a remarkably simple solution. Inside Notepad, I save the text using ANSI encoding, and voila- the funny characters are preserved. Thanks! – Chuck May 07 '14 at 15:00

1 Answers1

0

Answer Provided in comments and accepted by OP:

"You could also convert the source file to UTF-16LE (aka Unicode inside Windows, notepad can convert it) and then when reading inside the VBA code convert the string with StrConv and the vbFromUnicode flag" – z̫͋

AMR
  • 584
  • 1
  • 6
  • 16