VBS File Read: Change CR FF to CR LF

Question

I have txt and csv files that need to be read line by line. A line normally ends with CR-LF, but in some files there is a CR with no LF; instead, the next physical line starts with FF. UFT 12 reads the two together, as if they were one line. I read the files using fso:

Dim FileRead : Set FileRead = fso.OpenTextFile(file1)
Dim file2Read : Set file2Read = fso.OpenTextFile(file2)

FileStrR = FileRead.ReadLine
File2Str = file2Read.ReadLine

I need to compare each line of this file with another text file:

if FileStrR = File2Str Then...

I tried to split FileStrR into an array:

FileStrA = REPLACE(FileStrR, ChrW(12),"**")
strarray = split(FileStrA,"**")
For h = 0 to UBound(strarray)
    FileStr = strarray(h)
    if FileStr = File2Str Then...
...

But here I am stuck on how to read the next line from File2 to compare with whatever comes after the FF.

UPDATE: I tried SkipLine:

Do Until fileRead.AtEndOfStream
    ln=ln+1
    FileStrA = REPLACE(FileStrR, ChrW(12),"**")
    strarray = split(FileStrA,"**")
    For h = 0 to UBound(strarray)
        FileStr = strarray(h)
        For s=1 to (ln+h)-1
            File2Read.SkipLine
        Next

        print ln&"-"&ln+h&"-"&h

        File2Str = File2Read.ReadLine
        if FileStr = File2Str Then...
        print "F1: "&FileStr
        print "F2: "&File2str
    Next
Loop

In this piece of code, the line print ln&"-"&ln+h&"-"&h prints the correct numbers (ln should be the number of the line currently read). But the string print (print "F1: "&FileStr & VBNewLine & "F2: "&File2str) gives the following:

F1: 2|8122|TX|...

F2: 4|8123|FG|...

It seems that even when h is 0, so that ln+h equals ln, the fso still skips one more line.

I opened it in Notepad++ and also printed what UFT reads. For those lines (about 1 in 65 lines), it looks as follows: `...` `Line 64 Text CR-LF` `Line 65 Text CR` `FF Text CR-LF` `Line 67 Text CR-LF` `...` – Salek, Nov 10 '17 at 18:28
It says UTF-8. I tried converting to ANSI; that added an `LF` after the `CR`. The problem is that I cannot change the encoding, and I am not allowed to save the file as a new file with an encoding other than its original one (( – Salek, Nov 10 '17 at 18:37
Something is not right with your file encoding, that's for sure. What does a hex editor show? Can you include that in your question for both files? Also show the code that reads the file (the definition of `FileRead`). – Tomalak, Nov 10 '17 at 18:50

score 2 · Accepted Answer · answered Nov 10 '17 at 19:10

See this to learn that you can't use

- the FileSystemObject to read/write UTF-8
- .ReadLine if the EOLs are messed up (not CrLf or Lf)

If your files are ANSI/UTF-16 and not too big, you can use

- .ReadAll to slurp the 'bad' file
- Replace CrFF with CrLf
- Split on CrLf to get an array of lines
- these lines to compare to the .ReadLines from the 'good' file
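
The steps above could be sketched roughly as follows. This is only an illustration, assuming ANSI files that fit in memory; the two paths and all variable names are hypothetical, and `WScript.Echo` assumes the script runs under cscript/wscript (in UFT, `Print` would take its place).

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical paths, for illustration only.
Dim badPath, goodPath
badPath = "C:\data\bad.txt"
goodPath = "C:\data\good.txt"

Dim fso, badFile, goodFile, content, badLines, goodLine, i
Set fso = CreateObject("Scripting.FileSystemObject")

' Slurp the 'bad' file. Note: ReadAll raises an error on an empty file.
Set badFile = fso.OpenTextFile(badPath, ForReading)
content = badFile.ReadAll
badFile.Close

' Turn every Cr+FF pair into Cr+Lf, then split into an array of lines.
content = Replace(content, vbCr & Chr(12), vbCrLf)
badLines = Split(content, vbCrLf)

' Walk the 'good' file with ReadLine and compare line by line.
Set goodFile = fso.OpenTextFile(goodPath, ForReading)
i = 0
Do Until goodFile.AtEndOfStream
    If i > UBound(badLines) Then Exit Do
    goodLine = goodFile.ReadLine
    If badLines(i) <> goodLine Then
        WScript.Echo "Line " & (i + 1) & " differs"
    End If
    i = i + 1
Loop
goodFile.Close
```

If the 'bad' file ends with CrLf, `Split` returns one extra empty element at the end of the array; the loop above simply never reaches it when both files have the same number of lines.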

If .ReadAll is not possible, you must write your own version of .ReadLine that scans for CrLf or CrFF and returns the data before those EOLs.
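
A home-made .ReadLine along those lines might look like the sketch below. The class name `CrFFReader` and its members are hypothetical. As an assumption beyond the answer's wording, it also treats a lone Lf or a lone Cr as an EOL, because the TextStream has no way to peek at the next character and a one-character buffer is used instead.

```vbscript
Option Explicit

' Hypothetical helper: wraps an open TextStream and returns one line at a
' time, treating CrLf, CrFF, a lone Lf and a lone Cr as line ends.
Class CrFFReader
    Private ts_       ' the wrapped TextStream
    Private pending_  ' one character read ahead after a Cr, or ""

    Public Sub Init(ts)
        Set ts_ = ts
        pending_ = ""
    End Sub

    Public Property Get AtEnd
        AtEnd = (pending_ = "") And ts_.AtEndOfStream
    End Property

    Private Function NextChar()
        If pending_ <> "" Then
            NextChar = pending_
            pending_ = ""
        Else
            NextChar = ts_.Read(1)
        End If
    End Function

    Public Function ReadLine()
        Dim buf, ch
        buf = ""
        Do Until Me.AtEnd
            ch = NextChar()
            If ch = vbLf Then Exit Do
            If ch = vbCr Then
                ' Swallow the Lf or FF that follows the Cr; keep anything else
                ' as the first character of the next line.
                If Not Me.AtEnd Then
                    ch = NextChar()
                    If ch <> vbLf And ch <> Chr(12) Then pending_ = ch
                End If
                Exit Do
            End If
            buf = buf & ch
        Loop
        ReadLine = buf
    End Function
End Class
```

Usage would be something like `Set r = New CrFFReader : r.Init fso.OpenTextFile(path)` followed by `Do Until r.AtEnd ... r.ReadLine ... Loop`. Reading one character at a time and concatenating strings is likely to be slow on files of many MB, and the encoding caveat from the first list still applies when the stream comes from the FileSystemObject.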

answered Nov 10 '17 at 19:10

Ekkehard.Horner


The files are UTF-8, and their sizes vary from a couple of KB to a couple of dozen MB...((( By the way, FSO reads any file whose EOLs are not messed up really well, regardless of size ) – Salek Nov 10 '17 at 19:18
@Salek if you are using FSO to read a UTF-8 file, that is your problem. While it may “appear” to read it right, what you end up with is an encoding mismatch which will come back to bite you. You need to stop assuming you know best and heed the advice being given. – user692942 Nov 10 '17 at 19:25
@Salek - My answer has the link (chain) to the docs that unequivocally state that FSO is **NOT** for UTF-8. So you are like the man jumping from a high building who still grins when passing the 5th floor - even worse, you invite other people to jump after you. – Ekkehard.Horner Nov 10 '17 at 19:31
I cannot change anything in the original file, but with ReadAll I replaced FF with LF and saved the result as a new ANSI file. Now ReadLine works )) – Salek Nov 10 '17 at 20:49
@Salek fine, it works, you go do it your way. What you are failing to understand is that, due to the way UTF-8 is organised, some common characters will map fine, but they are still UTF-8 encoded. So by reading them as ASCII you will end up with mismatched encoded characters, it is that simple. But hey, you know better than us, right. – user692942 Nov 10 '17 at 21:19
@Lankymart, I got that point) But so far I have found no way to read the files other than FSO ReadLine. That is a gap in my knowledge and needs more research. Unfortunately, I have no time for that at the moment because of a deadline. For now, this solves the current problem. By the way, thanks to the comments here I learned more about encodings and file reading. I will return to this issue and try to find a better solution. Really appreciate your help)) – Salek Nov 10 '17 at 21:35
@Salek use `ADODB.Stream` with `Charset = "UTF-8"`. See https://stackoverflow.com/a/13855268/692942 – user692942 Nov 10 '17 at 22:53
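
For reference, the `ADODB.Stream` suggestion in the last comment could be combined with the CrFF replacement roughly like this. It is a sketch only: the function name `ReadUtf8Lines` is hypothetical, and it assumes the whole file fits in memory.

```vbscript
Option Explicit

Const adTypeText = 2

' Hypothetical helper: reads a UTF-8 file through ADODB.Stream, normalizes
' Cr+FF to Cr+Lf and returns the lines as an array.
Function ReadUtf8Lines(path)
    Dim stream, text
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = adTypeText
    stream.Charset = "utf-8"
    stream.Open
    stream.LoadFromFile path
    text = stream.ReadText   ' with no argument, reads to the end of the stream
    stream.Close

    text = Replace(text, vbCr & Chr(12), vbCrLf)
    ReadUtf8Lines = Split(text, vbCrLf)
End Function
```

The returned array can then be compared element by element, for example `lines1 = ReadUtf8Lines(file1)` and `lines2 = ReadUtf8Lines(file2)` followed by a `For i = 0 To UBound(lines1)` loop, so neither file goes through the FileSystemObject.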
