
I made a simple text scanner that writes every line of a text file that matches my selection into another file. The problem is that instead of the expected text, it writes characters such as 洀漀 to the file. The picture shows an example:

[Screenshot of the garbled output]

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    int offset;
    wstring DBSearchLine, ScanLine;

    wifstream ScanFile, DBSearchFile;
    wofstream ResultFile;
    ScanFile.open("ScanFile.txt", ios_base::binary);
    ResultFile.open("ResultFile.txt", ios::out, ios_base::binary);

    if (ScanFile.is_open())
    {
        while (!ScanFile.eof())
        {
            DBSearchFile.open("DBSearchFile.txt", ios_base::binary);
            if (!DBSearchFile.is_open())
            {
                cout << "Error open DBSearchFile.txt" << "\n";
                break;
            }

            getline(ScanFile, ScanLine);
            wcout << "Scan line is - " << ScanLine << "\n";

            while (!DBSearchFile.eof())
            {
                getline(DBSearchFile, DBSearchLine);
                wcout << "DBSearchLine is -" << DBSearchLine << "\n";
                if ((offset = ScanLine.find(DBSearchLine, 0)) != string::npos)
                {
                    ResultFile << ScanLine << L"\n";
                }
            }
            DBSearchFile.close();
        }
        ScanFile.close();
    }
    else
    {
        cout << "Error open ScanFile.txt" << "\n";
    }
    system("PAUSE");
    return 0;
}
Blag
Marek
  • Does it work if you use char instead of wide char? – zdf Dec 23 '16 at 22:42
  • @ZDF If you mean string instead of wstring, that doesn't work: getline fails because a wstring is passed where a string is expected. And there should be a difference in encoding. – Marek Dec 23 '16 at 22:53
  • `ResultFile.open("ResultFile.txt", ios::out, ios_base::binary)`- shouldn't it be `ios::out | ios_base::binary`? Also, can't reproduce - clang 3.9.0, Ubuntu 14.04.05 x86_64. – Schtolc Dec 23 '16 at 23:01
  • I mean, if you remove the w from everything: streams and strings. – zdf Dec 23 '16 at 23:02
  • Possibly, you wrote your document in UTF-8 or ASCII and then opened the result in Unicode (2 byte)... Make sure you don't mix encodings, and you should only use 1-byte chars. Sounds like an encoding issue. – user Dec 23 '16 at 23:20
  • @ZDF I can't remove it because I need to work with Unicode. – Marek Dec 24 '16 at 10:35
  • @PatLaugh I have all the documents stored in Unicode – Marek Dec 24 '16 at 10:36
  • I found that the problem arises when there is a match on an empty line. I tried to solve it with `if (ScanLine == L"\r") continue` to skip searching in the current line, and also with "\n" instead of "\r", but neither worked. – Marek Dec 24 '16 at 10:43
  • Just remove the w to see if it works like expected. If it works, put the w back and have a look at Windows' _setmode and stream's imbue. – zdf Dec 24 '16 at 10:57
  • @ZDF No this solution is not working – Marek Dec 24 '16 at 11:33
  • Another problem I found: even when the empty-line bug does not occur, it cannot write to ResultFile in Unicode. The picture shows an example: [link](http://imgur.com/6RLfm9u) – Marek Dec 24 '16 at 11:33
  • "this solution is not working": I do not understand. I have run your program. It works fine on my computer, except for messed-up line endings. Post your ResultFile.txt. – zdf Dec 24 '16 at 13:11
  • I posted the ResultFile two comments above. [ResultFile.txt](http://imgur.com/6RLfm9u) – Marek Dec 25 '16 at 15:40
  • I need the actual file, not a picture. – zdf Dec 26 '16 at 18:23
  • https://uloz.to/!47DuBkJhQ87E/resultfile-rar – Marek Dec 29 '16 at 12:34

1 Answer

#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>

using namespace std;

int main()
{
    /* via http://stackoverflow.com/a/5105192/4005233 
       changes the encoding of the console and all subsequently opened 
       files */
    std::locale::global(std::locale(""));

    wifstream ScanFile;
    ScanFile.open("ScanFile.txt", ios_base::binary);
    if (!ScanFile.is_open()) {
        cout << "Error open ScanFile.txt" << "\n";
        return 1;
    }

    wofstream ResultFile("ResultFile.txt", ios::out);

    wstring ScanLine;
    // loop on getline itself instead of eof(), so a trailing newline
    // does not produce one extra, empty iteration
    while (getline(ScanFile, ScanLine))
    {
        wifstream DBSearchFile;
        DBSearchFile.open("DBSearchFile.txt", ios_base::binary);
        if (!DBSearchFile.is_open())
        {
            cout << "Error open DBSearchFile.txt" << "\n";
            break;
        }

        wcout << "Scan line is - " << ScanLine << "\n";

        wstring DBSearchLine;
        while (getline(DBSearchFile, DBSearchLine))
        {
            // an empty search line would match every scan line
            // (find(L"") always returns 0), so stop at the first one
            if (DBSearchLine.empty())
                break;
            wcout << "DBSearchLine is -" << DBSearchLine << "\n";

            if (ScanLine.find(DBSearchLine, 0) != wstring::npos)
            {
                ResultFile << ScanLine << L"\n";
                break; // found a match, no need to search further
            }
        }
        DBSearchFile.close();
    }

    ScanFile.close();

    return 0;
}

This was tested using files with and without a BOM.

The innermost loop had to be changed to handle files that end with a newline character; otherwise it would have matched an empty string, and searching for an empty string always succeeds.

(I've also changed a few other things to match my coding style; the important change is the one right at the top.)

user45891