
I'm trying to convert a CSV file to a TXT file using simple C++ code like this:

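#include <fstream>
#include <sstream>
#include <string>

// strFileName is the output path, strCSVDir is the input CSV path
std::ofstream txtFile(strFileName, std::ofstream::out | std::ofstream::app);
std::string strLine;
std::ifstream csvFile(strCSVDir);

// Read the CSV file line by line
while (std::getline(csvFile, strLine))
{
    std::string subString;
    std::stringstream s(strLine);

    // Split each line on ';' and write the fields tab-separated
    while (std::getline(s, subString, ';'))
    {
        txtFile << subString << "\t";
    }

    txtFile << "\n";
}

txtFile.close();
csvFile.close();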

It works fine, but only if the CSV file doesn't contain any unexpected symbols, like the arrow in this picture: [image: bad symbols]. When it does, my code reads only the part of the CSV file up to that arrow symbol. How can I get around this?

Update: if I look at this CSV file's raw bytes (for example in Far's hex view), I see that the code of the arrow symbol is "1A" (0x1A). According to the Unicode character table, that is the SUBSTITUTE (SUB) control character. I don't know how it got into this CSV file.
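One way to see where such bytes sit, and how many of them there are, is to scan the file in binary mode so that no byte gets special treatment. The following is a minimal sketch, not code from the thread; the function name `findByteOffsets` is hypothetical. It returns the byte offset of every 0x1A in a file:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Returns the byte offsets at which `value` occurs in the file at `path`.
// The file is opened with std::ios::binary so every byte is read as-is
// (on Windows, text mode would treat 0x1A as end-of-file and stop early).
std::vector<std::size_t> findByteOffsets(const std::string& path,
                                         unsigned char value = 0x1A)
{
    std::vector<std::size_t> offsets;
    std::ifstream in(path, std::ios::binary);
    std::size_t pos = 0;
    char c;
    while (in.get(c))
    {
        if (static_cast<unsigned char>(c) == value)
            offsets.push_back(pos);
        ++pos;
    }
    return offsets;
}
```

For example, `findByteOffsets(strCSVDir)` would list each position of a SUB byte in the input file. Reading one character at a time keeps the sketch simple; a chunked read would be faster on a file with a million rows.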

brightside90
  • @DimChtz it didn't help :( – brightside90 Aug 19 '19 at 08:28
  • If there's a `0x1a` in your CSV file, there's a `0x1a` in your CSV file. "How to get around this situation" heavily depends on what the "correct" reaction to that condition would be -- which again depends on what that CSV file *is*. That field could be an *intentionally* hex-encoded numerical value. "How does it get in this CSV-file I don't know" is now the crux of your question -- we don't know either. Let's find out. Where did you get this CSV file, is there documentation about its contents, is there somebody you can ask? Don't just assume that it's safe to ignore that value. – DevSolar Aug 19 '19 at 09:05
  • @DevSolar no, I can definitely ignore that value, because it is located in a part of the CSV row that can be ignored, and it doesn't carry any meaning. But it prevents me from reading the other, important data in this file. The whole file can contain over 1,000,000 rows of data, and only 1 to 5 of them can have such symbols. – brightside90 Aug 19 '19 at 09:18
  • Ah, I see now. The issue here is that *Windows*, specifically, interprets the value `0x1a` in a file *opened in text mode* to mean "end of file". One solution here would be to open the file in binary mode. See duplicate. – DevSolar Aug 19 '19 at 09:29
  • @DevSolar yes, that is what I need! Thank you! – brightside90 Aug 19 '19 at 09:47
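Applied to the loop in the question, DevSolar's suggestion amounts to opening the input stream with `std::ios::binary`. A minimal sketch, assuming the same ';' separator and tab-separated output as the original code; the function name `convertCsvToTxt` is hypothetical. Because binary mode no longer translates Windows "\r\n" line endings, the sketch strips a trailing '\r' from each line itself:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Converts a ';'-separated file into a tab-separated one, appending to txtPath.
// The input is opened in binary mode so that a stray 0x1A byte is read as an
// ordinary character instead of ending the input early (MSVC text mode).
void convertCsvToTxt(const std::string& csvPath, const std::string& txtPath)
{
    std::ifstream csvFile(csvPath, std::ios::binary);
    std::ofstream txtFile(txtPath, std::ios::out | std::ios::app);
    std::string strLine;

    while (std::getline(csvFile, strLine))
    {
        // Binary mode keeps the '\r' of "\r\n" line endings; drop it here.
        if (!strLine.empty() && strLine.back() == '\r')
            strLine.pop_back();

        // Same field splitting as the question: each field followed by a tab.
        std::stringstream s(strLine);
        std::string subString;
        while (std::getline(s, subString, ';'))
            txtFile << subString << '\t';
        txtFile << '\n';
    }
}
```

The 0x1A byte is simply copied through with the field it belongs to; if it should be dropped, that can be done on `strLine` before splitting.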

1 Answer


It might be easier to just read the entire file, then do the replacement, and finally save the result.

Going from your snippet:

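#include <boost/algorithm/string/replace.hpp>
#include <sstream>
#include <string>

// Read the whole input stream into memory in one go
std::stringstream sstr;
sstr << csvFile.rdbuf();
std::string buffer = sstr.str();
// Turn every ';' separator into a tab, as the question's loop does
boost::replace_all(buffer, ";", "\t");
txtFile << buffer;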

Update: if you don't have Boost, it should be easy to replace that call with something else, such as a for loop (since it is just a single-character replacement).
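Without Boost, the single-character replacement can be done with `std::replace` from `<algorithm>`, which walks the string element by element much like the for loop mentioned above. A minimal sketch; `semicolonsToTabs` is a hypothetical helper name, and it writes a tab in place of each ';' to match the question's output:

```cpp
#include <algorithm>
#include <string>

// Replaces every ';' with a tab, in place. std::replace only works for
// single-element substitutions, which is exactly the case here.
void semicolonsToTabs(std::string& buffer)
{
    std::replace(buffer.begin(), buffer.end(), ';', '\t');
}
```

A hand-written `for (char& c : buffer) if (c == ';') c = '\t';` loop does the same thing.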

Update 2: The reason the read might stop before the end of the file here is that the file is being read in text mode, and it probably contains a character that acts as a terminator in that mode. See https://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes for an explanation.
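Following Update 2, the whole-file approach from this answer works once the input is opened in binary mode. A minimal sketch; `readWholeFile` and `dropCarriageReturns` are hypothetical helper names. Binary mode hands back Windows line endings as "\r\n", so if the result is written to a text-mode `txtFile` on Windows, the '\r' bytes should be removed first (or the output opened in binary mode too); otherwise each line would end in "\r\r\n":

```cpp
#include <algorithm>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file into a string. std::ios::binary makes the stream pass
// every byte through unchanged, so a 0x1A byte does not end the read early.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::stringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

// Removes every '\r', turning "\r\n" line endings into "\n".
// Note: this also drops any lone '\r' elsewhere in the data.
void dropCarriageReturns(std::string& buffer)
{
    buffer.erase(std::remove(buffer.begin(), buffer.end(), '\r'), buffer.end());
}
```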

darune
  • if I use this line of code: std::stringstream sstr = csvFile.rdbuf(); then I get the error "no suitable constructor exists to convert from ...". I'm using the VS2012 C++ compiler. – brightside90 Aug 19 '19 at 08:39
  • btw, you forgot a ; in this line :) – brightside90 Aug 19 '19 at 08:40
  • @brightside90 sorry, should be fixed now – darune Aug 19 '19 at 08:43
  • this is a good approach, but if I use it, std::string buffer still contains only the part of the file up to the arrow – brightside90 Aug 19 '19 at 08:50
  • @brightside90 then it seems your issue is just with reading the file - you could try a more raw approach to reading it - fopen/fread or memory mapping the file – darune Aug 19 '19 at 09:11
  • @brightside90 see my update 2 – darune Aug 19 '19 at 09:18
  • Yes, I've read it. Thank you. Your guess is what I wrote about in my update to the question. Now I'm thinking about how to apply it to my code – brightside90 Aug 19 '19 at 09:25
  • @brightside90 your update sounds wrong - 0x1A shouldn't terminate the input at all (only zero or perhaps EOF will). However - try reading the file using a binary method instead. – darune Aug 19 '19 at 10:33
  • Yes, I agree with you that I need to use a binary method. I wrote about it in my last comment under my question, where user DevSolar pointed out that my question is a duplicate. – brightside90 Aug 19 '19 at 10:37
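darune's fopen/fread suggestion also works, as long as the file is opened with the "rb" mode string; on Windows the "b" is what turns off the Ctrl+Z (0x1A) end-of-file handling of text mode. A minimal sketch, with `readFileRaw` as a hypothetical name. It reads in fixed-size chunks and returns false if the file cannot be opened or a read error occurs:

```cpp
#include <cstdio>
#include <string>

// Reads the whole file at `path` into `out` using C stdio in binary mode.
// Returns false if the file cannot be opened or a read error occurs.
bool readFileRaw(const char* path, std::string& out)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return false;

    out.clear();
    char chunk[4096];
    std::size_t n;
    // fread returns the number of bytes actually read; 0 means EOF or error
    while ((n = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        out.append(chunk, n);

    bool ok = !std::ferror(f);
    std::fclose(f);
    return ok;
}
```

MSVC may warn that `fopen` is deprecated in favor of `fopen_s`; the call itself is standard C.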