
For clarity:

This is NOT a duplicate of Getting std::ifstream to handle LF, CR, and CRLF?

This IS an extension of C++ cutting off character(s) when read lines from file

I state this up front because when I posted the question at C++ cutting off character(s) when read lines from file, it was tagged as a potential duplicate of Getting std::ifstream to handle LF, CR, and CRLF?. I tried a simplified version of the solution proposed in that post (direct reads instead of buffers, to keep it simple), but it did not work for me. Even though I edited my question and the code to demonstrate that, there have been no responses. Jonathon suggested I re-post as a separate question, so here I am.

I have also tried numerous other solutions, ending up with the code below. Although the code handles tabs and normal text as expected, it still does not handle the newline character differences as expected, so I need help.

I want to:

  • read in the contents of a txt file
  • run some validation checks on the content
  • output a report to another txt file

In this prototype code I am just reading in text from one file and outputting edited text to another file. After I get this working, I'll then worry about running validation tests, ...

I am compiling and testing on a Linux Mint Maya (Ubuntu 12.04 based) box and then cross-compiling with mingw32 to run on a Windows PC.

Everything works fine when I:

  • Compile and run on a linux box with a linux-created text file
  • Cross-compile on linux and run on Windows with a linux-created text file

However, when I:

  • Cross-compile on linux and run on Windows with a Windows-created text file

the result is not as expected; the first few characters are skipped.

I need the program to handle either Windows-created or linux-created text files.

The input file I am using in all cases (silly content for now, just as a test; one copy created on the linux box, one created on Windows using Notepad) is:

A new beginning
just in case
the file was corrupted
and the darn program was working fine ...
at least it was on linux

When I read the file in and use the program (code shown below) the linux-created text file produces the proper output:

Line 1: A new beginning
Line 2: just in case
Line 3: the file was corrupted
Line 4: and the darn program was working fine ...
Line 5: at least it was on linux

When I use the Windows-created text file and run the program on a Windows PC, the output is:

Line 1: A new beginning
Line 2: t in case
Line 3: e file was corrupted
Line 4: nd the darn program was working fine ...
Line 5: at least it was on linux

As you can see, there are characters missing from lines 2, 3, and 4 but not from lines 1 and 5:

  • 0 characters missing from the start of line 1
  • 3 characters missing from the start of line 2
  • 2 characters missing from the start of line 3
  • 1 character missing from the start of line 4
  • 0 characters missing from the start of line 5

I expect this has something to do with the differences in how newlines are handled in linux and Windows text files. I have read the other postings on this and tried the solutions, but they do not seem to solve the issue. I am sure I am missing something very basic and apologize in advance if so, but I've been banging away at this for over a week and need help.

The code I am using is:

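#include <cstdlib>   // EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iostream>  // std::cout
#include <string>    // std::string, getline

int main(int argc, char** argv)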
{


    /*
     *Program to:
     *  1) read from a text file
     *  2) do some validation checks on the content of that text file
     *  3) output a report to another text file
     */

    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    char *RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char *RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    if ( ! rc_input_file_holder.is_open() )
    {
        std::cout << "Error - Could not open the input file" << std::endl;
        return EXIT_FAILURE;
    }
    else
    {
        std::ofstream rc_output_file_holder;
        rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

        if ( ! rc_output_file_holder.is_open() )
        {
            std::cout << "Error - Could not open or create the output file" << std::endl;
            return EXIT_FAILURE;
        }
       else
        {
            std::streampos char_num = 0;

            long int line_num = 0;
            long int starting_char_pos = 0;

            std::string file_line = "";

            while ( getline( rc_input_file_holder , file_line ) )
            {
                line_num = line_num + 1;
                long unsigned file_line_length = file_line.length();

                std::string string_to_find = "\r";
                std::string string_to_insert = "\n";
                long unsigned num_char_in_string_to_find = string_to_find.length();
                long unsigned character_position;
                while ( ( character_position = file_line.find( string_to_find ) ) != std::string::npos )
                {
                    if ( character_position == file_line_length - num_char_in_string_to_find )
                    {
                        // If the \r character is found at the end of the line, 
                        //   it is the old Mac style newline, 
                        //   so replace it with \n
                        file_line.replace( character_position , num_char_in_string_to_find , string_to_insert );
                        file_line_length = file_line.length();
                    }
                    else
                    {
                        // If the \r character is found but is not the last character in the line
                        //   it could be the second-last character meaning it is a Windows newline pair \r\n
                        //   or it could be somewhere in the middle of the line
                        //   so delete it
                        file_line.erase( character_position , num_char_in_string_to_find  );
                        file_line_length = file_line.length();
                    }
                }

                int field_display_width = 4;

                rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;

                starting_char_pos = rc_input_file_holder.tellg();

            }

            rc_input_file_holder.close();
            rc_output_file_holder.close();
            delete [] RC_INPUT_FILE_NAME;
            RC_INPUT_FILE_NAME = 0;
            delete [] RC_OUTPUT_FILE_NAME;
            RC_OUTPUT_FILE_NAME = 0;
        }
    }
}

Any and all suggestions appreciated ...

Ramblin
  • I would expect the differences in newlines to be handled by the runtime library on each system, so it shouldn't matter where you compile. If you have a longer file, does the pattern of characters being deleted continue like that? – Barmar Apr 25 '15 at 12:55
  • How is this an extension of your previous question? It looks like it's the same exact problem. Am I missing some important difference? – Barmar Apr 25 '15 at 12:58
  • That isn't the output of that program. The program prints "starting at character position" on every line, but the output does not show that. What is the actual output of the real program that you ran? – Alan Stokes Apr 25 '15 at 12:58
  • Did you attempt some basic debugging? Even inserting simple `printf`s after each string manipulation may be enough to zoom in on the problem. Also, did you inspect your output 'by eye', or did you use a hex editor to see what *really* ends up in your output? – Jongware Apr 25 '15 at 12:58
  • I tested your program with Visual Studio 2013 and your text. As input I used your text as ANSI and as UTF-8 (UTF-8 puts a magic number at the beginning). The code worked as expected. The output is like your linux output. – Martin Schlott Apr 25 '15 at 13:51
  • Can you provide the textfile? Not the text inside, the whole file. – Martin Schlott Apr 25 '15 at 13:52
  • @barmar I have tried multiple solutions, and in this version I use string manipulation, not character manipulation. And as stated above, it was suggested that I re-post with the new information, seeing as the previous post was likely considered dead due to the duplicate post tag (which it is not). Also, yes, the pattern does hold, with the 1st line missing no characters, the last line missing no characters and the middle lines missing a descending number of characters. – Ramblin Apr 25 '15 at 13:57
  • @jongware Yes, I did do quite a bit of output debugging but did not include that here for simplicity. The output debugging results were posted in the previous post at http://stackoverflow.com/questions/29681515/c-cutting-off-characters-when-read-lines-from-file and the results are the same as I get with this new string manipulation attempt. – Ramblin Apr 25 '15 at 14:00
  • @AlanStokes My bad. I edited down the code to include only (what I thought were) the relevant lines, but instead of the cout line I had in there, I should have had ... rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl; – Ramblin Apr 25 '15 at 14:03
  • @MartinSchlott How do I attach a test file to a comment? As for your compiler, I was wondering about a more modern compiler. I am on Linux Mint Maya (v13) which is Ubuntu Precise (LTS 12.04) based. When I apt-get install g++ I get v 4.6 of the C++ compiler and it works fine. When I apt-get install mingw32, I end up with 4.1.2 of the g++ cross-compiler. The docs say installing v4.8 of mingw on Ubuntu 12.04 has issues, and I cannot find where and how to install v4.6 of mingw on Ubuntu. Any suggestions? – Ramblin Apr 25 '15 at 14:08
  • @Ramblin Unfortunately I can only answer "I have no suggestions". I also develop cross-platform, but I use VS2013 as my main tool. For Linux (Ubuntu) I use Eclipse with the C++ extension. This keeps me away from many hassles. At least I use C++11 and boost. – Martin Schlott Apr 25 '15 at 14:12
  • Don't add things in comments, edit the question to include the information you've been asked for. – Alan Stokes Apr 25 '15 at 14:25
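
The byte-level inspection suggested in the comments (looking at what is really in the file rather than judging by eye) could be sketched roughly as follows. This is only an illustration: `hex_dump` is a hypothetical helper name, not something from the thread or the standard library, and it assumes the stream was opened in binary mode so that no newline translation happens before the bytes are read.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): returns every byte of `in`
// as two lowercase hex digits, separated by single spaces.
std::string hex_dump(std::istream& in)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');

    char ch;
    bool first = true;
    while (in.get(ch)) {
        if (!first) {
            out << ' ';
        }
        first = false;
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out << std::setw(2)
            << static_cast<unsigned>(static_cast<unsigned char>(ch));
    }
    return out.str();
}
```

Called on something like std::ifstream in("rc_input_file.txt", std::ios::in | std::ios::binary), a Windows-created file would be expected to show 0d 0a pairs at each line end, while a linux-created file would show only 0a.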

2 Answers

1

Well, thank you to Martin Schlott, who tried my program with his compiler, where it worked with text files from either Windows or Linux sources.

This pointed me to the compiler differences and that was the key.

The cross-compiler installed by apt-get install mingw32 was an older version (v4.2.1), whereas apt-get install g++ installed the linux compiler at v 4.6.2.

So I found an old listing on sourceforge for a v4.6.3 of the cross-compiler Mingw with G++ v4.6.3 and installed it.

I had to include the path of the new install, and I had to add two options to the compile command:

  • -static-libgcc
  • -static-libstdc++

to prevent 2 "missing dll" error messages.

After that, the cross-compile worked cleanly and the newline differences were handled no problem.

I love technology: I spent over a week thinking I was doing something wrong when in fact the cross-compiler was out of date. Oh well, I learned a lot about C++ in the meantime and hopefully this helps someone else in the future.

Thanks again
R

Ramblin
-1

If you want to fiddle with newlines manually you have to open the files in binary mode.

stefan
  • Are you sure about that? There are a lot of posts about this where the author talks about a solution that opens the file in text mode. The post at http://stackoverflow.com/questions/6089231/getting-std-ifstream-to-handle-lf-cr-and-crlf is one of them. If I must do it in binary I can adapt, but I want to output a text file with an error report using the contents of the file I open, so it seems like the binary approach would create issues of its own. – Ramblin Apr 25 '15 at 20:51