
I need to read all attributes of one Stern (engl.: star) object from a text file that looks like the following. I need to replace the string "leer" (German for "empty") with "", but there can also be a valid string in that field, which shouldn't be replaced.

E.g., for another Stern object there could be "leer" instead of "Sol" as well.

Problem:
The problem is that it doesn't replace "leer" with "". And it seems like it saves "leer\\r" in the object instead of only "leer", but I tried to replace "leer\\r" as well and it still doesn't work.

This is one Stern in the text file that should be read:

0
Sol
0.000005
0.000000
0.000000
leer
1
0

And this is my operator >> to read it:

istream& operator>>(istream& is, Stern& obj)
{
    string dummy;
    is >> obj.m_ID;
    getline(is, dummy);
    getline(is, obj.m_Bez);

    if (obj.m_Bez == "leer")
        obj.m_Bez = "";

    is >> obj.m_xKoord >> obj.m_yKoord >> obj.m_zKoord;
    getline(is,dummy);
    getline(is,obj.m_Sternbild);

    if (obj.m_Sternbild == "leer")
        obj.m_Sternbild = "";

    is >> obj.m_Index >> obj.m_PrimID;

    return is;
}

Stern.h:

#ifndef STERN_H
#define STERN_H
#include <string>
#include <iostream>

using namespace std;

class Stern
{
public:
    Stern();
    // 2.a)
    //Stern(int m_ID, string m_Bez, float m_xKoord, float m_yKoord, float m_zKoord, string m_Sternbild, int m_Index, int m_PrimID); 
    virtual ~Stern();

    void print() const; // 1.b)
    friend ostream& operator<<(ostream& os, const Stern& obj); // 1.b)i.
    friend istream& operator>>(istream& is, Stern& obj);


private:
    int m_ID;
    string m_Bez;
    float m_xKoord;
    float m_yKoord;
    float m_zKoord;
    string m_Sternbild;
    int m_Index;
    int m_PrimID;
};

#endif /* STERN_H */
Andre Kampling
Craig Harrison
  • And what is the problem with the code you show? – Some programmer dude Aug 30 '17 at 09:08
  • The problem is it doesn't replace the "leer" with the "". And it seems like it saves "leer\\r" in the object instead of only "leer", but I tried to replace "leer\\r" as well and it still doesn't work. – Craig Harrison Aug 30 '17 at 09:09
  • If the input is the same as the one from your description, then I guess it's because of the whitespace before the "leer" word? Don't forget you are using `getline(is, obj.m_Bez);` and that doesn't remove the whitespace. Try trimming the string first, then check for equality. – pmaxim98 Aug 30 '17 at 09:11
  • The exact values saved are: m_ID: 0 m_Bez: "Sol\\r" m_xKoord:4.99999987e-06 – Craig Harrison Aug 30 '17 at 09:13
  • And if you step through the code line by line in a debugger, what do you notice then? Are the values you read the correct ones, the ones you expect? – Some programmer dude Aug 30 '17 at 09:14
  • My crystal ball thinks that `m_Sternbild` is a `char*`. – molbdnilo Aug 30 '17 at 09:16
  • Also, can you tell us what `Stern` is? How is it defined? Preferably please try to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve) to show us. – Some programmer dude Aug 30 '17 at 09:17
  • They are the correct values when I look at my Stern obj, except that the strings have \\r at the end, but that's because I can't figure out how to convert my DOS file to a Unix file in my virtual machine (Ubuntu 64-bit). I just can't replace the "leer" / "leer\\r" with my if() – Craig Harrison Aug 30 '17 at 09:18
  • How about comparing with `"leer\r"` as well? – Some programmer dude Aug 30 '17 at 09:19
  • @CraigHarrison Install `dos2unix` on your Ubuntu (if it's not already there) and convert with it. (And don't use the backtick character as an apostrophe.) – molbdnilo Aug 30 '17 at 09:19
  • Stern is basically a star with attributes, which is later pushed back into the list l_stars (a private attribute of my galaxy class) – Craig Harrison Aug 30 '17 at 09:20
  • @CraigHarrison Post (in the question) the actual definition of `Stern`, not a description of what it basically is. – molbdnilo Aug 30 '17 at 09:21
  • Should this do it then? if (obj.m_Sternbild == "leer\\r" || obj.m_Sternbild == "leer\r" || obj.m_Sternbild == "leer") obj.m_Sternbild = ""; @molbdnilo I'm fairly new to Ubuntu and only use it because our university offered a virtual machine with everything installed for the current semester. How do I install dos2unix? – Craig Harrison Aug 30 '17 at 09:23
  • @CraigHarrison better use `obj.m_Sternbild.erase(std::remove(obj.m_Sternbild.begin(), obj.m_Sternbild.end(), '\r'), obj.m_Sternbild.end());` then compare imo. – pmaxim98 Aug 30 '17 at 09:25
  • @CraigHarrison: `dos2unix` is already installed on Ubuntu. – Andre Kampling Aug 30 '17 at 09:26
  • I have posted the whole project in the question now. – Craig Harrison Aug 30 '17 at 09:28
  • @CraigHarrison also instead of '\r' try with '\n' too. – pmaxim98 Aug 30 '17 at 09:29
  • @AndreKampling So I open my terminal and type "dos2unix /home/stud/stars-newline-leer.txt", but it only tells me: command not found – Craig Harrison Aug 30 '17 at 09:33
  • @molbdnilo It's a string, sorry for letting you guess – Craig Harrison Aug 30 '17 at 09:37
  • @CraigHarrison: Sorry, I thought Ubuntu would ship with something so elementary. `sudo apt-get install tofrodos` will install it then. – Andre Kampling Aug 30 '17 at 09:37
  • @AndreKampling I installed it but dos2unix still seems to be non existent – Craig Harrison Aug 30 '17 at 09:40
  • @CraigHarrison: Then use `fromdos` as the command; it does the same. If that doesn't work, you didn't install it. – Andre Kampling Aug 30 '17 at 09:43
  • @AndreKampling Thanks a lot! That worked! Now it also correctly replaces the string m_Sternbild with my if() – Craig Harrison Aug 30 '17 at 09:47
  • @CraigHarrison: You should try to understand why it's working now. The [newline](https://en.wikipedia.org/wiki/Newline) characters in Windows/Unix are different. On Windows it's `CR+LF` and on Unix it's `LF`. The `tofrodos` tool can convert files between them. To check a file use the `file` command. – Andre Kampling Aug 30 '17 at 09:54
  • I did understand that my problem is the \\r at the end of the strings and that Windows and Unix files have different newlines, but I couldn't figure out how to change them. And I still can't understand why my if (obj.m_Sternbild == "leer\\r") didn't work, because imo it should have worked? When I try to check my stars-newline-leer.txt with "file stars-newline-leer.txt", it only says stars-newline-leer.txt: ASCII text, and after I convert it, it says the same. – Craig Harrison Aug 30 '17 at 10:00
  • Hey folks---**can we get some answers going here**, instead of having an extended discussion in the comments? If you need to add information, [edit] it into your question. – Cody Gray - on strike Aug 30 '17 at 10:09
  • @CraigHarrison: If a question on StackOverflow is solved, you don't write that into the title or the text. Instead, you can accept one of the given answers, see here: https://meta.stackexchange.com/q/5234/179419. – Andre Kampling Aug 30 '17 at 10:27
  • There must be a secret cabal of evildoers sneaking around the internets, teaching beginners to write `while (!file.eof())`. [Don't do that.](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong) – molbdnilo Aug 30 '17 at 10:29

3 Answers


The problem is that on Windows a newline is represented as CR + LF, which is "\r\n", while on Unix it is LF, which is just "\n".
Your std::getline(...) call reads up to the "\n" in "leer\r\n" and discards the "\n", so your resulting string will be:

"leer\r"
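The same thing can be reproduced without a file. The sketch below assumes a CR+LF line is simulated with std::istringstream; read_name_after_id is a made-up helper that mimics the first reads in the question's operator>>:

```cpp
#include <sstream>  // std::istringstream, used in the usage example
#include <string>

// Hypothetical helper mirroring the question's code: read the ID with >>,
// skip the rest of that line, then read the name with std::getline.
std::string read_name_after_id(std::istream& is)
{
    int id = 0;
    std::string dummy;
    std::string name;
    is >> id;                 // stops at the line ending, which stays in the stream
    std::getline(is, dummy);  // for CR+LF input: dummy becomes "\r", '\n' is discarded
    std::getline(is, name);   // for CR+LF input: name becomes "Sol\r"
    return name;
}
```

For example, with std::istringstream in("0\r\nSol\r\n"), read_name_after_id(in) returns "Sol\r", while the LF-only input "0\nSol\n" gives "Sol".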

To solve this problem, you can convert the file. The two tools dos2unix and unix2dos convert files between the Unix and Windows formats. On Ubuntu, the tofrodos package provides the equivalents fromdos and todos; you need fromdos to convert your Windows text file to a Unix text file.

To test whether a file uses CR + LF or LF, you can run:

dos2unix < myfile.txt | cmp -s - myfile.txt

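(cmp -s exits with status 0 when the output of dos2unix is identical to the file, meaning there was nothing to convert.) This check comes from an answer on the Unix & Linux Stack Exchange site.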
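If you'd rather check from inside the program than with the shell command above, a rough C++ counterpart scans the raw characters for a CR immediately followed by LF. This is a sketch; uses_crlf is a hypothetical name, not a library function:

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing

// Returns true if the stream contains at least one CR immediately
// followed by LF. On Windows, open the file with std::ios::binary;
// in text mode the runtime already turns CR+LF into '\n'.
bool uses_crlf(std::istream& is)
{
    char prev = '\0';
    char c = '\0';
    while (is.get(c)) {
        if (prev == '\r' && c == '\n')
            return true;  // found a Windows line ending
        prev = c;
    }
    return false;         // no CR+LF pair before end of input
}
```

With a file this would be used as std::ifstream file("stars-newline-leer.txt", std::ios::binary); followed by uses_crlf(file) (this also needs #include <fstream>).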


And it seems like it saves "leer\\r" in the object instead of only "leer", but I tried to replace "leer\\r" as well and it still doesn't work. I still can't understand why my if (obj.m_Sternbild == "leer\\r") didn't work, because imo it should have worked?

It should be:

if (obj.m_Sternbild == "leer\r")

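without escaping the backslash: the string contains the single carriage-return character \r, whereas the literal "leer\\r" stands for leer followed by a backslash and the letter r.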
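The difference between the two literals shows up directly in their lengths. A small illustration (the variable names are arbitrary):

```cpp
#include <string>

// "leer\\r" is six characters: l, e, e, r, a backslash and the letter r.
const std::string escaped_backslash = "leer\\r";

// "leer\r" is five characters: l, e, e, r and one carriage return (CR).
// This is what std::getline leaves behind for a CR+LF line.
const std::string carriage_return = "leer\r";
```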

Edit:

As @FreelanceConsultant writes in the comment below, the answer above is not a general solution, because a binary compiled on either Windows or Unix should work with text files from both platforms.

There are two solutions for that.

The obvious one is to compare against both versions of the input. With std::getline on Unix, a file with Windows line endings yields "leer\r" and a file with Unix line endings yields "leer".

if (obj.m_Sternbild == "leer\r" || obj.m_Sternbild == "leer")
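Packed into a helper, the comparison could look like this. A minimal sketch; is_leer is a hypothetical name:

```cpp
#include <string>

// Hypothetical helper: true for the placeholder whether the line came
// from a file with LF endings ("leer") or CR+LF endings ("leer\r").
bool is_leer(const std::string& s)
{
    return s == "leer" || s == "leer\r";
}
```

In operator>> this becomes if (is_leer(obj.m_Sternbild)) obj.m_Sternbild = "";. Keep in mind that only the placeholder is handled: a real value such as "Sol" is still stored as "Sol\r" when a CR+LF file is read on Unix.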

Another solution is to normalize the line ending to one form and only check against that. Which approach to pick is a matter of taste and performance, because normalizing means modifying the strings. See his answer for an example.
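A sketch of the normalizing approach, assuming it is enough to drop a single trailing '\r' after each std::getline (read_field is a hypothetical helper, not part of the original code):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing
#include <string>

// Hypothetical helper: read one line, drop a trailing '\r' left over
// from CR+LF line endings, and map the placeholder "leer" to "".
std::istream& read_field(std::istream& is, std::string& out)
{
    if (std::getline(is, out)) {
        if (!out.empty() && out.back() == '\r')
            out.pop_back();  // normalize CR+LF to LF
        if (out == "leer")
            out.clear();     // placeholder means "no value"
    }
    return is;
}
```

In the question's operator>>, getline(is, obj.m_Bez); plus the following if could then be replaced by read_field(is, obj.m_Bez);, and likewise for m_Sternbild. The dummy getline calls that only skip the rest of a line need no change.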

Andre Kampling
  • This isn't really a solution. Some text files are created on Unix systems, some are created on Windows systems. Regardless of which platform a binary is compiled for, it should work with both text files created on Windows systems or Linux systems. – FreelanceConsultant May 23 '22 at 21:44
  • @FreelanceConsultant: Yes, you are right. It should work on Unix or Windows without any additional processing of the input file. To achieve that, you need to compare against both string versions (`if (obj.m_Sternbild == "leer\r" || obj.m_Sternbild == "leer")`) or normalize the string in your own code, without an external tool. – Andre Kampling Jun 19 '22 at 10:31

You could use this to remove any unwanted characters returned by std::getline.

// #include <algorithm>  // for std::remove
// std::string s;
// std::getline(input, s);
s.erase(std::remove(s.begin(), s.end(), '\r'), s.end());
s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());

This works on Linux systems when the input file uses CRLF line endings. On Linux, std::getline searches for the \n character, so it leaves an extra \r at the end of each line.

On other systems the details differ, but the code is still safe to use:

  • On macOS (OS X), which is Unix-based, std::getline also splits at \n by default, so the result is the same as on Linux. Only the classic Mac OS used a bare \r as its line ending.
  • On Windows, a file opened in text mode has its CR+LF pairs translated to \n by the runtime, so no \r reaches the string. Files with LF-only endings produced on OS X or Linux are also split correctly, because std::getline still searches for \n. In both cases the code above is harmless.
  • If the file is opened with std::ios::binary on Windows, no translation takes place and the \r is kept, just as on Linux.
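Wrapped into a self-contained function, the two erase calls from this answer look like this. A sketch; the name strip_line_endings is made up, and <algorithm> is needed for std::remove. Note that it removes \r and \n anywhere in the string, not only at the end:

```cpp
#include <algorithm>  // std::remove
#include <string>

// Removes every '\r' and '\n' from s, wherever they occur,
// using the erase-remove idiom.
std::string strip_line_endings(std::string s)
{
    s.erase(std::remove(s.begin(), s.end(), '\r'), s.end());
    s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());
    return s;
}
```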
FreelanceConsultant

And it seems like it saves "leer\r" in the object instead of only "leer"

You can either trim the string you get from getline or use getline in combination with a stringstream:

 std::string line;
 getline(is,line);
 std::stringstream ss(line);
 std::string trimmed_string;
 ss >> trimmed_string;

Now trimmed_string will contain only the desired string: no line ending, no trailing or leading whitespace, nothing else.

PS: This only works if the string you want to read does not contain whitespace itself. If it does, you have to resort to a bit more involved massaging of the string you get from getline, or choose some special character that you replace with spaces after reading (e.g. read "Alpha_Centauri" and then replace "_" with " " to get "Alpha Centauri").
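For names with inner spaces such as "96 G. Psc" (a format mentioned in the comments below), one alternative to the stringstream trick is to trim only the ends of the line. A sketch; trim is a hypothetical helper, not a standard library function:

```cpp
#include <string>

// Removes leading and trailing whitespace (including '\r'), but keeps
// spaces inside the value, so "96 G. Psc\r" becomes "96 G. Psc".
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n\f\v";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";  // empty or whitespace only
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
```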

463035818_is_not_an_ai
  • Yes, the problem is that there are m_Bez values (basically the name of the star) which look like this: "96 G. Psc". And I wouldn't be allowed to change the txt file in any way. – Craig Harrison Aug 30 '17 at 09:35
  • @CraigHarrison then unfortunately my answer does not help. Maybe I will edit it later.... – 463035818_is_not_an_ai Aug 30 '17 at 09:41
  • Thanks for trying to help me. I really appreciate it! – Craig Harrison Aug 30 '17 at 09:43
  • Just note that `std::getline()` will read an entire line as-is up to the line break, whereas `ss >>` will skip leading whitespace and then read up to the first whitespace or end-of-string, whichever occurs first. So, `ss >>` is not just trimming if the line has any non-leading/trailing space in it before the line break. You would be chopping off actual data. Trimming involves scanning and removing only leading + trailing whitespace, not any whitespace in the middle. – Remy Lebeau Aug 30 '17 at 16:40
  • @RemyLebeau That's what my PS is about. I was hoping that a simple solution could help and didn't have time yet to improve the answer – 463035818_is_not_an_ai Aug 30 '17 at 16:42
  • @idclev463035818 You could still use your `stringstream` approach, just use `std::getline()` instead of `operator>>`, explicitly telling `getline()` to use `'\r'` as a delimiter. Or, you could forgo the `stringstream` and simply check if the last character in the `line` string is `'\r'` and if so then erase it. – Remy Lebeau Nov 07 '20 at 00:16
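Remy Lebeau's first suggestion, splitting the line again at '\r' with std::getline, could be sketched like this (before_cr is a hypothetical name). Unlike ss >>, it keeps inner spaces:

```cpp
#include <sstream>
#include <string>

// Returns the part of a line (already read with std::getline) before the
// first '\r'. Works for "Sol\r" and "Sol" alike and keeps inner spaces.
std::string before_cr(const std::string& line)
{
    std::istringstream ss(line);
    std::string value;
    std::getline(ss, value, '\r');  // stops at '\r' or at the end of the string
    return value;
}
```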