#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    ifstream inputFile("foo.txt");
    string val;
    getline(inputFile, val); // using ifstream; the line read is not empty
    // val should be "hello"

    cout << val[0]; // \357
    cout << val[3]; // h
}

So the way I understand it, I am storing the pointer in the first three bytes and not the character values? Is there a way to make it so I can access the first character at [0]? I am using std::string.

Dharman
Nathan Takemori

1 Answer


(Posting my comment as an answer and expanding on it)

"I am storing the pointer in the first three bytes and not the character values?"

This statement doesn't make sense: the getline(istream&, string&) function copies characters from the stream into the string instance. It stores char values, never pointers (and if you were using wide characters, the same code would give you very different results). So saying that you are "storing the pointer in the first three bytes and not the character values" is like saying you're storing a fruit basket inside an orange.

Secondly, getline does not store pointers inside the string val you passed to it, and string itself does not store pointers broken up across the elements of its internal character array (fun fact: prior to C++11, a string's characters did not need to be stored in contiguous memory!).

Anyway, the biggest hint that you aren't actually having any problems with pointers is that pointers are not 3 bytes (24 bits) long on any modern platform (a 24-bit pointer would give you only 16 MiB of address space).

Your program is reading from a text file (foo.txt), and you noticed that the text only starts at index 3 of the line, so there must be 3 "invisible" bytes at the start of the file. That would be unusual if it weren't for the habit of some Microsoft tools (older versions of Notepad, for example) of prefixing text files saved as UTF-8 with a Byte Order Mark (BOM), so that programs can detect UTF-8 text files and interpret them as UTF-8 instead of ASCII or some other OEM encoding. The motivation is that, unlike UTF-16 (where every other byte is usually zero for mostly-ASCII text), UTF-8 is hard to tell apart from ASCII or a legacy 8-bit encoding by inspection: a UTF-8 file that only uses characters up to 0x7F is byte-for-byte identical to an ASCII file.

The problem is that files using UTF-8 don't actually need a BOM prefix, because byte order is only relevant to UTF-16 and UTF-32 (UTF-8 works regardless of endianness).
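One way to check whether there really are invisible bytes at the start of the line is to print the raw byte values instead of the characters. The following is only a minimal sketch; toHex is a hypothetical helper name, not something from the standard library or from the question:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders every byte of s as two uppercase hex digits,
// separated by single spaces, so non-printable bytes become visible.
std::string toHex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {  // unsigned char avoids sign extension for bytes >= 0x80
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

If you print toHex(val) right after the getline call and the output starts with EF BB BF, the file was saved with a UTF-8 BOM.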

The UTF-8 BOM bytes are 0xEF, 0xBB, 0xBF, which is simply the code point U+FEFF encoded as UTF-8. Note that 0xEF is 357 in octal, which is exactly the \357 you saw for val[0]. I'm willing to bet that your string val's first 3 characters were either invisible or rendered as mojibake on your computer, because something tried to interpret 0xEF as a normal, printable character, which it isn't on its own. Or it showed you the raw values and maybe your debugger had an option to interpret those bytes as a pointer address - but that's only your debugger doing that.
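To answer the "how do I get the character at [0]" part: assuming the file really does start with a UTF-8 BOM, you can drop those three bytes after reading the first line. This is just a sketch; stripUtf8Bom is a hypothetical helper name I made up for illustration:

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from s if present.
// Returns true when a BOM was found and erased, false otherwise.
bool stripUtf8Bom(std::string& s) {
    static const std::string bom = "\xEF\xBB\xBF";
    // compare() clamps the length to the string size, so short or empty strings are safe.
    if (s.compare(0, bom.size(), bom) == 0) {
        s.erase(0, bom.size());
        return true;
    }
    return false;
}
```

In the question's code you would call stripUtf8Bom(val) right after getline(inputFile, val), and only for the first line of the file; after that val[0] should be 'h'. The other option is to re-save foo.txt as UTF-8 without a BOM, if your editor offers that.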

Dai