I'm trying to get a better understanding of endianness when reading a file.

The machine I'm using is little endian.

The code below is supposed to read any type of file.

But what if the file we are reading is encoded as UTF-16BE? Should we change the endianness after reading the whole file?

I'm asking this because I'm planning to edit the content of the file and output it to the console.

If we do need to change the endianness, how can that be done?

Right now I'm reading files like this:

std::ifstream file("/RANDOME/PATH/file.html", std::ios::in | std::ios::binary);

std::string result;

file.seekg(0, std::ios::end);   
result.reserve(file.tellg());
file.seekg(0, std::ios::beg);


result.assign((std::istreambuf_iterator<char>(file)),
            std::istreambuf_iterator<char>());


file.close();

I have no idea how to change the endianness from big to little when reading a file. Can someone kindly show me, step by step, how that is done correctly? I'm only trying to learn. I know the file uses UTF-16BE encoding; that is not a guess.

John
  • Why would you need to care about endianness with a text file input? If you have UTF16 encoding you might be fine with just using `std::wstring`. – πάντα ῥεῖ Dec 25 '18 at 11:16
  • Because I'm running on a little-endian machine, so if the file I'm trying to read is encoded as UTF-16BE, it means it was created by a big-endian machine and I will not be able to read it. – John Dec 25 '18 at 11:18
  • [Reading UTF-16 file in c++](https://stackoverflow.com/questions/50696864/reading-utf-16-file-in-c/50714844#50714844) may contain some helpful hints. – Ted Lyngmo Dec 25 '18 at 11:25

1 Answer

Here is some code that does what you want. Note that this code reads the input file a line at a time rather than reading it all in one fell swoop.

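#include <string>
#include <fstream>

// Swap the two bytes of every UTF-16 code unit in the string.
// Looping over the whole string (instead of stopping at the first zero
// code unit) also covers lines that contain an embedded U+0000.
void swap_bytes (std::u16string &s)
{
    for (char16_t &c : s)
        c = static_cast <char16_t> ((c >> 8) | (c << 8));
}

int main ()
{
    // Caveat: support for char16_t streams varies between standard library
    // implementations. If the stream fails, read the file as bytes with
    // std::ifstream in binary mode instead.
    std::basic_ifstream <char16_t> file ("/RANDOME/PATH/file.html", std::ios::in | std::ios::binary);
    if (!file)
        return 1;

    std::basic_string <char16_t> line;

    while (std::getline (file, line))
    {
        swap_bytes (line);
        // ...
    }

    file.close();
}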
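
For readers who would rather not depend on `char16_t` streams, here is a minimal sketch of a byte-wise alternative. It is not part of the code above: it reads the file as raw bytes, as in the question, and then combines each pair of bytes into one code unit with the high byte first. The helper names `read_bytes` and `decode_utf16be` are made up for this illustration. Because each value is built with shifts rather than by reinterpreting memory, the result should be the same on little-endian and big-endian hosts.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes.
// Returns an empty string if the file cannot be opened.
std::string read_bytes(const std::string& path)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: turns UTF-16BE bytes into native char16_t code units.
std::u16string decode_utf16be(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size() / 2);

    // Walk the bytes two at a time; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const auto hi = static_cast<unsigned char>(bytes[i]);
        const auto lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

A call would look like `std::u16string text = decode_utf16be(read_bytes("/RANDOME/PATH/file.html"));`. If the file starts with a byte order mark (bytes FE FF), it shows up as the first code unit, U+FEFF, and can be skipped before further processing.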

If anything is unclear, please say so in the comments.

Paul Sanders
  • Can we not do it like this: `int16_t one = bytes[i++] & 0xFF; int16_t two = bytes[i++] & 0xFF; std::cout << (one << 8 | two) << std::endl;` Does this convert big endian to little endian? – John Dec 31 '18 at 17:00
  • It makes sense to me to read and manipulate the data as `char16_t` since that's what it is. – Paul Sanders Dec 31 '18 at 17:25
  • I get you. Can you please help me address the second question? :) – John Dec 31 '18 at 17:28
  • If we are manipulating the data as an array of `char16_t`s, you could do `char16_t swapped = (original >> 8) | (original << 8);` – Paul Sanders Dec 31 '18 at 18:15
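
The two approaches from the comments can be put side by side in a small sketch. The function names `swap16` and `from_big_endian` are made up for this illustration. `swap16` is the shift-based swap for a code unit that was already loaded in the wrong byte order, while `from_big_endian` assembles a code unit from two bytes in file order, which is what the `one << 8 | two` expression does and which gives the intended value whatever the host's byte order is.

```cpp
// Swaps the two bytes of one UTF-16 code unit.
constexpr char16_t swap16(char16_t original)
{
    // The operands are promoted to int before shifting, so the cast back
    // to char16_t discards the bits shifted past bit 15.
    return static_cast<char16_t>((original >> 8) | (original << 8));
}

// Builds a code unit from two bytes in file order
// (big-endian: the high byte comes first).
constexpr char16_t from_big_endian(unsigned char first, unsigned char second)
{
    return static_cast<char16_t>((first << 8) | second);
}
```

For example, the letter `A` is stored as the bytes 00 41 in UTF-16BE. Loading those two bytes directly into a `char16_t` on a little-endian machine gives 0x4100, and `swap16` turns that back into 0x0041; `from_big_endian(0x00, 0x41)` gives 0x0041 directly.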