6

Let's say I'm programming in Java or Python or C++ for a simple problem, could be to build a TCP/UDP echo server or computation of factorial. Do I have to bother about the architecture details, i.e., whether it is 32 or 64-bit?

IMHO, unless I'm programming something to do with fairly low-level stuff, I don't have to bother whether it's 32 or 64 bit. Where am I going wrong? Or am I correct?

g06lin
  • Mate, do you even understand the point of languages like Java? It pretty much runs unmodified on all supported platforms with no program changes. – mP. Jun 26 '09 at 10:04
  • Comparing Java with C++ in terms of platform independence shows a lack of understanding of the merits and failings of the mentioned languages/platforms. – mP. Jun 26 '09 at 10:05
  • the languages mentioned are just examples! I'm not comparing or criticizing! – g06lin Jun 27 '09 at 17:59
  • mP, almost all the comments I have ever seen you make seem very forward/offensive, like calling people stupid. – GManNickG Jun 28 '09 at 03:48

11 Answers

16

Correct for most circumstances.

The runtime/language/compiler will abstract those details unless you are dealing directly with word sizes or binary at a low level.

Even byte order is abstracted by the NIC/network stack in the kernel; it is translated for you. When programming sockets in C, you do sometimes have to deal with byte ordering for the network when sending data, but that doesn't concern 32- vs 64-bit differences.
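For instance, a minimal C sketch of that byte-order handling (POSIX headers assumed, the value is made up): convert with htonl()/ntohl() and the same source runs unchanged on 32- and 64-bit hosts, which is the point.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>

int main(void)
{
    uint32_t host_value = 0xDEADBEEF;          /* arbitrary example value */
    uint32_t wire_value = htonl(host_value);   /* host -> network (big-endian) */
    uint32_t round_trip = ntohl(wire_value);   /* network -> host */

    printf("host 0x%08" PRIX32 ", back 0x%08" PRIX32 "\n",
           host_value, round_trip);
    return 0;
}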

When dealing with blobs of binary data, mapping them from one architecture to another (as an overlay to a C struct for example) can cause problems as others have mentioned, but this is why we develop architecture independent protocols based on characters and so on.

In fact, things like Java run in a virtual machine that abstracts the machine another step!

Knowing a bit about the instruction set of the architecture, and how the syntax is compiled to that can help you understand the platform and write cleaner, tighter code. I know I grimace at some old C code after studying compilers!

Aiden Bell
  • Been blissfully unaware of processor hardware for years and years. Haven't known or needed to know since I last wrote real-time controllers for military weapons. – S.Lott Jun 25 '09 at 21:53
  • "Real-time controllers for military weapons" - Awesome! – Aiden Bell Jun 26 '09 at 09:48
16

Knowing how things work, whether it's how the virtual machine behaves on your platform or how certain C++ constructs are transformed into assembly, will always make you a better programmer, because you will understand why things should be done the way they are.

You need to understand things like memory to know what cache misses are and why those might affect your program. You should know how certain things are implemented; even though you might only use an interface or a high-level way to get at them, knowing how they work will make sure you're using them in the best way.

For packet work, you need to understand how data is stored on platforms and how sending that across the network to a different platform might change how the data is read (endian-ness).
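As a quick hedged sketch in C of what endian-ness means in practice: the same four bytes in memory read back as different numbers depending on the platform, so raw bytes shipped to a machine of the other persuasion reassemble wrongly unless both sides agree on a wire order.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t value = 0x01020304;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);   /* inspect the in-memory layout */
    if (bytes[0] == 0x01)
        printf("big-endian: most significant byte stored first\n");
    else
        printf("little-endian: least significant byte stored first\n");
    return 0;
}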

Your compiler will make best use of the platform you're compiling on, so as long as you stick to standards and code well, you can ignore most things and assume the compiler will whip out what's best.

So in short, no. You don't need to know the low level stuff, but it never hurts to know.

GManNickG
8

The last time I looked at the Java language spec, it contained a ridiculous gotcha in the section on integer boxing.

Integer a = 100;
Integer b = 100;

System.out.println(a == b);

That is guaranteed to print true.

Integer a = 300;
Integer b = 300;

System.out.println(a == b);

That is not guaranteed to print true. It depends on the runtime. The spec left it completely open. It's because boxing an int between -128 and 127 returns "interned" objects (analogous to the way string literals are interned), but the implementer of the language runtime is encouraged to raise that limit if they wish. (Comparing boxed values with equals() instead of == sidesteps the issue entirely.)

I personally regard that as an insane decision, and I hope they've fixed it since (write once, run anywhere?)

Daniel Earwicker
  • interesting... Looking into "interned" objects details! On a 1.6.0_14 JVM on a Linux x86 32bit system, gives me true as long as a & b are < 128. – g06lin Jun 27 '09 at 18:30
  • Got the size of the range right, but it's signed (-128 to 127) - see http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#5.1.7 – Daniel Earwicker Jun 27 '09 at 19:01
6

You sometimes must bother.

You can be surprised when these low-level details suddenly jump out and bite you. For example, Java standardized double to be 64 bit. However, the Linux JVM uses "extended precision" mode, in which a double is 80 bits wide for as long as it stays in a CPU register. This means that the following code may fail:

double x = fun1();             // fun1() and fun2() are placeholder methods
double y = x;

System.out.println(fun2(x));   // the call can force x out of the FPU register

assert y == x;                 // can fail once y is spilled and rounded to 64 bits

Simply because y is forced out of the register into memory and truncated from 80 to 64 bits.

quant_dev
  • Doesn't this mean that the Linux JVM isn't standard? IIRC, Java defines floating-point operations in great detail. – David Thornley Jun 25 '09 at 20:47
  • That example demonstrates something else - the implementer of the language runtime needs to understand the language definition properly. That is simply broken. No Java programmer should have to put up with that. – Daniel Earwicker Jun 25 '09 at 20:48
  • Yes, this is a bug. This was reported as a bug in GNU Java, but ignored: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16122 – quant_dev Jun 26 '09 at 06:42
3

In Java and Python, architecture details are abstracted away, so that it is in fact more or less impossible to write architecture-dependent code.

With C++, this is an entirely different matter: you can certainly write code that does not depend on architecture details, but you have to be careful to avoid pitfalls, specifically concerning basic data types that are architecture-dependent, such as int.
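A small C sketch of that pitfall (the results vary by platform, which is the point): the built-in types have no fixed size, so code that bakes in one answer breaks on another data model.

#include <stdio.h>

int main(void)
{
    /* Typical results: 4/4/4 on ILP32, 4/8/8 on LP64 (64-bit Linux/macOS),
       4/4/8 on LLP64 (64-bit Windows). */
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("void *: %zu bytes\n", sizeof(void *));
    return 0;
}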

Michael Borgwardt
2

As long as you do things correctly, you almost never need to know for most languages. On many, you never need to know, as the language behavior doesn't vary (Java, for example, specifies the runtime behavior precisely).

In C++ and C, doing things correctly includes not making assumptions about int. Don't put pointers in int, and when you're doing anything with memory sizes or addresses use size_t and ptrdiff_t. Don't count on the size of data types: int must be at least 16 bits, almost always is 32, and may be 64 on some architectures. Don't assume that floating-point arithmetic will be done in exactly the same way on different machines (the IEEE standards have some leeway in them).
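A short sketch of that advice in C (the array and values are made up): size_t for sizes and indices, ptrdiff_t for pointer differences, and a fixed-width type where the exact width matters.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int data[16];
    size_t count = sizeof data / sizeof data[0];   /* a size, so size_t */

    for (size_t i = 0; i < count; ++i)
        data[i] = (int)i;

    ptrdiff_t span = &data[count - 1] - &data[0];  /* pointer difference */
    int32_t exact = INT32_C(42);                   /* exactly 32 bits, everywhere */

    printf("count=%zu span=%td exact=%" PRId32 "\n", count, span, exact);
    return 0;
}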

Pretty much all OSes that support networking will give you some way to deal with possible endianness problems. Use them. Use standard library facilities like isalpha() to classify characters, rather than arithmetic operations on characters (the character set might be something weird like EBCDIC). (Of course, it's now more usual to use wchar_t as the character type, and use Unicode internally.)
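For the character-classification point, a tiny C sketch: hand isalpha() an unsigned char value and let the library worry about the execution character set, instead of doing 'a'-to-'z' arithmetic that silently assumes ASCII.

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    const char *s = "abc123";
    for (const char *p = s; *p != '\0'; ++p) {
        /* cast to unsigned char: passing a negative char is undefined */
        if (isalpha((unsigned char)*p))
            printf("'%c' is a letter\n", *p);
    }
    return 0;
}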

David Thornley
1

If you are programming in Python or Java, the interpreter and the virtual machine, respectively, abstract away this layer of the architecture. You then need not worry whether it's running on a 32- or 64-bit architecture.

The same cannot be said for C++, where you'll sometimes have to ask yourself whether you are running on a 32- or 64-bit machine.

Gab Royer
0

You will need to care about "endian-ness" only if you send and receive raw C structs over the wire, like this:

ret = send(socket, &myStruct, sizeof(myStruct), 0);

However, this is not recommended practice.

It's recommended that you define a protocol between the parties such that it doesn't depend on either party's machine architecture, as sketched below.
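A hedged sketch of such a protocol in C (the message struct and field names are invented for illustration): serialize each field explicitly, in network byte order, into a fixed wire layout, instead of shipping the struct with its host endianness, padding and field sizes.

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct message {
    uint32_t id;
    uint16_t flags;
};

/* Wire format: 4 bytes id, then 2 bytes flags, both big-endian. */
size_t pack_message(const struct message *m, unsigned char buf[6])
{
    uint32_t id    = htonl(m->id);
    uint16_t flags = htons(m->flags);

    memcpy(buf,     &id,    sizeof id);
    memcpy(buf + 4, &flags, sizeof flags);
    return 6;   /* length to hand to send() */
}

Both ends now agree on the bytes regardless of their architectures, padding rules or word sizes.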

Reginaldo
  • No, it's way worse than that. You need to know endianness if reading or writing any binary data to files, as well, and it doesn't matter if it's in structures or not. – Larry Gritz Jun 25 '09 at 23:39
  • If you are writing to binary files, you have already agreed on a protocol that both parties will use to communicate with each other. The protocol is the file format. – Reginaldo Jun 26 '09 at 16:12
0

In C++, you have to be very careful if you want to write code that works indifferently on 32 or 64 bits. Many people wrongly assume that int can store a pointer, for example.
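A minimal C sketch of that pointer-in-int pitfall and its fix: on LP64/LLP64 platforms int is 32 bits while pointers are 64, so the cast truncates; where an integer type is genuinely needed, <stdint.h> provides (u)intptr_t, which (where the implementation provides it) is guaranteed to round-trip a pointer.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int x = 42;
    int *p = &x;

    /* int bad = (int)p;  would truncate the pointer on most 64-bit ABIs */
    uintptr_t bits = (uintptr_t)p;   /* wide enough by definition */
    int *q = (int *)bits;            /* safe round trip */

    printf("%d\n", *q);
    return 0;
}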

Bastien Léonard
  • I've never had problems with that. I have had problems over the years with people who thought that int was the same as size_t. – David Thornley Jun 25 '09 at 20:37
  • Isn't presuming width of a type (when it is mismatched like that) just bad programming practice? :P – Aiden Bell Jun 25 '09 at 20:38
  • Oh, yes, it is. However, it's common programming practice. I first noticed it when I was teaching from a textbook that suggested trying out "printf("Size of int is %d\n", sizeof(int));" and a student got zero printed out - on a big-endian system where int was 16 bits and size_t 32. – David Thornley Jun 25 '09 at 20:49
  • Pah! I can write C++ code that works indifferently on any platform! Usually do, in fact.... – Jun 25 '09 at 20:49
  • @Neil - I once had to explain to a manager why I'd made the window title bar and close button look different on Windows and Solaris. – Daniel Earwicker Jun 25 '09 at 20:54
  • @Earwicker I think my small joke may have gone under (or possibly over) your head. Sigh. –  Jun 25 '09 at 21:01
  • @Neil: it isn't as easy as many people think. For example, you can't assume that letters' codes are consecutive. And you often have to use non-standard libraries for basic stuff, e.g. directory handling. – Bastien Léonard Jun 25 '09 at 21:03
  • @Neil: so it was a joke? Sorry, my understanding of English is very limited. – Bastien Léonard Jun 25 '09 at 21:04
  • @Bastien In English, "indifferent" is sometimes a synonym for "bad" or "incompetent" - it was a joke, obviously only understood by the teller :-( – Jun 25 '09 at 21:08
  • Don't worry, Neil, I got it... it was... different. – Daniel Earwicker Jun 25 '09 at 21:46
0

With Java and .NET you don't really have to bother with it unless you are doing very low-level stuff like twiddling bits. If you are using C, C++ or Fortran you might get by, but I would actually recommend using <stdint.h>, which gives you explicit fixed-width types like uint64_t and uint32_t. Also, you will need to build with particular libraries depending on how you are linking; for example, on a 64-bit system gcc might default to a 64-bit compile mode.
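For instance, a small C sketch of that advice, pairing the fixed-width types with the matching format macros from <inttypes.h> so the printf calls don't silently assume one platform's widths:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint32_t small = UINT32_C(0xFFFFFFFF);
    uint64_t big   = UINT64_C(0xFFFFFFFFFFFFFFFF);

    /* PRIu32/PRIu64 expand to the right conversion for this platform */
    printf("small = %" PRIu32 "\n", small);
    printf("big   = %" PRIu64 "\n", big);
    return 0;
}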

dude
0

A 32 bit machine will allow you to have a maximum of 4 GB of addressable virtual memory. (In practice, it's even less than that, usually 2 GB or 3 GB depending on the OS and various linker options.) On a 64 bit machine, you can have a HUGE virtual address space (in any practical sense, limited only by disk) and a pretty damn big RAM.

So if you are expecting 6 GB data sets for some computation (let's say something that needs non-sequential access and can't just be streamed a bit at a time), on a 64-bit architecture you could just read it into RAM and do your stuff, whereas on a 32-bit architecture you need a fundamentally different way to approach it, since you simply do not have the option of keeping the entire data set resident.
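A hedged POSIX C sketch of the 64-bit approach (the file name is made up): map the whole data set into the address space and access it at random; the same mmap() call would simply fail in a 32-bit process once the file outgrows the 4 GB address space.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);   /* hypothetical 6 GB file */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    /* Random access anywhere in the data set, as if it were one big array */
    printf("middle byte: %d\n", data[st.st_size / 2]);

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}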

Larry Gritz