
You should never use GET to change data on a server; that's obvious. The real question is:

Why does the Hypertext Transfer Protocol even allow changes to be made with a GET request? When the hell would anyone ever use a GET request to update something in the database?

  1. GET requests can be cached

  2. GET requests can remain in the browser history

  3. GET requests can be bookmarked

  4. GET requests can be distributed & shared

  5. GET requests can be hacked

Reference for the above 5 statements.

It is obvious that you should never use GET in this case:

1. Don't use GET to do that!

2. No, you still can't use GET to do that!

3. Do you GET it yet? (See what I did there?)

So again, I ask, why is it still even possible to change data using a GET? Why don't they just make it a read-only operation? Then you wouldn't even have to worry about using GET incorrectly or GET being used maliciously.

Yusha

3 Answers


Just because someone says "don't do that" doesn't mean it can't be done.

Proper use/implementation of HTTP as described in the official specification won't result in data changes triggered by GET requests.

But the people that wrote the spec aren't the code police, so people can write their server code to do whatever they want.
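
To make that concrete, here is a minimal sketch (using Python's standard http.server; the /delete endpoint and the in-memory "database" are invented for illustration) of a server that violates the spec by deleting data on a GET:

```python
# Anti-pattern sketch: a server that (against the HTTP spec) deletes
# data when it receives a GET. Nothing in the protocol stack stops it.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory "database", for illustration only.
records = {"1": "first record", "2": "second record"}

class BadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/delete":
            record_id = parse_qs(parsed.query).get("id", [None])[0]
            records.pop(record_id, None)     # side effect on a GET!
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"deleted\n")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), BadHandler).serve_forever()
```

Nothing in the protocol machinery refuses to run this; any crawler or link prefetcher that follows /delete?id=1 will silently remove data, which is exactly why the spec tells you not to do it.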

Edit to add: After all of our prior discussion, I think I need to explain this in real depth.

TL;DR: research how network traffic actually works.

To really understand why it is possible to save data using a GET request, you need to know how things work.

We'll start at the lowest hardware layer and work our way up.

Layer 1 - your network card

The purpose of this card is simply to provide a pathway for traffic. That is all it does. It does not do any kind of filtering - that is the job of higher layers. The firmware of a network card doesn't know the first thing about HTTP, so it doesn't care whether the request is GET or not.

Your network card will NOT restrict an HTTP GET request from doing anything.

Layer 2 - the TCP/IP stack (This is somewhat generalized. There are probably more layers managing parts of the communication that I'm not aware of, since I'm not a network engineer. And, technically, the TCP/IP stack is two layers.)

TCP and IP started life as software written to the TCP and IP standards. As such, it was quite possible for different software vendors to write their own interpretations, and to ignore elements of the standards if they wished (and I wouldn't be surprised if this actually happened as the standards matured). Eventually, the standards became so ubiquitous that parts of them migrated into the firmware of network cards themselves (TCP offload), so portions of TCP/IP can now be handled in hardware.

TCP stands for Transmission Control Protocol. Its job is to provide reliable, ordered delivery of a stream of bytes between two programs, each identified by a port number. TCP knows nothing about HTTP, so it doesn't care about GET requests either.

IP stands for Internet Protocol. It sits beneath TCP and handles addressing and routing: deciding whether a packet is meant for your computer and getting it there. Between them, the IP address (which machine) and the TCP port (which program on that machine) determine where the data ends up. This is where the concepts of IP address and port are introduced and processed.

The stack works by wrapping the data being sent into packets and adding delivery directions (IP address, port, and a few other header fields). The actual content of the packet is ignored, because inspecting it is not the responsibility of these layers.

Note that the HTTP headers are NOT part of the TCP/IP addressing mechanism - they are part of the content being delivered.
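
To see that in action, here is a small sketch (plain Python sockets; it assumes example.com is reachable on port 80) showing that the request line and headers are just bytes inside the TCP payload - the lower layers ship them without reading them:

```python
# The request line "GET / HTTP/1.1" is just application data riding
# inside TCP segments; the stack delivers the bytes without reading them.
import socket

request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)

with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(request)                 # handed to TCP/IP as opaque bytes
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.split(b"\r\n", 1)[0].decode())  # e.g. "HTTP/1.1 200 OK"
```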

Layer 3 - software

To receive data, a software program reserves a port for itself (binds to it), telling the TCP/IP stack: "Send any data coming in on this port directly to me".

ANY software package can reserve ANY port, but the binding is exclusive: only one program can listen on a given port at a time.

The standard HTTP port is 80 - but it is VERY common to use other ports (like 8080) to test websites before deploying them to port 80.

It is, however, entirely possible to use port 80 for a different purpose - you could use it for a Telnet server, an FTP server, a custom protocol - even a DNS server, if you wanted to. The TCP/IP stack DOES NOT CARE what you are doing with the port, it ONLY cares about delivering the data to the "front door".

Since it is possible (and not uncommon) to re-use ports for different things, TCP/IP will not do any content filtering. Again, an HTTP GET request WILL NOT be filtered in the TCP/IP stack, because it is the responsibility of the software package to process the data that is delivered.
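
As a tiny illustration (the port number and the made-up protocol are arbitrary), here is a sketch of a server bound to a typical "web" port that speaks no HTTP at all - the stack delivers the bytes regardless:

```python
# A server on a typical "web" port that speaks its own made-up protocol.
# The TCP/IP stack only delivers bytes; it never checks for HTTP.
import socket

with socket.create_server(("localhost", 8080)) as server:
    print("listening on 8080, but not speaking HTTP")
    while True:
        conn, addr = server.accept()
        with conn:
            data = conn.recv(1024)
            # Whatever arrives - "GET / HTTP/1.1" or anything else -
            # is just bytes to us; we answer in our own protocol.
            conn.sendall(b"CUSTOM-PROTO/1.0 HELLO " + data[:16] + b"\n")
```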

... and now we get to what I've been trying to say all along. People decide what the software they write will do. There are MANY different implementations of HTTP servers (the two best-known are Microsoft's IIS and Apache - but if you're not aware of it, look at what the Node.js community does by way of HTTP server implementations; there are probably thousands of custom HTTP server implementations out there now).

Knowing all of this, I ask you: HOW would it even be POSSIBLE to restrict a GET request from adding/changing data? Should the IP layer inspect the payload on port 80 and refuse to deliver GET requests that carry a body? Even then, I can think of two "workarounds" right off the top of my head - put the content in the URL, or put it in a cookie.
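
For example, a client can hand the server "write" data even on a GET by tucking it into the query string or a cookie; a sketch with Python's standard library (the URL, parameters, and cookie name are invented, and it assumes some server is listening on localhost:8080):

```python
# A GET that carries data the server could use to change state,
# smuggled in through the query string and a cookie (all names here
# are hypothetical).
from urllib.parse import urlencode
from urllib.request import Request, urlopen

params = urlencode({"action": "rename", "id": "42", "name": "new title"})
req = Request(
    "http://localhost:8080/update?" + params,         # data in the URL
    headers={"Cookie": "draft_payload=hello-world"},   # data in a cookie
    method="GET",
)
with urlopen(req) as resp:
    print(resp.status, resp.read())
```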

So, to (finally) answer your question: while it is theoretically possible to build a system that restricts every piece of software acting as an HTTP server so that no GET request can modify data, it is a practical impossibility.

theGleep
  • You said: "*Just because someone says "don't do that" doesn't mean it can't be done.*" Unfortunately, that is trivial, and has no relevance to my question. Then you said: *"Proper use/implementation of HTTP as described in the official specification won't result in data changes triggered by GET requests."* Again, I am left wondering if you even read my original post. It should be blatantly obvious that "**proper use**" of the HTTP protocols as described in the original specifications won't result in data changes. That is not what I am asking. – Yusha Apr 25 '18 at 13:25
  • Finally you said: "*But the people that wrote the spec aren't the code police, so people can write their server code to do whatever they want.*". Again, I am really unsure as to what you thought I was asking or how this even makes sense. The people who *wrote the HTTP protocols and defined them for us to use* should not be liable for allowing certain requests to make changes that are detrimental to people's environments? What? Why would you create something to implement **x** but it can also implement **y** but do note that it **should never implement y**. – Yusha Apr 25 '18 at 13:28
  • Oh, but guess what, it **can**. What is the logic?! – Yusha Apr 25 '18 at 13:31
  • HTTP is simply a *standard* - standards don't control anything. That's what my point has been. "Why does HTTP allow data changes for GET requests?" - because it's nothing more than a convention that everybody using the Web makes use of. – theGleep Apr 25 '18 at 22:11
  • HTTP *is* indeed a **standard**, again, nobody is arguing that. The HTTP **GET** protocol being able to **change data** is the question I am proposing. When would you ever use that feature if it is explicitly stated **not to ever use this feature**. *Why* would they ever make it a feature? For example, can you imagine if Microsoft allowed you to store character literals in **int** data types, and then just said "*Hey, it's part of our rules not to ever store character literals in **int** data types, that coo?*" That wouldn't make any sense at all. (ex: **int number = "1";**). – Yusha Apr 26 '18 at 18:28

Taking it from theGleep's answer and comments: 'HTTP is simply a standard - standards don't control anything.'

HTTP is a set of guidelines, not an application or institution that has the power to control anything. When creating a web API you could use anything you like; you could decide that a RETRIEVE verb retrieves an object. You'd run into trouble straight away, because everybody else on the web does stick to the HTTP standard and uses the GET verb to retrieve items, so there would be no client that could talk to your API. But there's no one to stop you from using RETRIEVE anyway.
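
And indeed nothing stops you from inventing a verb; a quick sketch with Python's standard library (the RETRIEVE method, path, and localhost:8080 target are all made up) shows the library will happily send it:

```python
# http.client will put any token you like on the request line; whether
# the server understands "RETRIEVE" is another matter entirely.
from http.client import HTTPConnection

conn = HTTPConnection("localhost", 8080)
conn.request("RETRIEVE", "/items/42")    # non-standard verb, sent as-is
response = conn.getresponse()
print(response.status, response.reason)  # most servers answer 400 or 501
conn.close()
```

Whether anything on the other end understands RETRIEVE is an entirely different question - which is the whole point.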

Obviously, this is one of the things people run into when they insist on using GET to modify data on the server - as you pointed out in your question through the links you provided: bots assume any GET is safe and idempotent, because the standard says so.

Your Microsoft example doesn't hold, because there a company ships a compiler that can enforce the rule that you can't store a character literal in an int data type. HTTP does have a standards body (the specifications are maintained by the IETF), but there is no overarching application/OS/API that could enforce the rules. The rules are enforced by the users and the way they interpret and implement them. And some rules (for example, 'the verb for retrieving an item is GET') are enforced more strongly than others (for example, 'GET is safe and idempotent').

Compare it to someone from the US visiting the Netherlands. The visitor is used to the AM/PM time format. No one will force her to use the 24-hour format; there's no time-format police. But she'll quickly find out that a lot of Dutch people get confused, because they don't know whether AM is morning, afternoon, or something else.

So to summarise: why is it still possible to change data using GET, or to use the verb RETRIEVE instead of GET if you will? Because HTTP is a standard that only has power by virtue of its users.

I hope my answer has made it a bit clearer (the question intrigued me - very interesting).

Wil Koetsier

Why does the Hypertext Transfer Protocol even allow changes to be made with a GET request?

It doesn't. The specification says:

4.2.1. Safe Methods

Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource. Likewise, reasonable use of a safe method is not expected to cause any harm, loss of property, or unusual burden on the origin server.

Of the request methods defined by this specification, the GET, HEAD, OPTIONS, and TRACE methods are defined to be safe.


So again, I ask, why is it still even possible to change data using a GET?

Because some people write HTTP server code which violates that part of the HTTP specification.

Since they are writing the code, it is their responsibility to follow the specification, but nobody can stop them if they (through choice or ignorance) do not.
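
To put the two side by side, here is a minimal sketch (Python's standard http.server; the routes and the counter are invented) of one GET handler that honours the "safe" semantics and one that violates them - no library involved can tell the difference:

```python
# Two GET handlers side by side: /status is read-only (follows the
# spec), /increment changes server state (violates it). The HTTP
# machinery treats them identically.
from http.server import BaseHTTPRequestHandler, HTTPServer

visits = 0  # hypothetical server-side state

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        global visits
        if self.path == "/status":
            body = b"ok\n"                       # safe: no state change
        elif self.path == "/increment":
            visits += 1                          # unsafe: GET mutates state
            body = f"visits={visits}\n".encode()
        else:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), Handler).serve_forever()
```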

Quentin