Hate Speech: HTTP

Nov 14, 2020

I’m quite disappointed with the overall let’s-wrap-it-in-HTTP trend that has been gaining momentum for at least the last decade.

DNS over HTTPS, DASH / HLS, MS-IPHTTPS, WebDAV and tons of other protocols make absolutely no sense except as a way of transferring data over a widely known but redundant and (we will get to that) essentially broken medium.

DNS-over-HTTPS

This is something I will never understand. DNS is a protocol that has been around for decades, and in unencrypted form it can leak private information about the end user, so encrypting it is a rational decision.

But why the hell would you need HTTPS for that? HTTPS is just HTTP over TLS; all the encryption is done by TLS alone, and TLS can be used perfectly well without HTTP. What difference does wrapping queries in an additional layer make? If you want to get around firewalls, why not just expose it on port 443? I believe one can even switch between DNS-over-TLS and HTTPS on the same port using SNI. I hoped that specs were created by people a bit more rational than the let’s-wrap-it-in-docker, let’s-rewrite-it-in-javascript and let’s-create-a-better-C folks.
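To make the extra layer concrete, here is a minimal sketch that builds one DNS query and frames it both ways: with the two-byte length prefix used by DNS over TLS, and as an HTTP POST with the application/dns-message content type used by DNS over HTTPS. The host dns.example and the helper names are made up for illustration, and the HTTP/1.1 text form is used only because it is readable; real DoH deployments typically run over HTTP/2, where the header cost differs.

```python
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (wire format) with one question."""
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)

def frame_for_tls(message: bytes) -> bytes:
    """DNS-over-TLS framing: a two-byte length prefix, as in DNS over TCP."""
    return struct.pack("!H", len(message)) + message

def frame_for_https(message: bytes, host: str, path: str = "/dns-query") -> bytes:
    """DNS-over-HTTPS framing, shown as an HTTP/1.1 POST carrying the same bytes."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/dns-message\r\n"
        "Accept: application/dns-message\r\n"
        f"Content-Length: {len(message)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + message

query = build_dns_query("example.com")
dot = frame_for_tls(query)
doh = frame_for_https(query, "dns.example")  # hypothetical resolver host
print(len(query), len(dot), len(doh))
```

Both frames carry the identical query bytes inside the same TLS tunnel; the only difference is what gets wrapped around them.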

HTTP Media Streaming (DASH, HLS and so on)

The key concept here is to break the stream down into small chunks and let the client download a manifest (playlist) that tells it which chunks to fetch.

Many small resources always were and still are an antipattern in HTTP. The smaller the chunk, the larger the share of traffic spent on HTTP headers. We’ve employed different domain names, bundling, lazy loading and other techniques for years just to escape this pattern, and here it is again. And there’s already native support for requesting a byte range of an HTTP resource, so why invent something new?
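A back-of-the-envelope sketch of that overhead, plus the byte-range alternative. The 700-byte figure for combined request and response headers is purely an assumption; plug in whatever your own traffic shows.

```python
def header_overhead(chunk_bytes: int, header_bytes: int = 700) -> float:
    """Fraction of transferred bytes spent on headers for a single chunk.

    header_bytes is an assumed size of request plus response headers.
    """
    return header_bytes / (header_bytes + chunk_bytes)

def range_header(start: int, length: int) -> str:
    """Value of a Range request header for `length` bytes starting at `start`."""
    return f"bytes={start}-{start + length - 1}"

# The smaller the chunk, the bigger the share eaten by headers.
for size in (2_000, 50_000, 1_000_000):
    print(f"{size:>9} B chunk: {header_overhead(size):.1%} of the traffic is headers")

# The same slice of one big resource, requested natively by byte range.
print("Range:", range_header(0, 1024))
```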

Also, there are normal adaptive streaming protocols that have been around for years. What’s the point of inventing something for the sole purpose of serving it over HTTP?

But the real problem here is not the wrapping itself.

The real problem is that HTTP has been essentially broken from the very start, and wrapping anything in it is just like TypeScript trying to fix essentially broken JavaScript while still being unable to get rid of typeof checks.

Nearly everything in HTTP is broken. We just got so used to it that we don’t pay attention to all the craziness accompanying every request. And then you suggest wrapping yet another broken thing in this broken thing, providing even more edge-case shit to the poor engineers who have to deal with your abomination. You’re making people’s lives worse, and while their lives are usually better than many others’, that’s no reason to do so.

So-called errors

6.5. Client Error 4xx
The 4xx (Client Error) class of status code indicates that the client
seems to have erred.

What a lie.

401 is not a client error

401 means the supplied request lacks authorization. But the client does not necessarily know ahead of time that authorization is required. If I send a link to a classified document to a colleague who lacks the necessary permissions, their client knows nothing about that ahead of time. Moreover, HTTP has no means to say “hey, there’s a Location header, BUT you can’t access it with your current permissions and have to resolve that first”.

The 401 (Unauthorized) status code indicates that the request has not
been applied because it lacks valid authentication credentials for
the target resource. The server generating a 401 response MUST send
a WWW-Authenticate header field (Section 4.1) containing at least one
challenge applicable to the target resource.
If the request included authentication credentials, then the 401
response indicates that authorization has been refused for those
credentials.

Authorization is a process of checking permissions for executing an operation. Authentication is a process of identification. Are you asking me for credentials or permissions? I don’t understand.

402 is not a client error and is not a transfer protocol error at all

402 tells the client that payment is required to access the resource.

First, it’s the same lack of authorization, except that authorization is granted on the basis of payment. It’s no different from 401.

Second, what does payment have to do with a transfer protocol at all? It’s not a Netflix protocol, where it could make sense; it’s a transfer protocol, an abstraction that has a notion of “resource inaccessible”, but not of money. And why is it so special? Why is there no separate status code for “18+ only”? No “only citizens of Vatican”? No “Putin disliked this” error?

And the best part is that it’s in the RFC but pretends not to exist.

The 402 (Payment Required) status code is reserved for future use.

Everyone knows what happens with a notion that’s declared but not defined: everybody starts using it, each with their own vision of how it should be used. Good luck fixing that in the 2030s.

403 is not a client error

403 stands for the Forbidden response. Not only is it de facto used as a “need authentication” response code (while “forbidden” would be OK for trying to access a chat you’ve been banned from), it can’t be distinguished from 401 at all.

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.

Authorization will not help. Wat? Authorization is the process of checking whether an identity has the rights to access something, so it is exactly the thing that has to help, no? If an admin grants me new rights, I still won’t be able to access the resource? If I forgot to authenticate and this is why I have zero permissions, logging in won’t help?

And again, it’s not a client error. It’s a failed negotiation, which is a completely normal outcome and not an error. After all, since HTTP is stateless, there is no way to tell whether the client was ever notified about this limited access.

404 is not an error

If we can pretend that the other codes have something to do with errors, this one is not an error in any sense. It’s the same as returning null in any programming language. It’s the same as “Authorization header is not present on the request”, which does not necessarily lead to blocking access, but just lets the application treat the user as anonymous. It’s just a “Hey, may I borrow your X?” “Sorry, I just don’t have it” kind of conversation. It’s not a failed request. It’s a return of nothing, just like 204.

The 404 (Not Found) status code indicates that the origin server did
not find a current representation for the target resource or is not
willing to disclose that one exists.

I’ll just quote the juicy part again:

origin server did not find a current representation for the target resource

Where’s the client error in that, if the response depends on implementation-specific processing on the remote server?

406 is not an error

The 406 (Not Acceptable) status code indicates that the target
resource does not have a current representation that would be
acceptable to the user agent, according to the proactive negotiation
header fields received in the request (Section 5.3), and the server
is unwilling to supply a default representation.

It’s nothing but a subset of 404 (I don’t have this resource, at least not in the requested format). There’s nothing wrong with its existence, but it’s a failed negotiation and not an error.

409 is not an error

The 409 (Conflict) status code indicates that the request could not be completed due to a conflict with the current state of the target resource … For example, if versioning were being used and the representation being PUT included changes to a resource that conflict with those made by an earlier (third-party) request, the origin server might use a 409 response to indicate that it can’t complete the request.

So I’m editing an internal wiki page. That takes a while, and when I submit the result, the wiki tells me that my changes would overwrite changes made by someone else in the meantime.

What is my error in that? It’s another negotiation outcome, not an error.

410 is not an error

410 Gone: the resource was here, but not anymore. This is even less of an error than 404, since my client most likely ended up here by following a link, i.e. by being told that the resource exists. Again, this is a subset of 404, which could use a couple of headers to expose additional information about why the resource is not present.

414 is a status introduced for software that simply doesn’t work

The 414 (URI Too Long) status code indicates that the server is refusing to service the request because the request-target is longer than the server is willing to interpret. This rare condition is only likely to occur when a client has improperly converted a POST request to a GET request with long query information, when the client has descended into a “black hole” of redirection (e.g., a redirected URI prefix that points to a suffix of itself)

Wat?

A client converting a POST request into a GET request must be answered with 400 or 405.

A black hole of redirection is in fact a server error that has to be handled on the client side if the server doesn’t know it’s looping clients. Getting a bit ahead: if HTTP were stateful, or at least passed the number of redirects in a header, this could be tracked on the server side as well, which would make it possible to notify the server owner about such behavior.

But I’ll translate this in another way:

“We’re writing our servers in C and this will cause a buffer overflow since we don’t know how to swap a char array for a bigger one”.

So it’s yet again a status code dedicated to a special purpose imposed from above, adjusting the protocol to the software’s needs and not vice versa. What a clever move: now every client has to keep in mind a status code that was created to accommodate the limitations of a bunch of server implementations.

or when the server is under attack by a client attempting to exploit potential security holes.

If you want to tell a robber he’s a robber, why don’t you create a separate status code for that? How the hell does the phrase “URI too long” translate into “possible vulnerability exploitation”?

418 is a stupid joke which is funny only if you’re in middle school

But we still have to see that clown legacy on every resource. Is it that hard to keep tech things clean and not introduce any noise? Thanks at least for not adding it to the HTTP RFC itself.

426 is not an upgrade but redirect

426 Upgrade Required tells the client to switch to another protocol.

I agree that it could be a client error, e.g. accessing an HTTPS endpoint with an HTTP request. But the HTTP-to-HTTPS transition isn’t done via HTTP Upgrade (which would be quite useful, in fact), it’s done via a 3xx redirect. WebSocket is another protocol that is not served over HTTP. Even the Upgrade: HTTP/3 mentioned in the spec implies switching the transport protocol. So why is it called an upgrade and not a redirect? It is literally sending the client somewhere else.

511 is not a server error

511 stands for “network authentication required”, meaning that an intermediate network node won’t let the client’s packets through until the client authenticates.

How the _ is it a server error? Why the _ is it not in the 4xx section?

How the _ is it different from 401/403? Why not just extend auth negotiation to specify who is requesting authorization?

Who the _ came up with all those brilliant ideas?

Additional statuses coming out from everywhere just because somebody decided to reserve them

The Wikipedia article is full of teapots and other codes that some companies reserved for their own use instead of reusing existing ones. This leads everyone into a classic black hole: now a client can’t rely on colliding status codes to find out what the outcome is, and those codes leak further and further into infrastructure. What a nice forward-looking move, innit?

Building on this, there is another problem:

Extensions

There’s no valid way to extend HTTP.

One can create additional HTTP methods, but they will collide with someone else’s.

One can create additional headers, but they will collide too. We already have a massive pile of standardized and X- headers we have to track, and there is no way to automatically tell the client “hey, those ones are from scheme X”. It would be enough just to add namespaces to the spec, like HTTP/15.15 cloudflare.com/FLUSH /resource and HTTP/15.15 cloudflare.com/521 Server Is Down. One can do that manually, but this would just increase overall entropy and make clients even more cumbersome, because each vendor would define those namespaces according to their own vision, again. Welcome to the pile of never-ending crap.
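For what it’s worth, telling a namespaced status apart from a plain one would cost a client a few lines. This is only a sketch of the made-up vendor/code syntax from the examples above; nothing like it exists in any HTTP spec, and the StatusLine type and function name are invented here.

```python
from typing import NamedTuple, Optional

class StatusLine(NamedTuple):
    version: str
    namespace: Optional[str]  # None for a plain, un-namespaced status
    code: int
    reason: str

def parse_status_line(line: str) -> StatusLine:
    """Parse 'VERSION [NAMESPACE/]CODE REASON' (hypothetical namespaced syntax)."""
    version, _, rest = line.partition(" ")
    token, _, reason = rest.partition(" ")
    # Split on the last slash so the namespace keeps any slashes of its own.
    namespace, slash, code = token.rpartition("/")
    return StatusLine(version, namespace if slash else None, int(code), reason)

print(parse_status_line("HTTP/15.15 cloudflare.com/521 Server Is Down"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```

A client that doesn’t know the namespace could then fall back to the status class (5xx here) instead of guessing what a vendor meant.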

Length negotiation

This is my favorite one.

A client sends a request to a server, or a server sends a response to a client. The Content-Length header is not set.

Where does the request/response end? Is it a TCP stall, or has it already finished? There is no answer to that. There are just three ways to detect it:

Content-Length header.
Chunked transfer.
FIN of the underlying TCP session, which won’t happen because we’ve invented keep-alive to prevent that.

As a server, one has to require one of those, otherwise it’s impossible to determine where the client’s request ends. But if it is a requirement, why the heck not just add it to the first request/response line? Everything is _ed up on so many levels that the stairs in my 22-floor building don’t have that many steps.
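The three end-of-message signals listed above boil down to a decision like the following sketch. It is a simplification of the HTTP/1.1 message-length rules, not a full implementation: it ignores special cases such as HEAD and 204/304 responses, and the returned strings are just labels invented for this example. One assumption worth calling out: as far as I read the HTTP/1.1 RFC, a request carrying neither header is treated as having no body at all, so the until-close case applies to responses only.

```python
from typing import Mapping

def body_framing(headers: Mapping[str, str], is_request: bool = False) -> str:
    """Decide how the end of an HTTP/1.1 message body would be detected.

    Returns "chunked", "length:<n>" or "until-close" (labels made up here).
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # 1. Chunked transfer wins when it is the final transfer coding.
    codings = [c.strip().lower()
               for c in normalized.get("transfer-encoding", "").split(",")
               if c.strip()]
    if codings and codings[-1] == "chunked":
        return "chunked"

    # 2. Otherwise an explicit Content-Length.
    if "content-length" in normalized:
        value = normalized["content-length"]
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"invalid Content-Length: {value!r}")
        return f"length:{int(value)}"

    # 3. Nothing declared: a request is assumed to have no body,
    #    a response runs until the connection is closed.
    return "length:0" if is_request else "until-close"

print(body_framing({"Transfer-Encoding": "gzip, chunked"}))
print(body_framing({"Content-Length": "42"}))
print(body_framing({}))
```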

But wait, there is yet again an out-of-band special status code, 411, to tell the client it has to provide Content-Length. And there is no means to tell the server the same, because the client has no rights in this server-supremacist protocol:

Client-side error reporting

While the server has many ways to tell the client it’s doing something wrong, the client has no way to do the same. How can it indicate that the Content-Length header is missing? That content negotiation didn’t work as expected? That the server sent more bytes than requested? There is no way to do that. So again it has to be implemented somewhere else, as a link in headers, in DNS, in HTML, anywhere but in the protocol itself.

Content negotiation

I’ve seen many _ed up things, but this one is outright abhorrent.

First of all, folks, do you know what a human-readable format is?

text/html;q=1.0, application/postscript;q=0.8

Using a comma for outer delimiting and a semicolon for inner delimiting is not one. You’re already familiar with query string parameter passing, so why the hell on earth would you invent something else?

Second, the calculation of the final quality is abhorrent. Not only does it blow your mind, it also requires unbelievably fine tuning to get the desired results, and that fine tuning can only come from years of polishing automated systems. But this negotiation has only one rational reason to exist: to let a normal, non-technical human being get a document in the format most readable for them. Do you expect every browser user to maintain their own set of fine-tuned preferences?

Third, does the calculation fit the purpose? Why not implement this as a set of thresholds:

Give me document in Russian unless it’s hardly readable
I’d prefer English then
But if that fails as well, switch to Russian again
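For reference, this is roughly what every server has to do just to read the comma-and-semicolon syntax and rank the q values. It is a minimal sketch that covers only the client-side weights; wildcard matching, specificity rules and any server-side quality factors are left out, and the function name is made up.

```python
from typing import List, Tuple

def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-style header into (media type, q) pairs, best first."""
    entries = []
    for item in header.split(","):          # outer delimiter: comma
        parts = [part.strip() for part in item.split(";")]  # inner: semicolon
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0                       # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0           # assumption: unreadable q means "don't want"
        entries.append((media_type, quality))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

print(parse_accept("text/html;q=1.0, application/postscript;q=0.8"))
```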

Also I love using q instead of quality, an unnecessary shortening that follows us everywhere from strtok to here (gosh, why, why do you have that urge to shorten everything? why can’t you just go with reasonable naming?). Congratulations, you’ve saved several bytes that in theory could be compressed into the tiniest chunk, if only the next thing were done right:

Compression

Compression is something that should have been introduced in HTTP or the underlying protocol at the very start. Currently compression information is carried in headers, which means that in HTTP/1.x all headers must go uncompressed and may take up as much space as the whole compressed body. The same goes for the URI, which should be compressed as well, but isn’t. Instead of using something as simple as

HTTP/1.1 encoding=gzip&attribute=value&...<compressed intro line, headers & body>

and keeping only a small line uncompressed, HTTP just refuses to compress half of the payload, so engineers end up mangling hosts to prevent cookies from the main host from being included in CDN requests. I wanna die at this moment.
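A quick way to see the imbalance for yourself: gzip a body and compare it with the plain-text header block that has to travel next to it. The header block and the cookie below are invented sample data, and the numbers depend entirely on what you feed in, so treat this as a measuring sketch rather than a benchmark.

```python
import gzip

def transfer_breakdown(header_block: bytes, body: bytes) -> dict:
    """Compare plain and gzipped sizes of a header block and a body."""
    return {
        "headers_plain": len(header_block),
        "headers_if_gzipped": len(gzip.compress(header_block)),
        "body_plain": len(body),
        "body_gzipped": len(gzip.compress(body)),
    }

# Invented sample response: HTTP/1.1 headers (sent as-is) plus a repetitive body.
headers = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Cache-Control: max-age=3600\r\n"
    b"Set-Cookie: session=0123456789abcdef0123456789abcdef; Path=/; HttpOnly\r\n"
    b"\r\n"
)
body = b"<li>item</li>\n" * 500

stats = transfer_breakdown(headers, body)
for key, value in stats.items():
    print(f"{key:>20}: {value} bytes")
```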

Cookies

Cookies are just another circle of hell. I won’t even start. You know it yourself.

Deprecations & warnings

There is no way to tell the client “hey, we’re deprecating this in 30 days” or “hey, there’s a new, more robust API, please tell whoever runs you that they may consider an upgrade”. The only thing that can be done is shoving the client off. And remember that client doesn’t mean “browser”, where one can display something to a human being; usually somebody finds out that the API doesn’t work a couple of months after it has stopped.

Keep-Alive

This is another _ed up concept leaking from derp software challenges and bad architecture into the protocol. A header of the enclosed protocol controls the enclosing transport protocol, which the enclosed protocol should have no knowledge of, resulting in a spaghetti tangle of different layers where nobody knows how to untangle this shit. Joke’s on you: now we’re switching to UDP, which has no notion of a persistent connection.

Statelessness

Come to think of it, it seems obvious that HTTP/2, with its packaging of many resources into one connection, hints to everyone that we’re in fact looking for a stateful protocol. The Keep-Alive header is a way to make the protocol stateful. Everything just shouts at everyone:

THIS THING ISN’T STATELESS EVEN IF YOU MAKE ALL THIS EFFORT TO IGNORE IT

But still we’re on the brink of HTTP/3 and everyone pretends the problem doesn’t exist. Yes, the only problem we have is TCP, of course.

Tons of protocols and other shit over HTTP that essentially do its job

HATEOAS, HAL, sitemaps.
Validation schemas for resources that accept payloads.
Detailed security settings (“I’m not letting you access the resource because you’re not in group X”).
Links to different contacts and meta-resources. How do I contact the server owner, not the host owner? Where should I submit a sensitive issue? A business proposal? Where can I find documentation for this resource?
Dry-run actions (what will you create should I send POST? Please preview the result so I can inspect it before committing).
HTML encoding, in the end. It may be passed in a header, but that is not required, so good luck parsing the head as ASCII only to reparse it as UTF-8 afterwards.

There’s nothing; create your 15 standards exposed in JSON and consider yourself lucky if there’s something in a <meta> HTML tag.
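The encoding item from the list above, sketched: trust the Content-Type header if it names a charset, otherwise scan the first bytes of the document for a meta charset and decode again. This is a deliberately naive illustration (browsers follow a much longer sniffing algorithm); the 1024-byte window, the regexes and the UTF-8 fallback are all assumptions made here.

```python
import re
from typing import Optional

# Looks for charset=... inside a <meta ...> tag in the raw, still undecoded bytes.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([^\s;\"]+)", re.IGNORECASE)

def decode_html(raw: bytes, content_type: Optional[str] = None, fallback: str = "utf-8") -> str:
    """Decode an HTML document, preferring the header charset over the meta tag."""
    charset = None
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            charset = match.group(1)
    if charset is None:
        # No help from the protocol: peek at the head as ASCII-compatible bytes.
        match = META_CHARSET.search(raw[:1024])
        if match:
            charset = match.group(1).decode("ascii")
    try:
        return raw.decode(charset or fallback, errors="replace")
    except LookupError:
        # Unknown charset label: fall back instead of failing.
        return raw.decode(fallback, errors="replace")

page = '<html><head><meta charset="utf-8"><title>Привет</title></head></html>'.encode("utf-8")
print(decode_html(page))
```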

HSTS, HPKP and so on

Again, we’re using headers with some weird syntax to control TLS, and HPKP is in fact an irreversible thing that breaks the web with a snap of the fingers. While we already had three OSI layers compressed into one, now the transport layer leaks into them as well, and the network stack turns into network pumpkin mash even though it’s nowhere near either Halloween or midnight. The whole thing is _ed.

Authorization vs. Authentication

Looks like the people writing the spec never knew the difference. The web doesn’t either. The only thing it can do is tell you where you can identify yourself, and nothing is ever said about authorization. At least provide me a link where I can request access to the resource, please. But no, there’s no such thing in the spec.

Damn, I hate HTTP so much

This is not the whole list. This is something I came up with in an hour and a half, just the bits and pieces that floated up in my mind within that time.

HTTP is already broken. The less you put on it, the better; don’t pile even more on the old mule. Let it breathe at least as hard as it does now.

And please replace it eventually with something normal.