Let's have a look at what you're suggesting:
> the HTTP client can just send more requests in one TCP connection without waiting for the response
So far, so good: I can send "GET /foo" and then immediately "GET /bar" on the same connection.
> and receive the response for the corresponding request
So, the server replies "200 OK" with some HTML content, and ... wait, is that for "/foo" or "/bar"? The key word in your own description is "corresponding" - we need some way of saying "this response corresponds to request #1".
And then, halfway through sending the first response, the server finishes handling the other request, and is ready to send part of a different response; but if it jumps in with "200 OK", that's going to appear to be part of the response it's already sending. So we also need to be able to say "this is the start of a new response", and "this content is the continuation of response #2".
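To do that, we need a new abstraction: a frame, with a header which can encode details like "the next 100 bytes are the start of response #2, which corresponds to request #1". (HTTP/2 frames work on roughly this principle: each one starts with a fixed 9-byte header giving the payload length, a frame type, some flags, and a stream identifier, and a response is tied to its request by being sent on the same stream.)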
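To make the idea concrete, here is a minimal sketch of framing in Python. The header layout (a 4-byte request id, a 1-byte flags field, a 2-byte payload length) and the names `encode_frame`, `demultiplex` and `FLAG_END` are invented for illustration; this is not the real HTTP/2 wire format, only the same principle in miniature.

```python
import struct
from collections import defaultdict

# Hypothetical toy header, NOT the real HTTP/2 layout:
# 4-byte request id, 1-byte flags, 2-byte payload length (big-endian).
HEADER = struct.Struct(">IBH")
FLAG_END = 0x01  # set on the last frame of a response


def encode_frame(request_id, payload, end=False):
    """Prefix a chunk of response data with a header saying whose it is."""
    flags = FLAG_END if end else 0
    return HEADER.pack(request_id, flags, len(payload)) + payload


def demultiplex(stream):
    """Split one byte stream of interleaved frames into per-request responses."""
    partial = defaultdict(bytearray)  # responses still being received
    finished = {}                     # request id -> complete response bytes
    offset = 0
    while offset < len(stream):
        request_id, flags, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        partial[request_id] += stream[offset:offset + length]
        offset += length
        if flags & FLAG_END:
            finished[request_id] = bytes(partial.pop(request_id))
    return finished


# The server starts answering request 1, interrupts itself to send the
# whole answer to request 2, then finishes request 1.
wire = b"".join([
    encode_frame(1, b"200 OK\r\n\r\n<html>foo, "),
    encode_frame(2, b"200 OK\r\n\r\n<html>bar</html>", end=True),
    encode_frame(1, b"continued</html>", end=True),
])

responses = demultiplex(wire)
```

Because every chunk says which request it belongs to, the second "200 OK" can no longer be mistaken for part of the first response, and the client can reassemble both even though they arrived interleaved.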
We could do that and still keep the protocol human readable (which is what we really mean by "text-based" vs "binary"), but there are going to be a lot of these frame headers, so the shorter we can make them, the better. So if we're interested in performance, we can give up on "human readable" as a requirement, and we end up with a binary framing protocol like HTTP/2.
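As a rough illustration of the size difference, the sketch below encodes the same frame header twice: once as a human-readable line and once as packed binary. Both formats are made up for this comparison (the `FRAME request=... length=... end=...` line is not any real protocol), so treat the numbers as indicative only.

```python
import struct


def text_header(request_id, length, end):
    # Hypothetical human-readable frame header, one line per frame.
    line = f"FRAME request={request_id} length={length} end={int(end)}\r\n"
    return line.encode("ascii")


def binary_header(request_id, length, end):
    # Hypothetical packed header: 4-byte id, 1-byte flag, 2-byte length.
    return struct.pack(">IBH", request_id, int(end), length)


text = text_header(1, 100, False)
binary = binary_header(1, 100, False)

# Extra bytes the text form costs per frame, and over a page load that
# needs (say) 10,000 frames.
per_frame_overhead = len(text) - len(binary)
total_overhead = per_frame_overhead * 10_000
print(len(text), len(binary), total_overhead)
```

The text header also has to be parsed by scanning for delimiters and converting decimal strings, whereas the binary one has a fixed size and can be unpacked in a single step.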