I'm not sure that your scenario makes sense.
Your clarified scenario - successfully performing I/O on an arbitrary file handle, without even knowing whether it is asynchronous or not - is challenging, I think very unusual, and almost certainly not how the API was designed to be used, but perhaps (as you suggest) not entirely implausible.
(Although I don't think you can avoid requiring some cooperation between the caller and your code, because in the IOCP case, the caller has to be able to tell whose I/O a dequeued packet belongs to. You could do this by having the caller allocate the OVERLAPPED structures, as RbMm suggests, but it might be simpler to ask them for a completion key to use.)
I'm not certain offhand how Windows behaves if you provide a redundant event handle, e.g., when the I/O is actually synchronous or using IOCP. But I would guess that it isn't going to be a problem in practice, so provided you're not too worried about future-proofing, you're probably OK.
At any rate, it isn't all that difficult to deal with the particular issue your question asks about. Basically, you just need to prevent the structure from being released twice.
Before making each call, assign a unique completion key and add it to a linked list or other suitable global structure. (The structure must be capable of an atomic find-and-remove operation, or protected by a critical section or similar.)
If the call succeeds immediately, i.e., does not report that the I/O is pending, treat it exactly as if a queued packet were received from the IOCP queue. Typically, you would either use a common function that is called by both your IOCP thread and your I/O thread, or a call to PostQueuedCompletionStatus to manually insert a packet to the IOCP queue.
When a packet is received (or when the call succeeds immediately) first perform a find-and-remove for the completion key against the global structure. If the find fails, you know that you have already been notified of the success of the I/O, and don't need to do anything.
If the find-and-remove succeeds, process the I/O as appropriate and release the OVERLAPPED structure.
There are undoubtedly ways to optimize the same basic approach.
Addendum: if the caller is processing the IOCP packets, and providing you with a completion key to use, you won't be able to use a unique completion key on each request. In this scenario, you can use the pointer to the OVERLAPPED structure instead.
The reason (in the general case) for not using the pointer is that you might receive a packet containing a completion key from one I/O request along with an OVERLAPPED structure from a different one, because the OVERLAPPED structure might be both released and reassigned before a duplicate notification is processed. That doesn't matter in this case, because all of your requests will use the same completion key anyway.
Addendum^2: if you don't know anything about the handle, you'll also need to provide an event object for each OVERLAPPED structure, and wait on them in case notification of the I/O completion arrives that way. It's getting too late in the day for me to try to figure out the exact consequences of that, but it may mean that under some circumstances you get three notifications for the same I/O operation. You might be able to avoid that, but if not, this approach will still work.