It sounds like you're averse to allocating any additional space. Nonetheless, a hash table is still the right solution for speed. Honestly, most hash table implementations for simple data such as integers are so overweight from their one-size-fits-all nature that I just roll my own depending on what I need. For relatively little work, it can turn slow code into fast code when you need it.
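For what it's worth, here's a minimal sketch of the kind of table I mean: fixed power-of-two capacity, linear probing, no resizing or deletion. It's only an illustration, not production code; the sentinel value and the hash constant are arbitrary choices of mine, and it assumes `INT32_MIN` never appears as a key and that you size the table well above the number of keys.

```cpp
#include <cstdint>
#include <vector>

// Minimal open-addressing map from 32-bit keys to 32-bit values:
// fixed power-of-two capacity, linear probing, no resizing or deletion.
constexpr int32_t kEmpty = INT32_MIN;  // sentinel meaning "slot unused"

struct IntMap {
    std::vector<int32_t>  keys;
    std::vector<uint32_t> vals;

    explicit IntMap(size_t pow2_capacity)
        : keys(pow2_capacity, kEmpty), vals(pow2_capacity, 0) {}

    size_t slot(int32_t key) const {
        // Knuth-style multiplicative hash; capacity must be a power of two.
        return (static_cast<uint32_t>(key) * 2654435761u) & (keys.size() - 1);
    }

    // Value for key, inserting a zero entry if the key is absent.
    uint32_t& operator[](int32_t key) {
        size_t i = slot(key);
        while (keys[i] != kEmpty && keys[i] != key)
            i = (i + 1) & (keys.size() - 1);  // probe the next slot
        keys[i] = key;
        return vals[i];
    }

    // Pointer to the value for key, or nullptr if the key is absent.
    uint32_t* find(int32_t key) {
        for (size_t i = slot(key); keys[i] != kEmpty; i = (i + 1) & (keys.size() - 1))
            if (keys[i] == key) return &vals[i];
        return nullptr;
    }
};
```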
Also, if your objection to hash tables is that they destroy order, then you can use them a little differently to get expected O(n) while maintaining order:
Create a hash table that maps each array element to a single 32-bit word: two bits used as a counting field (zero to three) and thirty bits used as an index into the array of elements. Unless you've got over a billion values in your array, thirty bits is enough.
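The exact bit layout is up to you; for concreteness, something like this (putting the count in the low two bits is my own choice):

```cpp
#include <cstdint>

// One 32-bit table value: low 2 bits hold the count (0..3),
// high 30 bits hold an index into the array (0 .. 2^30 - 1).
constexpr uint32_t pack(uint32_t count, uint32_t index) { return (index << 2) | (count & 3u); }
constexpr uint32_t count_of(uint32_t v) { return v & 3u; }
constexpr uint32_t index_of(uint32_t v) { return v >> 2; }
```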
Go through the elements in the array. If an element isn't in the table, insert it with the count field set to zero; it doesn't matter what the index portion holds at this point. If the element is in the table and the count field is zero, bump the count to one and store the current element index alongside it. If the count field is already one or greater, set it to two and leave the stored index untouched.
Go through the elements in the array again. Look up each element; if the current index matches the stored index and the associated count field is greater than zero, print the element.
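Putting both passes together in a rough sketch (using std::unordered_map in place of a hand-rolled table, with a made-up input array for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Packed value: low 2 bits = count (0..3), high 30 bits = array index.
static uint32_t pack(uint32_t count, uint32_t index) { return (index << 2) | (count & 3u); }
static uint32_t count_of(uint32_t v) { return v & 3u; }
static uint32_t index_of(uint32_t v) { return v >> 2; }

int main() {
    std::vector<int> a = {4, 7, 4, 2, 7, 7, 9, 2};  // example input

    std::unordered_map<int, uint32_t> table;
    table.reserve(a.size());

    // Pass 1: build the packed count/index entries.
    for (uint32_t i = 0; i < a.size(); ++i) {
        auto it = table.find(a[i]);
        if (it == table.end()) {
            table.emplace(a[i], pack(0, 0));            // first sighting, index unused
        } else if (count_of(it->second) == 0) {
            it->second = pack(1, i);                     // second sighting, remember this index
        } else {
            it->second = pack(2, index_of(it->second));  // third or later, keep stored index
        }
    }

    // Pass 2: print each repeated element once, in order.
    for (uint32_t i = 0; i < a.size(); ++i) {
        uint32_t v = table.find(a[i])->second;  // every element was inserted in pass 1
        if (count_of(v) > 0 && index_of(v) == i)
            std::printf("%d\n", a[i]);
    }
    return 0;
}
```

For that example input it prints 4, 7 and 2: each repeated element exactly once, at the position where it first repeats.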
This should give you what you want, in the proper order, in O(n) time. But it uses hash tables, which you've said you don't want for reasons that aren't clear. I highly recommend you either accept a solution such as this one or explain the limitations, so that you'll get a more accurately targeted solution.