Regarding your first question, on the runtime of the non-optimal DFA: purely theoretically, your intuition that it should still run in O(n) is correct. However, imagine (as an example) the following piece of (Python-like) pseudo-code for the Kleene-star operator:
# given that the Kleene-star operator for 'r' starts at position i
while i < len(string) and string[i] == 'r':
    accepting = True
    i += 1
# redundant state: a second, identical loop over the same condition
while i < len(string) and string[i] == 'r':
    accepting = True
    i += 1
# from here, testing of the input string continues at position i
As you can see, the two while-loops are identical and can be understood as a redundant state. However, "splitting" a loop like this will (among other things) decrease your branch-prediction accuracy and therefore increase the overall runtime (see Mysticial's brilliant explanation of branch prediction here for more details).
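For comparison, the minimized version of the same fragment needs only one loop; a minimal sketch in the same style, still assuming the variables string, i and accepting from above:

# minimal version: one state, one loop, one branch to predict
while i < len(string) and string[i] == 'r':
    accepting = True
    i += 1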
Many other, similarly "practical" arguments can be made for why a non-optimal DFA will be slower. Among them: a higher memory usage, as you mentioned (and in many cases more memory means slower, because memory is, by comparison, a slower part of the computer); more "ifs", since each additional state requires its own checks of the input to pick a successor; and possibly more loops (as in the example), which would make the algorithm slower not only because of branch prediction, but simply because some programming languages are just very slow on loops.
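To make the memory argument concrete, here is a minimal sketch of a table-driven DFA simulation (the dictionaries MINIMAL and NON_MINIMAL, the two-letter alphabet and the function run are my own illustration, not taken from the question). Every redundant state adds another row to the transition table that has to be stored and kept in cache:

# A table-driven DFA for the language r* over the alphabet {'r', 'x'}.
# Minimal version: 2 states (0 = "seen only r's so far", 1 = dead state).
MINIMAL = {
    0: {'r': 0, 'x': 1},
    1: {'r': 1, 'x': 1},
}

# Non-minimal version: state 2 duplicates state 0 (they are equivalent),
# so the table has an extra row although the recognized language is the same.
NON_MINIMAL = {
    0: {'r': 2, 'x': 1},
    1: {'r': 1, 'x': 1},
    2: {'r': 0, 'x': 1},  # equivalent to state 0
}

def run(table, accepting, text, start=0):
    # simulate the DFA: one table lookup per input character
    state = start
    for ch in text:
        state = table[state][ch]
    return state in accepting

print(run(MINIMAL, {0}, "rrr"))         # True
print(run(NON_MINIMAL, {0, 2}, "rrr"))  # True as well, but with a bigger table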
Regarding your second question: here I am not sure what you mean. After all, if you do the conversion properly, you should arrive at a pretty optimal DFA in the first place.
EDIT:
In the discussion the idea came up that, starting from one NFA, one could end up with several different DFAs that differ in efficiency (in whatever measure chosen), not because of the implementation, but because of the structure of the DFA itself.
This is not possible once you minimize, for the minimal DFA is unique (up to renaming of states). Here is an outline of a proof:
- Assume that our procedure for creating and minimizing a DFA is optimal.
- When applying the procedure, we start by constructing a DFA (e.g. via the subset construction). In this step, we can create arbitrarily many equivalent states. These states are all connected to the graph of the NFA in some way.
- In the next step we eliminate all unreachable states. This is irrelevant to performance, for an unreachable state corresponds to "dead code" that is never executed.
- In the last step, we minimize the DFA by grouping equivalent states. This is where it becomes interesting, for the idea is that we could do this grouping in different ways, resulting in different DFAs with different performance. However, the only "choice" we have is assigning a state to a different group (a small sketch of this grouping step follows after the argument below).
So, for argument's sake, we assume we could do that.
But, by the idea behind the minimization algorithm, we can only group equivalent states. So if we had different choices for grouping a particular state, then by transitivity of equivalence, not only would that state be equivalent to both groups, but the two groups would be equivalent to each other as well. Hence, if we could group differently, the algorithm would not have been optimal, because it should have merged all the states of both groups into one group in the first place.
Therefore, the assumption that there can be different minimizations has to be wrong.
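As a concrete illustration of the grouping step, here is a minimal sketch of the usual partition-refinement approach (the names minimize and group_of and the example automaton are my own, not part of the question):

def minimize(states, alphabet, delta, accepting):
    # start from the coarsest distinction: accepting vs. non-accepting states
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [g for g in partition if g]

    def group_of(state, part):
        # index of the group that currently contains the state
        for idx, g in enumerate(part):
            if state in g:
                return idx

    changed = True
    while changed:
        changed = False
        new_partition = []
        for group in partition:
            # states stay together only if, for every symbol, their
            # successors fall into the same group of the current partition
            buckets = {}
            for s in group:
                signature = tuple(group_of(delta[s][a], partition) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
            if len(buckets) > 1:
                changed = True
        partition = new_partition
    return partition

# the non-minimal DFA for r* from the sketch above: states 0 and 2 are equivalent
delta = {0: {'r': 2, 'x': 1}, 1: {'r': 1, 'x': 1}, 2: {'r': 0, 'x': 1}}
print(minimize([0, 1, 2], ['r', 'x'], delta, [0, 2]))  # [{0, 2}, {1}]

The refinement can only split a group when two of its states are provably inequivalent; it never has to choose between two equally valid groupings, which is exactly the point of the argument above.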