Let's think about how we'd do one step of simulating the NFA. We begin in some set of active states Qinit. For each state we're in, we look at all the outgoing transitions labeled with the current symbol, then gather them into a set Qmid. Next, we follow all possible ε-transitions from those states and keep repeating this process until there are no more ε-transitions to follow (that is, we find the ε-closure), giving us our new set of state Qnext. We then repeat this process once per character in the input.
So how much time does this take? Well, there are m characters in the input string, so the runtime is m times whatever work we do on one individual character. Each state can have at most n transitions leaving it on each character, so the time to iterate over all the characters in Qinit to find the set Qmid is O(n2): there are O(n) states in Qinit and we only need to scan O(n) transitions per state. From there, we have to keep following ε-transitions until we run out of transitions to follow. We can do that by doing a BFS or DFS starting simultaneously from each state in Qmin and only following ε-transitions. The NFA can have at most O(n2) ε-transitions total (one between each pair of states), so the BFS or DFS will run in time O(n2). Overall we're doing O(n2) work per character, so the total work done is O(mn2).