Label encoding imposes artificial order: if you label-encode your pet target as 'Dog':0, 'Cat':1, 'Turtle':2, 'Golden Fish':3, then you get the awkward situation where 'Dog' < 'Cat' and 'Turtle is the average of 'Cat' + 'Golden Fish'.
In the case of predictor features (not the target), this is a problem since your Random Forest can be learning something like "if it less than 'Turtle', then...".
Also, you may have categories in the testing set (or even worse, new data during deployment) that were not present in the training, and the transformer doesn't know what to do, so it throws an error. This may be the case or not depending on the particular problem and particular feature you are encoding, obviously not for the target variable.
When hot encoding, if a category absent in the training is present in a prediction, it just get encoded as 0 in each of the encoded features (new columns representing each category), so you don't get an error. Your model still has the other features to make a reasonable guess.
As a general rule, you want to use label encoding for target variables and OHE for predictor features. Note that in general you don't care about artificial order in the target, since the prediction is usually categorical also (A forest will choose a number, not a range of numbers; a network will have one activation unit per category...)
I don't think optimization should be part of the discussion here since they are used for different scenarios demanding different outputs: surely it's more efficient to use the OHE transformer than trying to hack it by performing label encoding and then some pandas trickery to create the same result as with one hot encoding.
Here there are useful comments about the different scenarios (type of model, type of data) and some issues related to efficiency.
Here there's an example on why label encoding is a bad practice for input features.
And let's not forget that the goal of the model is to make predictions, so at the end what's important is not just the output of <transformer>.fit_transform
, but also the fitted transformer itself that's going to be applied to the new observations. OHE will deal with new cases differently than label-encoder (e.g. when the value of the feature in the observation was not present in the training set). That's in my opinion enough reason to have different methods, even when they act in a way similar enough so, for some inputs, you may be able to force them to give similar outputs.