I know one of Dan North's intentions in devising BDD was to move the vocabulary away from the complexity of the test domain. However, when implementing an outside-in approach, it seems we still need some understanding of mocked behavior (or stubbed behavior, if you prefer). In this video, North suggests that I start with the outermost domain objects and work my way inward, mocking collaborators as I discover them and later replacing them with the proper implementations. So in the end, I have a set of end-to-end tests.
Martin Fowler seemed to see it a little differently in this blog post, where he defined two camps of TDD: "classical TDD", which uses real objects where possible and a mock when necessary, and "mockist TDD", which prefers mocks in most situations. He saw BDD as tending toward the latter. That is, at the end of developing a feature, the "mockist" approach would leave mocks in the actual tests (sorry to use that word in a BDD discussion).
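To make the two camps concrete, here is a minimal Python sketch of the same behavior tested both ways. The `Order` and `Warehouse` classes and their methods are hypothetical, invented for illustration; only `unittest.mock.Mock` comes from the standard library. The classical test uses a real `Warehouse` and checks the resulting state, while the mockist test swaps in a mock and checks which calls were made.

```python
from unittest.mock import Mock


# Hypothetical collaborator: a simple in-memory stock store.
class Warehouse:
    def __init__(self):
        self._stock = {}

    def add(self, product, quantity):
        self._stock[product] = self._stock.get(product, 0) + quantity

    def has_inventory(self, product, quantity):
        return self._stock.get(product, 0) >= quantity

    def remove(self, product, quantity):
        self._stock[product] -= quantity

    def inventory(self, product):
        return self._stock.get(product, 0)


# Hypothetical object under test: an order that fills itself from a warehouse.
class Order:
    def __init__(self, product, quantity):
        self.product = product
        self.quantity = quantity
        self.filled = False

    def fill(self, warehouse):
        if warehouse.has_inventory(self.product, self.quantity):
            warehouse.remove(self.product, self.quantity)
            self.filled = True


def classical_test():
    # Classical: use the real Warehouse and verify the resulting *state*.
    warehouse = Warehouse()
    warehouse.add("Talisker", 50)
    order = Order("Talisker", 50)
    order.fill(warehouse)
    assert order.filled
    assert warehouse.inventory("Talisker") == 0


def mockist_test():
    # Mockist: replace the Warehouse with a mock and verify the *interactions*.
    warehouse = Mock()
    warehouse.has_inventory.return_value = True
    order = Order("Talisker", 50)
    order.fill(warehouse)
    assert order.filled
    warehouse.has_inventory.assert_called_once_with("Talisker", 50)
    warehouse.remove.assert_called_once_with("Talisker", 50)


classical_test()
mockist_test()
```

Both tests pass against the same `Order`, but the mockist one would also fail if `Order` changed *how* it talks to the warehouse, even if the outcome stayed the same.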
In fairness, both sources are years old, and perhaps things have become clearer as BDD has evolved to be applied at both the unit level and the acceptance level.
But my question for the community is basically this: when my story is complete, how much of an end-to-end test should my scenarios actually be? North explains that BDD requires abstractions. For example, when I'm testing login behavior, my scenarios will detail what the login process means. However, when I'm writing some other scenario that requires login but isn't about it, I don't want to repeat those steps over and over. I want an easy abstraction that simply says "Given I'm logged in," so I can get on with executing the behavior I actually care about.
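A rough sketch of the abstraction I have in mind, in plain Python rather than any particular BDD framework (no Cucumber or behave step syntax is assumed). `App`, `given_logged_in`, and the scenario functions are all hypothetical. The login scenario spells out every step, while the comment scenario collapses them into a single reusable "Given I'm logged in" step.

```python
# Hypothetical application under test; names are illustrative only.
class App:
    def __init__(self):
        self.users = {"alice": "s3cret"}
        self.current_user = None
        self.page = "login"

    def visit_login_page(self):
        self.page = "login"

    def submit_credentials(self, username, password):
        if self.users.get(username) == password:
            self.current_user = username
            self.page = "dashboard"
        else:
            self.page = "login-error"

    def post_comment(self, text):
        if self.current_user is None:
            raise PermissionError("must be logged in")
        return f"{self.current_user}: {text}"


def scenario_successful_login():
    # Scenario *about* login: spell out every step of the process.
    app = App()
    app.visit_login_page()
    app.submit_credentials("alice", "s3cret")
    assert app.page == "dashboard"
    assert app.current_user == "alice"


def given_logged_in(app, username="alice"):
    # "Given I'm logged in": one reusable step that hides how login happens.
    # Here it reuses the real flow; it could instead seed a session directly.
    app.visit_login_page()
    app.submit_credentials(username, app.users[username])
    assert app.current_user == username
    return app


def scenario_post_comment():
    # Scenario that *requires* login but is not about it.
    app = given_logged_in(App())
    assert app.post_comment("hello") == "alice: hello"


scenario_successful_login()
scenario_post_comment()
```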
So it seems my approach to abstraction will be to mock certain collaborators (or provide a "test double"), with some scenarios relying on them more than others. For example, should I always mock out external resources, such as a database or a mail server?
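One way to express the mail-server case as a test double, sketched in Python 3.9+ under assumed names: `MailSender`, `SmtpMailSender`, `FakeMailSender`, and `Registration` are all hypothetical. The feature depends only on the `MailSender` role, so a scenario can inject an in-memory fake while production wires in the SMTP-backed implementation (built on the standard `smtplib` and `email.message` modules).

```python
import smtplib
from email.message import EmailMessage
from typing import Protocol


class MailSender(Protocol):
    """The role the feature depends on (hypothetical)."""

    def send(self, to: str, subject: str, body: str) -> None: ...


class SmtpMailSender:
    """Production implementation: talks to a real SMTP server (not used in the scenario)."""

    def __init__(self, host: str, port: int = 25, sender: str = "noreply@example.com"):
        self.host = host
        self.port = port
        self.sender = sender

    def send(self, to: str, subject: str, body: str) -> None:
        msg = EmailMessage()
        msg["From"] = self.sender
        msg["To"] = to
        msg["Subject"] = subject
        msg.set_content(body)
        with smtplib.SMTP(self.host, self.port) as smtp:
            smtp.send_message(msg)


class FakeMailSender:
    """Test double: records messages in memory instead of sending them."""

    def __init__(self):
        self.outbox: list[tuple[str, str, str]] = []

    def send(self, to: str, subject: str, body: str) -> None:
        self.outbox.append((to, subject, body))


class Registration:
    """Hypothetical feature under test; it knows only about the MailSender role."""

    def __init__(self, mailer: MailSender):
        self.mailer = mailer
        self.users: set[str] = set()

    def register(self, email: str) -> None:
        self.users.add(email)
        self.mailer.send(email, "Welcome!", f"Thanks for signing up, {email}.")


# Scenario: "When I register, then I receive a welcome email"
mailer = FakeMailSender()
registration = Registration(mailer)
registration.register("bob@example.com")
assert "bob@example.com" in registration.users
assert mailer.outbox == [
    ("bob@example.com", "Welcome!", "Thanks for signing up, bob@example.com.")
]
```

Here the scenario asserts on the fake's outbox, i.e. on the observable outcome "a welcome email was sent", rather than on SMTP details. Whether `SmtpMailSender` itself should also be exercised end to end is exactly the trade-off in question.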
Perhaps I'm asking the wrong question. BDD is all about communication, shortening the feedback cycle, and discovering what you don't know. Maybe what to mock and what not to mock is irrelevant, so long as the behavior we're interested in actually works. I'm curious how others approach this.