The simplest "two pass" assembler can literally read the input text twice. Ideally it uses the exact same code on both passes, doing the same work to parse and understand each line of assembly text, but on the first pass it only gathers symbol definitions and counts locations rather than generating any machine code. The first pass builds nothing but a symbol table, which is a dictionary of key-value pairs where a name is the key and a location is the value. The second pass generates the machine code but not the symbol table.
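To make this concrete, here is a minimal sketch of the two-pass structure over an invented toy instruction set in which every statement assembles to exactly one word (parse_line, encode, and the opcode table are hypothetical stand-ins, not any real assembler's API):

```python
# A minimal two-pass sketch over a hypothetical toy ISA in which every
# statement assembles to exactly one word, so the location counter just
# increments. The helpers and opcodes below are invented for illustration.

def parse_line(line):
    """Return (label, op, arg) from 'label: op arg'; label/arg may be None."""
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split()
    op = parts[0] if parts else None
    arg = parts[1] if len(parts) > 1 else None
    return label, op, arg

def encode(op, arg, symbols):
    """Hypothetical encoder: pack an opcode and resolved operand into one word."""
    opcodes = {"nop": 0x00, "jmp": 0x10, "load": 0x20}   # invented opcodes
    value = symbols[arg] if arg in symbols else int(arg or 0)
    return (opcodes[op] << 8) | (value & 0xFF)

def assemble(lines):
    # Pass 1: the same parsing work, but only collect the symbol table.
    symbols, loc = {}, 0
    for line in lines:
        label, op, _ = parse_line(line)
        if label:
            symbols[label] = loc
        if op:
            loc += 1
    # Pass 2: the same parsing work again, now emitting code; every symbol,
    # forward or backward, is already in the table.
    code = []
    for line in lines:
        _, op, arg = parse_line(line)
        if op:
            code.append(encode(op, arg, symbols))
    return code

print(assemble(["start: load 1", "jmp end", "nop", "end: jmp start"]))
```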
A simple one pass assembler can literally read the input text only once, generating machine code as it goes, but it must remember where forward references occur in the generated machine code (and what kind of reference each is), since those have to be fixed up aka patched when the referenced symbol's value is finally discovered.
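A one-pass variant of the same toy assembler might look like the sketch below, reusing the hypothetical parse_line and encode helpers from above; the key addition is the list of fixup records and the patch loop at the end:

```python
# One-pass variant of the toy assembler: emit code immediately, and record
# a fixup for each forward reference so it can be patched once the label's
# location is finally known. Reuses the hypothetical parse_line/encode
# helpers sketched above.

def assemble_one_pass(lines):
    symbols, code, fixups = {}, [], []   # fixups: (location, symbol) pairs
    for line in lines:
        label, op, arg = parse_line(line)
        if label:
            symbols[label] = len(code)
        if not op:
            continue
        if arg and not arg.isdigit() and arg not in symbols:
            # Forward reference: emit a placeholder operand of 0 and record
            # where (and to which symbol) a patch will be needed.
            fixups.append((len(code), arg))
            code.append(encode(op, "0", symbols))
        else:
            code.append(encode(op, arg, symbols))
    # All labels are now known; patch the recorded placeholder operands.
    for loc, name in fixups:
        code[loc] = (code[loc] & ~0xFF) | (symbols[name] & 0xFF)
    return code
```

In this toy every fixup is the same kind of patch; in a real assembler the record would also carry the reference kind (absolute vs pc-relative, operand width, and so on).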
Both these designs suffer if there are variable-length instructions whose optimal length depends on the positioning of labels and code. This happens, in particular, with instruction sets that have both short and long branch instructions, where a short branch can be used when the delta between the branch instruction's location and the branch target's location fits within some range (e.g. fits in 8 bits). If a branch has to change from short to long or vice versa, that can shift code & label locations, which can in turn affect the sizes of other branches.
In these circumstances, I have found the best results (smallest generated code size) come from assuming short branches when the branch distance is unknown; in code that has a fair amount of branching, assuming long branches by default will force many branches to actually require the long form when they otherwise wouldn't have. An assembler attempting optimal (minimal) code size has to be prepared to make adjustments broadly across the generated machine code, so parsing once to some intermediate form and working on that intermediate is probably the best approach, since it allows quicker iteration when changes require cascading analysis and adjustment. There are lots of alternatives for the intermediate form, ranging from a node per line of assembly code to a node per larger chunk of machine code having at most one potentially variable-size instruction at its end.
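Here is one sketch of that start-short approach, assuming the chunked intermediate form just described: each chunk is a run of fixed-size code optionally ending in one branch whose size depends on the distance to its target (the 2-byte/4-byte encodings and the 8-bit short-branch range are invented for illustration):

```python
# Start-short branch relaxation over a chunked intermediate form. Every
# chunk ends in at most one variable-size branch, initially assumed short;
# the loop widens branches whose targets turn out to be out of range and
# repeats, since widening moves addresses and can cascade to other branches.

SHORT, LONG = 2, 4   # hypothetical branch encodings: 2-byte vs 4-byte

class Chunk:
    def __init__(self, fixed_size, target=None):
        self.fixed_size = fixed_size   # bytes of non-branch code
        self.target = target           # index of target chunk, or None
        self.branch_size = SHORT if target is not None else 0  # optimistic

def relax(chunks):
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        # Recompute every chunk's start address from the current size guesses.
        addrs, pos = [], 0
        for c in chunks:
            addrs.append(pos)
            pos += c.fixed_size + c.branch_size
        for i, c in enumerate(chunks):
            if c.target is None or c.branch_size == LONG:
                continue
            branch_at = addrs[i] + c.fixed_size
            delta = addrs[c.target] - (branch_at + SHORT)
            if not -128 <= delta <= 127:     # short form no longer reaches
                c.branch_size = LONG         # widen; may cascade, so loop
                changed = True
    return addrs
```

Only ever widening a branch (never shrinking one back) keeps the loop monotonic, which guarantees termination and avoids oscillating between the two sizes.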
Things are slightly more complicated when the assembler generates object files as part of separate (per-file) compilation, intended for combination by a linker. In these circumstances there will be external imported and exported symbols as well as internal symbols. Final fixing / patching for externally defined symbols (i.e. used here but defined there) is done by the linker. The output of the assembler (an object file) then contains machine code (and data), fixup records (indicating references to unresolved symbols), imports (names of unresolved symbols), and exports (names & locations of exported symbols).
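As a sketch, that object file payload might be modeled like this (the field names are invented; no particular real format such as ELF or COFF is implied):

```python
# A sketch of the object-file contents described above, with invented
# field names rather than any real object format's layout.

from dataclasses import dataclass, field

@dataclass
class Fixup:
    location: int     # offset in `code` that needs a patch
    symbol: str       # unresolved name the patch refers to
    kind: str         # kind of reference, e.g. "absolute" vs "pc-relative"

@dataclass
class ObjectFile:
    code: bytearray = field(default_factory=bytearray)      # machine code & data
    fixups: list[Fixup] = field(default_factory=list)        # unresolved references
    imports: set[str] = field(default_factory=set)           # names defined elsewhere
    exports: dict[str, int] = field(default_factory=dict)    # name -> location
```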
A simplistic two pass assembler without support for multi-file linking could make the simplifying assumption that all symbols will be resolved by the first pass, and therefore that no fixup / patching information needs to be generated or stored. Adding support for multi-file linking eliminates this simplification, which is another reason I prefer the "one pass" assembler design: it already has an inherent notion of fixup records for forward references within a single file, and so is more easily adapted to multi-file linking.
Some toolchains for the RISC-V family of processors (and some other processors) support linker "relaxation". This allows the size of reference code sequences (branches, and usually function calls) to change based on the final locations of symbols and code across multiple compilation units; for example, a call emitted as an auipc+jalr pair can be relaxed to a single jal when the target turns out to be within reach. It means the aforementioned chunks of machine code are kept separate in the final compiler / assembler output so the linker can adjust them, and even internal branches within a single object file (normally fully resolved in other systems) may still require adjustment (and maybe resizing as well).
In systems with linker relaxation, for cross-compilation-unit references, the compiler or assembler typically emits the largest code sequence in the first place and lets the relaxing linker shorten code sequences or operand sizes where possible. This is because even without relaxation being performed (as one might want for debugging), the larger code still runs correctly, merely with some call sites using more code space than needed.
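A sketch of the linker's side of that bargain follows: deciding which maximal call sequences can shrink once final symbol addresses are known. The 8-byte vs 4-byte sizes mirror RISC-V's auipc+jalr pair vs the single jal (reach roughly ±1 MiB), but the data structures are invented for illustration:

```python
# Linker-side relaxation sketch: every cross-unit call was emitted in the
# largest (far) form; shrink the ones whose now-known distance fits the
# near form's reach. Sizes loosely mirror RISC-V (8-byte auipc+jalr pair
# vs 4-byte jal); the inputs are invented, not a real linker's structures.

FAR_CALL, NEAR_CALL = 8, 4
NEAR_REACH = 1 << 20            # jal reaches roughly +/- 1 MiB

def relax_calls(calls, symbols):
    """calls: list of (call_site_address, target_symbol). Returns bytes saved.
    This sketch only decides which sites can shrink; a real linker must also
    rewrite every later address after each deletion."""
    saved = 0
    for addr, sym in calls:
        delta = symbols[sym] - addr
        if -NEAR_REACH <= delta < NEAR_REACH:
            saved += FAR_CALL - NEAR_CALL
    return saved
```

A real relaxing linker also has to cascade: deleting bytes at one call site moves every later address, which can bring further sites into range, so the decision loop repeats until nothing changes, much like the assembler-side loop above.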
(There are numerous other features of link-time operation & optimization, such as inserting a branch island or thunk on architectures where the shorter branch sequences can't actually reach. And then there is the continuing subject of both static loading and dynamic loading (DLLs), in which some symbol resolution, and the associated fixups, are delayed further still.)