You must fragment the packet payload on eight-byte boundaries. With a 128-byte MTU, the largest payload fragment you can have is 104 bytes, which is smaller than simply subtracting the IP header from the MTU (108 bytes). What the text is telling you that subtracting the packet header size (20 bytes) from the MTU, the next smaller fragment size divisible by eight is 104 bytes.
RFC 791, Internet Protocol has a complete description of how IP fragmentation works:
Fragmentation
Fragmentation of an internet datagram is necessary when it originates
in a local net that allows a large packet size and must traverse a
local net that limits packets to a smaller size to reach its
destination.
An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.
Fragmentation, transmission and reassembly across a local network
which is invisible to the internet protocol module is called intranet
fragmentation and may be used [6].
The internet fragmentation and reassembly procedure needs to be able
to break a datagram into an almost arbitrary number of pieces that can
be later reassembled. The receiver of the fragments uses the
identification field to ensure that fragments of different datagrams
are not mixed. The fragment offset field tells the receiver the
position of a fragment in the original datagram. The fragment offset
and length determine the portion of the original datagram covered by
this fragment. The more-fragments flag indicates (by being reset) the
last fragment. These fields provide sufficient information to
reassemble datagrams.
The identification field is used to distinguish the fragments of one
datagram from those of another. The originating protocol module of an
internet datagram sets the identification field to a value that must
be unique for that source-destination pair and protocol for the time
the datagram will be active in the internet system. The originating
protocol module of a complete datagram sets the more-fragments flag to
zero and the fragment offset to zero.
To fragment a long internet datagram, an internet protocol module (for
example, in a gateway), creates two new internet datagrams and copies
the contents of the internet header fields from the long datagram into
both new internet headers. The data of the long datagram is divided
into two portions on a 8 octet (64 bit) boundary (the second portion
might not be an integral multiple of 8 octets, but the first must be).
Call the number of 8 octet blocks in the first portion NFB (for Number
of Fragment Blocks). The first portion of the data is placed in the
first new internet datagram, and the total length field is set to the
length of the first datagram. The more-fragments flag is set to one.
The second portion of the data is placed in the second new internet
datagram, and the total length field is set to the length of the
second datagram. The more-fragments flag carries the same value as
the long datagram. The fragment offset field of the second new
internet datagram is set to the value of that field in the long
datagram plus NFB.
This procedure can be generalized for an n-way split, rather than the
two-way split described.
To assemble the fragments of an internet datagram, an internet
protocol module (for example at a destination host) combines internet
datagrams that all have the same value for the four fields:
identification, source, destination, and protocol. The combination is
done by placing the data portion of each fragment in the relative
position indicated by the fragment offset in that fragment's internet
header. The first fragment will have the fragment offset zero, and
the last fragment will have the more-fragments flag reset to zero.