Use MOVEM to really burn through setting memory!
I recommend you don't use CLR.L; if you look at the clock timings you'll find it is quite inefficient. It is better to load a data register with the value you want to set memory to and then use MOVE.L Dn,(A0)+.
However, for extreme rapidity, use MOVEM.L to set/clear large areas of memory. It is 2 to 3 times faster than using CLR or standard MOVE.L
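Here's an example subroutine that sets memory in 64-byte blocks, then sets any remaining long words; it is easy to customise. The byte count is assumed to be a multiple of 4, since the LSR by 2 discards any odd bytes.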
        ORG     $2000
        MOVE.L  #MEMSTART,A0        ; memory to clear
        MOVE.L  #ZEROS,A1           ; value to set memory to e.g. 0
        MOVE.L  #600,D7             ; number of bytes
        BSR     SETBLOCK
        STOP    #$2700
SETBLOCK
        ; MOVEM doesn't support destination = (Ax)+,
        ; does support destination = -(Ax)
        ADD.L   D7,A0               ; so start at end
        LSR.L   #2,D7               ; divide by 4 for long words
        MOVE.L  D7,D6
        LSR.L   #4,D6               ; # of 16 long word blocks
        BEQ.S   NOBLOCK             ; branch if none
        SUBQ.L  #1,D6               ; one less so DBRA works
        MOVEM.L (A1),D0-D4/A2-A4    ; 8 registers = 32 bytes
ZAPBLOCK MOVEM.L D0-D4/A2-A4,-(A0)  ; 8 x 4 = 32 bytes
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 8 x 4 again for 64 bytes
        DBRA    D6,ZAPBLOCK         ; loop ends when D6 = -1
NOBLOCK AND.W   #$0F,D7             ; how many long words left
        BEQ.S   NONE
        ; do any remainder
        SUBQ.W  #1,D7               ; 1 less so DBRA works
        MOVE.L  (A1),D0             ; pattern in D0 if not loaded before
ZAP     MOVE.L  D0,-(A0)            ; set memory a long word at a time
        DBRA    D7,ZAP
NONE
        RTS
ZEROS   DC.L    0,0,0,0,0,0,0,0     ; 8 x 4 = 32 bytes
        ORG     $2500
MEMSTART DS.B   600
This example uses D0-D4 and A2-A4 to get 8 registers, setting 32 bytes at a time, repeated twice for 64 bytes. There's no reason why you can't add more MOVEM instructions to the ZAPBLOCK loop to write 128, 256 or more bytes per iteration, changing the LSR and AND instructions to match.
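As a rough sketch of that idea, the block loop might look like this with four MOVEMs for 128 bytes per iteration. The labels ZAP128 and NOBLK128 are made up for illustration, and the fragment assumes the same setup as SETBLOCK (A0 already at the end of the area, D7 = long word count, A1 pointing at the pattern). It has not been run on real hardware.

```asm
        ; Hypothetical 128-byte variant of the ZAPBLOCK loop
        MOVE.L  D7,D6
        LSR.L   #5,D6               ; 32 long words = 128 bytes per block
        BEQ.S   NOBLK128            ; branch if no whole blocks
        SUBQ.L  #1,D6               ; one less so DBRA works
        MOVEM.L (A1),D0-D4/A2-A4    ; 8 registers = 32 bytes of pattern
ZAP128  MOVEM.L D0-D4/A2-A4,-(A0)   ; 32 bytes
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 64 bytes
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 96 bytes
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 128 bytes
        DBRA    D6,ZAP128           ; loop ends when D6 = -1
NOBLK128 AND.W  #$1F,D7             ; long words left over (0 to 31)
        ; the long-word remainder loop follows unchanged
```

With the 8+8n MOVEM timing used in the Timing section, that works out to 4 x 72 + 10 = 298 cycles per 32 long words, roughly 9.3 per long word, so the gain from unrolling further tails off quickly.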
Note that DBRA only operates on the low word of its counter, so this will set at most 65,536 blocks (4 MB with 64-byte blocks). This can be fixed, for example, by using SUBQ and BGT instead of DBRA.
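One way that fix might look, as an untested sketch: keep the full 32-bit block count in D6 and count it down with SUBQ.L. The label ZAPBLK32 is hypothetical. It assumes D6 holds the number of 64-byte blocks and is non-zero (the BEQ.S NOBLOCK test already guarantees that), and that the "one less so DBRA works" SUBQ before the loop has been removed. BNE is used here; BGT would also work for any count that fits in memory.

```asm
        ; Hypothetical 32-bit replacement for the DBRA loop
        ; D6.L = number of 64-byte blocks, NOT reduced by one
ZAPBLK32 MOVEM.L D0-D4/A2-A4,-(A0)  ; 32 bytes
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 64 bytes
        SUBQ.L  #1,D6               ; 8 cycles, Z set when count hits zero
        BNE.S   ZAPBLK32            ; 10 cycles when taken
```

The loop overhead goes from 10 cycles to 18 per iteration, so this costs about 162 cycles per 16 long words instead of 154; only worth it if you really need more than 65,536 blocks.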
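If I recall correctly, on the original 68000 a CLR to memory does a read as well as a write (later members of the family, from the 68010 on, dropped the redundant read), which is part of why it is slow.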
Timing
Comparing 3 alternatives, assuming a standard 68000 with a 16 bit data bus...
Using CLR
LOOP:
        CLR.L   (A0)+               ; 12+8 = 20
        DBRA    D7,LOOP             ; 10/14
30 cycles for every long word, approaching 20 per long word if the loop is unrolled with multiple clears.
Using MOVE.L
        MOVEQ   #0,D0               ; 4
LOOP:
        MOVE.L  D0,(A0)+            ; 12
        DBRA    D7,LOOP             ; 10/14
22 cycles per long word, approaching 12 per long word if the loop is unrolled with multiple MOVE.L operations.
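To illustrate what "multiple MOVE.L operations" means, a four-way unrolled loop might look like the sketch below. LOOP4 is a made-up label, and D7 is assumed to hold the number of groups of four long words, minus one.

```asm
        MOVEQ   #0,D0               ; 4
LOOP4:
        MOVE.L  D0,(A0)+            ; 12
        MOVE.L  D0,(A0)+            ; 12
        MOVE.L  D0,(A0)+            ; 12
        MOVE.L  D0,(A0)+            ; 12
        DBRA    D7,LOOP4            ; 10/14
```

That is 4 x 12 + 10 = 58 cycles per iteration, or 14.5 cycles per long word. More unrolling gets closer to 12 but never below it.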
Using MOVEM.L
LOOP:
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 8+8*8 = 72
        MOVEM.L D0-D4/A2-A4,-(A0)   ; 8+8*8 = 72
        DBRA    D6,LOOP             ; 10/14
154 cycles per iteration, but only around 9.6 cycles per long word (154 / 16). This is probably competitive with the performance of a hardware blitter.