Yes, multi-ported DRAM is an option, but much more expensive, probably more than twice as expensive per byte. (And lower capacity per die area, so available sizes will be smaller).
In practice real CPUs just have split L1d/L1i caches, and unified L2 cache and memory, assuming it's ultimately a von Neumann type of architecture.
We call this "modified Harvard" - the performance advantages of Harvard allowing parallel code-fetch and load/store, except for contention for access to the unified cache or memory. But it's rare to have lots of code cache misses at the same time as data misses, because if you're stalling on code fetch then you'll have bubbles in the pipeline anyway. (Out-of-order exec could hide that better than a single single-cycle design, of course!)
It needs extra sync / pipeline flushing when we want to run machine code that we recently generated / stored, e.g. a JIT compiler, but other than that it has all the advantages of unified memory and the CPU-pipeline advantages of the Harvard split. (You need extra synchronization anyway to run recently-stored code on an ISA that allows deeply pipelined and out-of-order exec implementations, and which fetch code far ahead into buffers in the pipeline to give more room to absorb bubbles).
The first pipelined CPUs had small caches or in the case of MIPS R2000 even off-chip caches with only the controllers on-chip. But yes, MIPS R2000 had split I and D cache. Because you don't want code-fetch to conflict with the MEM stage of load or store instructions; that would introduce a structural hazard that would interfere with running 1 instruction per cycle when you don't have cache misses.
In a single-cycle design I guess your cycle would normally be long enough to access memory twice because you aren't overlapping code-fetch and load/store, so you might not even need multi-ported memory?
L1 data caches are already multi-ported on modern high-performance CPUs, allowing them to commit a store from the store buffer in the same cycle as doing 1 or 2 loads on load execution units.
Having even more ports to also allow code-fetch from it would be even more expensive in terms of power, vs. two slightly smaller caches.