
I need to test git performance on a particular network setup. I don't want my test to measure hard disk write time or any other filesystem timings but network only.

Is there any trick to make git download to "null" or similar?

Note: to my despair, the PC I have available runs Windows 10.

Thanks!

j4x

1 Answer


If you have enough RAM, a ramdisk would work.

If space is an issue, you could possibly have another process deleting things from the ramdisk as files / directories appear, if the CPU load of scanning for them (or waiting with a notify) doesn't interfere. A fresh Git clone might be mostly one big pack file, though, so that wouldn't help.


If you were using Linux instead of Windows, there would be a lot of cool tricks you could use.

Some of these might have Windows equivalents but I don't know what they are. Some of these might not be viable on Linux either.

On Linux, you could write a program that listens with inotify for files to be created, then uses ftruncate to discard their written blocks. Like an SSD TRIM operation, but on a file in a filesystem, punching holes in it (making it sparse) so the filesystem doesn't need to store those blocks.
You'd do this on a tmpfs filesystem, like /tmp often is; it's the go-to low-overhead Linux ramdisk-like filesystem that doesn't use a block-device at all, really just the kernel's in-memory VFS caching infrastructure.
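A minimal sketch of that watcher on the command line instead of with raw inotify/ftruncate calls, assuming Linux with the `inotify-tools` and `util-linux` packages installed; the clone directory and repo URL are made up for illustration:

```shell
#!/bin/sh
# Sketch: watch a tmpfs directory and punch holes in files as soon as
# git finishes writing them, so their data pages are released from RAM.
# /tmp/gitclone is a made-up path; point it at a tmpfs mount on your system.
CLONE_DIR=/tmp/gitclone
mkdir -p "$CLONE_DIR"

inotifywait -m -r -e close_write --format '%w%f' "$CLONE_DIR" |
while IFS= read -r f; do
    size=$(stat -c %s "$f" 2>/dev/null) || continue
    # FALLOC_FL_PUNCH_HOLE: the file keeps its size, but its data
    # blocks are deallocated (it becomes fully sparse).
    [ "$size" -gt 0 ] && fallocate --punch-hole --offset 0 --length "$size" "$f"
done &

git clone git://example.com/repo.git "$CLONE_DIR/repo"
```

Note that git itself reads the pack back while indexing it, so the clone will likely error out once the network transfer completes; for timing the pure network part, that may be acceptable.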

(Windows of course makes it inconvenient to open a file while another process has it open for writing.)

Or on Linux, you could use a device-mapper virtual device with some real storage (a ramdisk) and a larger region backed by a "discard" or "zero" device that just throws away writes. (Like Simulate a faulty block device with read errors? but with a "zero" instead of "error" device, or linear with /dev/zero). Some filesystem types might be able to mkfs and mount on such a filesystem, although reading back as all zeros might crash / panic a lot of filesystems, if any metadata ever gets evicted from RAM and read back.
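A sketch of such a device with `dmsetup`, assuming root and the `brd` ramdisk module; the device name and all sizes are illustrative (device-mapper tables count in 512-byte sectors):

```shell
# 1 GiB ramdisk to back the front of the device (where metadata lands).
modprobe brd rd_nr=1 rd_size=1048576        # rd_size is in KiB

# First 1 GiB maps linearly onto the ramdisk; the next 100 GiB is the
# "zero" target, which throws away writes and reads back as all-zeros.
dmsetup create mostly-discard <<'EOF'
0       2097152   linear /dev/ram0 0
2097152 209715200 zero
EOF

mkfs.xfs /dev/mapper/mostly-discard   # may fail or corrupt on read-back
mount /dev/mapper/mostly-discard /mnt/test
```

This is a device-table config fragment, not something to run unmodified: whether any given filesystem survives on it depends on how much of its metadata stays cached in RAM.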

Or if you use a filesystem like XFS or BTRFS that can put metadata on one device, file contents on another device, you could put the metadata on a ramdisk and the file data on /dev/zero (discard writes, reads as zero). (The intended use-case is for metadata on fast RAID10 of fast disks or SSDs, data on a slower RAID5 or RAID6, especially of lower-RPM spinning rust.)
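With XFS, this split is the "realtime" section: metadata and the log stay on the main device, file data goes to the realtime device. A sketch, assuming root; `/dev/ram0` and a hypothetical write-discarding block device `/dev/mapper/zerodev` (e.g. a dm "zero" target) stand in for the two devices, and the flag letters are from memory of the xfsprogs man pages, so verify them before relying on this:

```shell
# Metadata (and the log) on the ramdisk, file data on the realtime device.
mkfs.xfs -r rtdev=/dev/mapper/zerodev /dev/ram0
mount -o rtdev=/dev/mapper/zerodev /dev/ram0 /mnt/test

# New files only land in the realtime section if they inherit the
# realtime bit; set rtinherit ('t') on the top-level directory.
xfs_io -c 'chattr +t' /mnt/test
```

As with the device-mapper trick, anything XFS reads back from the data device comes back as zeros, so this only times the write path.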

A FUSE (Filesystem in Userspace) filesystem could probably also do this, maybe even selectively filtering files by name to decide whether their data is discarded or kept, so Git wouldn't break if it tried to read back its data. Or keeping the first 16 KiB of every file, so headers / file types can be read back. FUSE adds extra filesystem overhead, though.

Peter Cordes
  • You got my point, @Peter Cordes. If it were Linux or any Unix OS, there would be alternatives to choose from, but in Windows, nothing useful is easy. The OS doesn't even have a means of creating a RAM disk; we need to rely on third-party software for it. – j4x May 09 '22 at 12:15
  • @j4x: I expect any decent ramdisk to be fast enough not to bottleneck gigabit ethernet on a relatively modern PC (like DDR3-1600 or faster). According to google, there are plenty of Windows RAMdisk programs available, many of them free, so an easy first attempt at this is just to pick one. Otherwise you could try WSL2 and actually use `git` under Linux in tmpfs; that's still technically running under Windows. – Peter Cordes May 09 '22 at 12:19
  • @j4x: The assumption here is that memory bandwidth is *much* higher than network bandwidth, which is true for 1G ethernet with a modern system, or even 2.5G or 5Gbit ethernet. 10G or especially 100G ethernet would be a different matter. Also that executing actual NTFS kernel functions (or whatever FS you pick) won't be a meaningful bottleneck as long as it's just CPU time, not waiting for I/O to flush even to an NVMe SSD. Especially compared to a network bottleneck. Network buffering should absorb any bubbles most of the time, with only negligible increase in latency of sending a new RQ. – Peter Cordes May 09 '22 at 12:29
  • @j4x: If any of these assumptions are false, e.g. an old Core 2 (low memory bandwidth and slow CPU), or if you *are* using 10Gbit or faster ethernet with even a relatively modern PC, then this might not be a good approximation of pure-network time. In that case you need a faster test PC, and/or a better OS. **Or write some custom software to make Git requests to the server without actually storing the results to the filesystem.** Git is open-source; it might be possible to record a trace of the requests and call some git functions to replay it. – Peter Cordes May 09 '22 at 12:34
  • In the end I ran some tests with ImDisk. I am not so confident that Windows is not spoiling my results behind my back but, after all, I have numbers. Thanks @Peter Cordes – j4x May 10 '22 at 20:58