
I installed the GUI version of Stable Diffusion here. With it I was able to make 512 by 512 pixel images using my GeForce RTX 3070 GPU with 8 GB of memory:

GUI screenshot

However, when I try to do the same thing with the command line interface, I run out of memory:

Input:
>> C:\SD\stable-diffusion-main>python scripts/txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --plms --n_iter 3 --n_samples 1 --H 512 --W 512

Error:

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 6.13 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

If I reduce the size of the image to 256 x 256, it produces a result, but at obviously much lower quality.

So part 1 of my question is: why do I run out of memory at 6.13 GiB when I have 8 GiB on the card? And part 2: what does the GUI do differently that allows 512 by 512 output? Is there a setting I can change to reduce the load on the GPU?
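
For reference, the error itself points at one knob: `max_split_size_mb`, set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable. That only addresses fragmentation rather than total usage, but a minimal sketch of trying it (the 128 MiB value is just a starting guess, and it has to be set before torch initializes CUDA, e.g. at the top of `scripts/txt2img.py` or in the shell before running it) would be:

```python
import os

# Must be set before the CUDA allocator is first used; 128 MiB is an arbitrary guess.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
print(torch.cuda.get_device_properties(0).total_memory)  # sanity check: should report roughly 8 GiB
```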

Thanks a lot, Alex

Alex S

2 Answers


This might not be the only answer, but I solved it by using the optimized version here. If you already have the standard version installed, just copy the "OptimizedSD" folder into your existing installation, then run the optimized txt2img script instead of the original:

>> python optimizedSD/optimized_txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 10 --ddim_steps 50

It's quite slow on my computer, but produces 512 X 512 images!
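
As an aside, a similar trade-off (slower, but less VRAM) is available in the Hugging Face `diffusers` library, which can load the weights in half precision and slice the attention computation; this is a separate route, not a description of what the optimized fork does internally. A rough sketch, assuming `diffusers` is installed and you have access to the `CompVis/stable-diffusion-v1-4` checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline

# Half precision roughly halves the VRAM needed for the weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # may require a Hugging Face access token
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # compute attention in chunks: slower, but less memory

prompt = "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant"
image = pipe(prompt, height=512, width=512).images[0]
image.save("cat_512.png")
```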

Thanks, Alex

Alex S
  • Nice work (+1). Additionally, you can look into a superres network to bump up the resolution (e.g. [SRGAN](https://deepai.org/machine-learning-model/torch-srgan), but surely there are newer/better options) – George Profenza Sep 08 '22 at 00:36
    Thanks, I'll check it out, I was using Video2X, which works pretty well: https://github.com/k4yt3x/video2x – Alex S Sep 08 '22 at 00:40
  • Awesome! worked like a charm with the additional tags I was missing. Thx – inkblot Sep 11 '22 at 01:50

I get the same problem using the CPU: the process just seems to be killed when it's consuming too much memory. So it may or may not be the number of workers, as mentioned by @inkblot, but it doesn't seem to be just a GPU or CUDA problem either.

For me it also got killed when I tried the optimizedSD script mentioned by @AlexS.

So I'm guessing both scripts probably aren't guarding against exorbitant memory consumption (where the machine runs out of total memory) and just assume enough is available, as it will be on most newer machines running CUDA on a GPU.

My use case is that I want it to run to completion even if it takes much longer on my CPU, since my machine can't use CUDA. So the process's memory usage should perhaps be capped, and memory might need to be handled more sparingly on CPUs.
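
As an illustration of what capping could look like (Linux only, via the standard-library `resource` module, and not something either script currently does): putting a ceiling on the process's address space makes an oversized allocation raise a catchable `MemoryError` instead of the whole process being killed.

```python
import resource

# Cap this process's virtual address space at ~12 GiB (an arbitrary example value).
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (12 * 1024 ** 3, hard))

try:
    data = bytearray(64 * 1024 ** 3)  # deliberately oversized allocation
except MemoryError:
    print("allocation refused instead of the process being OOM-killed")
```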

jsky
  • Note that the optimised script says of txt2img: `can generate 512x512 images from a prompt using under 2.4GB GPU VRAM in under 24 seconds per image on an RTX 2060.` So on a newer machine using the GPU it runs up to about 2.4 GB; on an older CPU it could easily blow up to double that. If the machine only has 8 GB, it's easy to see how it can approach its limit. – jsky Sep 28 '22 at 04:10
  • Running tensors on the CPU was also found to blow up memory consumption: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/689 – jsky Sep 28 '22 at 04:19