Self Hosting 101: Deploying Stable Diffusion Models
Overview: Stable Diffusion is a family of open-source generative models that produce high-quality images from text prompts, optionally conditioned on additional inputs such as reference images. While you can use hosted interfaces like DreamStudio or Hugging Face Spaces, many users prefer self-hosting for greater control, customization, and privacy. Note: For commercial reuse of model outputs, please visit our Licensing Page, Terms of Service, and Privacy Policy.
This guide covers three common deployment approaches:
- Running Stable Diffusion locally
- Deploying on a cloud virtual machine (e.g., AWS EC2, GCP, Azure)
- Using hosted inference services (e.g., Replicate, RunPod)
1. Local Deployment
Why Run Locally?
Running Stable Diffusion locally gives you:
- Full control over your environment, settings, and models
- Offline image generation (no reliance on cloud services)
- Freedom to experiment with custom workflows and fine-tuning
- No recurring usage costs
System Requirements
- GPU: NVIDIA GPU with at least 6 GB VRAM (RTX 3060 or higher recommended)
- OS: Windows, macOS (with M-series chip), or Linux
- Software: Python 3.10+, Git, and Conda (or venv)
Setup Steps
1. Create a Python environment
conda create -n sd-env python=3.10
conda activate sd-env
2. Choose and install a Stable Diffusion interface
There are multiple open-source interfaces for running Stable Diffusion.
ComfyUI is currently the most flexible and modular choice, ideal for both beginners and advanced users.
| Interface | Description | Link |
| --- | --- | --- |
| ComfyUI | A node-based, modular interface that lets you visually design custom workflows. Ideal for experimentation, automation, and advanced setups. | https://github.com/comfyanonymous/ComfyUI |
| Automatic1111 WebUI | A widely used, traditional web interface with strong community support and a vast plugin ecosystem. | https://github.com/AUTOMATIC1111/stable-diffusion-webui |
| InvokeAI | A modern dashboard with a focus on workflow clarity and post-processing tools like upscaling and inpainting. | https://github.com/invoke-ai/InvokeAI |
3. Example: Installing ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
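If pip resolves a CPU-only PyTorch build on your machine, install a CUDA-enabled build explicitly before launching. The cu121 index below is just an example; match the URL to your installed CUDA version (see pytorch.org for the current options):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121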
4. Download model weights
Model checkpoint files (.ckpt or .safetensors) can be downloaded from:
- Hugging Face for official model releases
- Civitai for community-trained models, LoRAs, and embeddings
Place model weights in:
ComfyUI/models/checkpoints/
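As a sketch, you can also fetch a checkpoint from the command line with the Hugging Face CLI; the SDXL base repository and filename below are one example, so substitute the model you actually want:
pip install "huggingface_hub[cli]"
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 sd_xl_base_1.0.safetensors --local-dir ComfyUI/models/checkpoints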
5. Launch ComfyUI
python main.py
Then open your browser to:
http://127.0.0.1:8188
You’ll see a graph-based interface where you can build custom generation pipelines by connecting visual nodes.
Alternative: Installing Automatic1111 WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
pip install -r requirements.txt
python launch.py
Access the interface at http://127.0.0.1:7860
Performance Tips
- Enable xformers for faster generation; both ComfyUI and Automatic1111 support it (see the example below).
- Use half-precision (fp16) models to reduce VRAM usage.
- Keep repositories and dependencies updated to benefit from optimization improvements.
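As a quick sketch (flag names can vary between WebUI versions, so check --help for yours):
# ComfyUI uses xformers automatically once it is installed
pip install xformers
# Automatic1111: enable xformers and reduce VRAM pressure via launch flags
python launch.py --xformers --medvram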
2. Cloud Deployment (e.g., AWS EC2)
Why Use the Cloud?
Cloud GPU instances provide:
- Access to high-end hardware (A100, L4, RTX 4090, etc.)
- Scalability for large projects or shared teams
- No need to own expensive GPUs
Example: AWS EC2 Setup
1. Select an instance type
- Recommended: g4dn.xlarge, g5.xlarge, or higher
- Use an AWS Deep Learning AMI (CUDA and PyTorch come preinstalled)
2. Launch the instance
- Open inbound ports 8188 (for ComfyUI) or 7860 (for Automatic1111), ideally restricted to trusted IPs (see the CLI example below)
- Allocate at least 50 GB of storage for models and outputs
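If you use the AWS CLI, a rule like the following opens the ComfyUI port to a single IP; the security group ID and CIDR below are placeholders for your own values:
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8188 \
  --cidr 203.0.113.10/32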
3. Connect via SSH
ssh -i your-key.pem ubuntu@ec2-XX-XX-XX-XX.compute.amazonaws.com
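Alternatively, instead of opening ports to the internet, you can tunnel the UI over SSH and browse to http://127.0.0.1:8188 on your local machine:
ssh -i your-key.pem -L 8188:localhost:8188 ubuntu@ec2-XX-XX-XX-XX.compute.amazonaws.com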
4. Deploy ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py --listen 0.0.0.0 --port 8188
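To keep the server running after your SSH session ends, one simple option is nohup (tmux or a systemd service also works):
nohup python main.py --listen 0.0.0.0 --port 8188 > comfyui.log 2>&1 &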
5. Access the interface
Visit:
http://<EC2-public-IP>:8188
Optional: Dockerized Deployment
docker build -t comfyui .
docker run --gpus all -p 8188:8188 comfyui
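The ComfyUI repository does not ship an official Dockerfile, so you will need to provide one. A minimal sketch follows; the base image tag is an assumption, so match it to your GPU drivers, and note that --gpus all requires the NVIDIA Container Toolkit on the host:
FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN git clone https://github.com/comfyanonymous/ComfyUI.git . && pip install -r requirements.txt
EXPOSE 8188
CMD ["python", "main.py", "--listen", "0.0.0.0", "--port", "8188"]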
3. Using Hosted Inference Platforms
Why Use Hosted Inference?
Hosted inference platforms handle infrastructure, GPU management, and scaling for you.
They’re ideal for users who want fast deployment without managing servers.
Replicate
- Lets you run Stable Diffusion models through an API.
- Ideal for integrating image generation into web apps or backends.
Example (Python):
import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from your environment

# Referencing the model by name runs its latest version; pin a specific
# "model:version" hash for reproducible results.
output = replicate.run(
    "stability-ai/stable-diffusion",
    input={"prompt": "a futuristic city skyline at sunset"}
)
print(output)  # typically a list of URLs to the generated image(s)
RunPod
- Provides on-demand GPU “Pods” for AI workloads.
- Offers prebuilt templates for ComfyUI and Automatic1111.
- Includes web access and optional public endpoints.
Steps:
- Sign up at runpod.io
- Launch a GPU pod (A100, 4090, etc.)
- Choose the “ComfyUI + Stable Diffusion” template
- Access your workspace via the provided web URL
Other Options
- Modal: serverless GPU compute with Python SDKs
- Vast.ai: marketplace for affordable GPU rentals
- Paperspace Gradient: cloud GPU notebooks for quick experimentation
Conclusion
Self-hosting Stable Diffusion gives you complete creative control and flexibility:
- For experimentation and customization, run ComfyUI locally.
- For scalability, deploy on cloud GPUs like EC2 or GCP.
- For ease of use, rely on hosted inference providers like Replicate or RunPod.
Best Practices:
- Secure your endpoints (use firewalls or VPNs)
- Monitor GPU usage to control costs
- Follow model license terms and ethical use policies
Quick Comparison Table
| Deployment Type | Example Tools | Best For | Pros | Cons |
| --- | --- | --- | --- | --- |
| Local | ComfyUI, Automatic1111, InvokeAI | Hobbyists, artists, developers | Offline control, full customization, no recurring cost | Requires a GPU and manual setup |
| Cloud VM | AWS EC2, GCP, Azure | Teams, scalable workloads | Access to powerful GPUs, scalable, reproducible | Hourly cost, setup complexity |
| Hosted Service | Replicate, RunPod, Modal | Developers, integrations | Instant deployment, managed infrastructure | Limited customization, usage fees |
Next Steps
Now that you’ve deployed Stable Diffusion, here are recommended next topics to explore:
- Fine-Tuning & LoRA Training – Train custom aesthetic or style-specific models.
- Workflow Automation in ComfyUI – Use nodes and batch processing for large-scale generation.
- Optimizing for Speed – Explore GPU acceleration, quantization, and TensorRT.
- Integrating via API – Use REST or WebSocket APIs to trigger generations from your own apps.
- Model Management – Organize checkpoints, LoRAs, and embeddings efficiently across multiple environments.