

Discuss matters related to our favourite AI Art generation technology
How to use ComfyUI for beginners.
Illustrious XL v2.0: The best training base model for the 1536-resolution era
Support the advancement of AI by sponsoring Illustrious XL. Stay informed with our latest updates, version releases, and in-depth research insights from our AI team.
lllyasviel/FramePack: generate 60-second videos at 30 fps using a 13B model with just 6 GB of VRAM
Let's make video diffusion practical! Contribute to lllyasviel/FramePack development by creating an account on GitHub.
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
This report presents UniAnimate-DiT, an advanced project that leverages the cutting-edge and powerful capabilities of the open-source Wan2.1 model for consistent human image animation. Specifically, to preserve the robust generative capabilities of the original Wan2.1 model, we apply the Low-Rank Adaptation (LoRA) technique to fine-tune a minimal set of parameters, significantly reducing training memory overhead. A lightweight pose encoder consisting of multiple stacked 3D convolutional layers is designed to encode the motion information of the driving poses. Furthermore, we adopt a simple concatenation operation to integrate the reference appearance into the model, and we incorporate the pose information of the reference image for enhanced pose alignment. Experimental results show that our approach achieves visually appealing and temporally consistent high-fidelity animations. Trained on 480p (832x480) videos, UniAnimate-DiT demonstrates strong generalization capabilities to seamless…
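For intuition, here is a minimal sketch of what a pose encoder built from a few stacked 3D convolutions could look like. The channel counts, strides, and output width are illustrative guesses, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class PoseEncoder3D(nn.Module):
    """Sketch of a lightweight pose encoder: stacked 3D convolutions that
    downsample a driving-pose video into motion features. Channel sizes and
    strides are illustrative assumptions, not UniAnimate-DiT's real config."""

    def __init__(self, in_channels: int = 3, hidden: int = 64, out_channels: int = 320):
        super().__init__()
        self.net = nn.Sequential(
            # (B, C, T, H, W): downsample spatially, keep the time axis intact
            nn.Conv3d(in_channels, hidden, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, hidden * 2, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden * 2, out_channels, kernel_size=3, stride=(1, 2, 2), padding=1),
        )

    def forward(self, pose_video: torch.Tensor) -> torch.Tensor:
        # pose_video: (batch, 3, frames, height, width) rendered pose maps
        return self.net(pose_video)

# Usage: encode a 16-frame pose sequence at the paper's 832x480 training size
encoder = PoseEncoder3D()
poses = torch.randn(1, 3, 16, 480, 832)
features = encoder(poses)  # -> (1, 320, 16, 60, 104)
```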
HiDream - a new 17B parameters open-weights image generative foundation model | Civitai
Choosing Your HiDream AI Model: Versions, Formats, Hardware Needs & Download Links. HiDream is now provided in several AI model versions and formats…
Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0
zombieyang/sd-ppp: Simplify ComfyUI and Connect with Photoshop
Simplify ComfyUI and Connect with Photoshop. Contribute to zombieyang/sd-ppp development by creating an account on GitHub.
What's everybody using for AI art and video generation nowadays?
It's been a while since I've updated my Stable Diffusion kit, and the technology moves so fast that I should probably figure out what new tech is out there.
Is most everyone still using AUTOMATIC1111's interface? Any cool plugins people are playing with? Good models?
What's the latest in video generation? I've seen a lot of animated images that seem to retain frame-to-frame adherence very well. Kling 1.6 is out there, but it doesn't appear to be free or local.
deepbeepmeep/Wan2GP: Wan 2.1 for the GPU Poor
Wan 2.1 for the GPU Poor. Contribute to deepbeepmeep/Wan2GP development by creating an account on GitHub.
Liquid: Language Models are Scalable and Unified Multi-modal Generators
We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought on by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon…
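For intuition, the unified token space can be pictured as a discrete image codebook appended to the text vocabulary, so a single autoregressive LLM embeds and predicts both modalities with one table and one head. The vocabulary sizes, embedding width, and token layout below are assumptions for illustration, not Liquid's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a text vocabulary plus a VQ image codebook appended to it.
TEXT_VOCAB = 32000      # assumed text vocabulary size
IMAGE_CODEBOOK = 8192   # assumed number of discrete image codes
VOCAB = TEXT_VOCAB + IMAGE_CODEBOOK

embed = nn.Embedding(VOCAB, 1024)   # one shared embedding table for both modalities
lm_head = nn.Linear(1024, VOCAB)    # one head predicts text and image tokens alike

def to_unified_ids(text_ids: torch.Tensor, image_codes: torch.Tensor) -> torch.Tensor:
    """Shift image codes past the text vocabulary and concatenate into one
    sequence, so a single LLM models next-token prediction over both."""
    return torch.cat([text_ids, image_codes + TEXT_VOCAB], dim=-1)

text_ids = torch.randint(0, TEXT_VOCAB, (1, 12))          # a tokenized prompt
image_codes = torch.randint(0, IMAGE_CODEBOOK, (1, 256))  # e.g. a 16x16 grid of VQ codes
sequence = to_unified_ids(text_ids, image_codes)          # (1, 268) unified tokens
hidden = embed(sequence)                                  # shared feature space
logits = lm_head(hidden)                                  # next-token logits over the full vocab
```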
SD.Next Release 2025-04-12
SD.Next: All-in-one for AI generative image. Contribute to vladmandic/sdnext development by creating an account on GitHub.
Last release was just over a week ago, and here we are again with another update, as a new high-end image model, HiDream-I1, jumped out and generated a lot of buzz!
There are quite a few other performance and quality-of-life improvements in this release, spread across 40 commits, so please take a look at the full ChangeLog.
Pusa: Thousands Timesteps Video Diffusion Model
Pusa introduces a paradigm shift in video diffusion modeling through frame-level noise control: each frame gets its own timestep, so the model effectively works with thousands of timesteps rather than the conventional one thousand. This shift was first presented in our FVDM paper. Leveraging this architecture, Pusa seamlessly supports diverse video generation tasks (text-, image-, and video-to-video) while maintaining exceptional motion fidelity and prompt adherence with our refined base-model adaptations. Pusa-V0.5 is an early preview built on Mochi1-Preview. We are open-sourcing this work to foster community collaboration, improve methodologies, and expand capabilities (a toy sketch of frame-level timesteps follows the links below).
Model: https://huggingface.co/RaphaelLiu/Pusa-V0.5
Code: https://github.com/Yaofang-Liu/Pusa-VidGen
Training Toolkit: https://github.com/Yaofang-Liu/Mochi-Full-Finetuner
Dataset: https://huggingface.co/datasets/RaphaelLiu/PusaV0.5_Trainin
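As a toy illustration of frame-level noise control, the sketch below gives every frame its own timestep in a DDPM-style forward process. The schedule, shapes, and conditioning trick are assumptions for illustration, not Pusa's actual implementation.

```python
import torch

def per_frame_noising(video: torch.Tensor, t: torch.Tensor,
                      alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Noise each frame with its own timestep t[f] instead of one shared scalar.
    video: (frames, C, H, W); t: (frames,) integer timesteps;
    alphas_cumprod: (num_steps,) cumulative schedule. Illustrative DDPM forward."""
    noise = torch.randn_like(video)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)          # one noise level per frame
    return a.sqrt() * video + (1 - a).sqrt() * noise

num_steps = 1000
alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 0.02, num_steps), dim=0)

video = torch.randn(16, 3, 64, 64)                   # 16 latent frames
# A conventional model uses one timestep for the whole clip:
t_shared = torch.full((16,), 500)
# Frame-level control lets each frame sit at a different noise level,
# e.g. keeping the first frame nearly clean for image-to-video conditioning:
t_per_frame = torch.tensor([0] + [500] * 15)
noisy = per_frame_noising(video, t_per_frame, alphas_cumprod)
```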
willmiao/ComfyUI-Lora-Manager: LoRA Manager for ComfyUI - A powerful extension for organizing, previewing, and integrating LoRA models with metadata and workflow support.
LoRA Manager for ComfyUI - A powerful extension for organizing, previewing, and integrating LoRA models with metadata and workflow support. - willmiao/ComfyUI-Lora-Manager
MoonGoblinDev/Civicomfy: Civitai model downloader for ComfyUI
Civitai model downloader for ComfyUI. Contribute to MoonGoblinDev/Civicomfy development by creating an account on GitHub.
ostris/Flex.1-alpha-Redux - SigLIP2 vision encoder and Apache 2.0 license
Nunchaku v0.2.0: Multi-LoRA Support, Faster Inference, and 20-Series GPU Compatibility
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models - mit-han-lab/nunchaku
Announcement: https://github.com/mit-han-lab/nunchaku/discussions/236
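For intuition about the SVDQuant idea named in the paper title, here is a naive sketch of one ingredient: a truncated SVD pulls a low-rank, high-precision branch out of a weight matrix so the residual has a tamer dynamic range before 4-bit quantization. The per-tensor quantizer and rank below are simplifications; the actual method also migrates activation outliers into the weights and is considerably more careful.

```python
import torch

def svdquant_decompose(W: torch.Tensor, rank: int = 32, n_bits: int = 4):
    """Illustrative decomposition: a low-rank branch (kept in high precision)
    absorbs the large singular directions, and the residual is quantized to
    4 bits with a naive per-tensor scale. Not the paper's exact algorithm."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # (out, rank) high-precision branch
    L2 = Vh[:rank]                       # (rank, in)
    residual = W - L1 @ L2               # what remains after low-rank absorption
    qmax = 2 ** (n_bits - 1) - 1
    scale = residual.abs().max() / qmax
    q = torch.clamp((residual / scale).round(), -qmax - 1, qmax)
    return L1, L2, q.to(torch.int8), scale

def reconstruct(L1, L2, q, scale):
    return L1 @ L2 + q.float() * scale

W = torch.randn(512, 512)
L1, L2, q, scale = svdquant_decompose(W)
err = (W - reconstruct(L1, L2, q, scale)).abs().mean()  # small reconstruction error
```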
IGORR - ADHD: An AI-generated music video.
Some things are just meant for each other. Like Igorr and GenAI art.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Recent advancements in UNet-based diffusion models, such as ControlNet and IP-Adapter, have introduced effective spatial and subject control mechanisms. However, the DiT (Diffusion Transformer) architecture still struggles with efficient and flexible control. To tackle this issue, we propose EasyControl, a novel framework designed to unify condition-guided diffusion transformers with high efficiency and flexibility. Our framework is built on three key innovations. First, we introduce a lightweight Condition Injection LoRA Module. This module processes conditional signals in isolation, acting as a plug-and-play solution. It avoids modifying the base model weights, ensuring compatibility with customized models and enabling the flexible injection of diverse conditions. Notably, this module also supports harmonious and robust zero-shot multi-condition generalization, even when trained only on single-condition data. Second, we propose a Position-Aware Training Paradigm. This approach…
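As a rough sketch of the plug-and-play idea behind a Condition Injection LoRA module (the wiring and shapes are assumptions, not EasyControl's actual code): the frozen base projection handles all tokens, while a zero-initialized low-rank branch touches only the condition tokens, so base weights stay untouched.

```python
import torch
import torch.nn as nn

class ConditionInjectionLoRA(nn.Module):
    """Sketch in the spirit of a Condition Injection LoRA: the frozen base
    linear processes every token, and a low-rank adapter adds a correction
    only to the condition tokens. Illustrative, not EasyControl's real module."""

    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # base model weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op

    def forward(self, tokens: torch.Tensor, n_cond: int) -> torch.Tensor:
        # tokens: (batch, seq, dim); the first n_cond tokens are condition tokens
        out = self.base(tokens)
        cond_fix = self.up(self.down(tokens[:, :n_cond]))
        return torch.cat([out[:, :n_cond] + cond_fix, out[:, n_cond:]], dim=1)

proj = nn.Linear(1024, 1024)
layer = ConditionInjectionLoRA(proj)
x = torch.cat([torch.randn(1, 64, 1024),           # 64 condition tokens (e.g. pose/edge)
               torch.randn(1, 256, 1024)], dim=1)  # 256 image tokens
y = layer(x, n_cond=64)   # only the condition tokens receive the LoRA path
```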