This article will show you how to install and use Windows-based software that can train Hunyuan Video LoRA models, allowing the user to generate custom characters in the Hunyuan Video foundation model:
Click to play. Examples from the recent explosion of celebrity Hunyuan LoRAs from the civit.ai community.
At the moment, the two most popular ways of generating Hunyuan LoRA models locally are:
1) The diffusion-pipe-ui Docker-based framework, which relies on Windows Subsystem for Linux (WSL) to handle some of the processes.
2) Musubi Tuner, a new addition to the popular Kohya ss diffusion training architecture. Musubi Tuner does not require Docker and does not depend on WSL or other Linux-based proxies – but it can be difficult to get running on Windows.
Therefore this run-through will focus on Musubi Tuner, and on providing a completely local solution for Hunyuan LoRA training and generation, without the use of API-driven websites or commercial GPU-renting services such as Runpod.
Click to play. Samples from the LoRA training on Musubi Tuner for this article. All permissions granted by the person depicted, for the purposes of illustrating this article.
REQUIREMENTS
The installation will require, at minimum, a Windows 10 PC with a 30+/40+ series NVIDIA card that has at least 12GB of VRAM (though 16GB is recommended). The installation used for this article was tested on a machine with 64GB of system RAM and an NVIDIA 3090 graphics card with 24GB of VRAM. It was tested on a dedicated test-bed system using a fresh install of Windows 10 Professional, on a partition with 600+GB of spare disk space.
WARNING
Installing Musubi Tuner and its prerequisites also entails the installation of developer-focused software and packages directly onto the main Windows installation of a PC. Taking the installation of ComfyUI into account, for the end stages, this project will require around 400-500 gigabytes of disk space. Though I have tested the procedure without incident several times in newly-installed test-bed Windows 10 environments, neither I nor unite.ai are liable for any damage to systems from following these instructions. I advise you to back up any important data before attempting this kind of installation procedure.
Considerations
Is This Method Still Valid?
The generative AI scene is moving very fast, and we can expect better and more streamlined Hunyuan Video LoRA training frameworks this year.
…or even this week! While I was writing this article, the developer of Kohya/Musubi produced musubi-tuner-gui, a sophisticated Gradio GUI for Musubi Tuner:
Obviously a user-friendly GUI is preferable to the BAT files that I use in this feature – once musubi-tuner-gui is working. As I write, it only went online five days ago, and I can find no account of anyone successfully using it.
According to posts in the repository, the new GUI is intended to be rolled directly into the Musubi Tuner project as soon as possible, which will end its current existence as a standalone GitHub repository.
Based on the current installation instructions, the new GUI gets cloned directly into the existing Musubi virtual environment; and, despite many efforts, I cannot get it to associate with the existing Musubi installation. This means that when it runs, it will find that it has no engine!
Once the GUI is integrated into Musubi Tuner, issues of this kind will surely be resolved. Though the author concedes that the new project is 'really rough', he is optimistic about its development and integration directly into Musubi Tuner.
Given these issues (also concerning default paths at install time, and the use of the UV Python package, which complicates certain procedures in the new release), we will probably have to wait a little for a smoother Hunyuan Video LoRA training experience. That said, it looks very promising!
But if you can't wait, and are willing to roll your sleeves up a little, you can get Hunyuan Video LoRA training running locally right now.
Let's get started.
Why Install Anything on Bare Metal?
(Skip this paragraph if you are not an advanced user)
Advanced users will wonder why I have chosen to install so much of the software on the bare-metal Windows 10 installation instead of in a virtual environment. The reason is that the essential Windows port of the Linux-based Triton package is far more difficult to get working in a virtual environment. All the other bare-metal installations in the tutorial could not be installed in a virtual environment, as they must interface directly with local hardware.
Installing Prerequisite Packages and Programs
For the programs and packages that must be installed first, the order of installation matters.
1: Download the Microsoft Redistributable
Download and install the Microsoft Redistributable package from https://aka.ms/vs/17/release/vc_redist.x64.exe.
This is a quick and easy installation.
2: Install Visual Studio 2022
Download the Microsoft Visual Studio 2022 Community edition from https://visualstudio.microsoft.com/downloads/?cid=learn-onpage-download-install-visual-studio-page-cta
Start the downloaded installer:
We do not need every available package, which would be a heavy and lengthy installation. On the initial Workloads page that opens, tick Desktop Development with C++ (see image below).
Now click the Individual Components tab at the top-left of the interface and use the search box to find 'Windows SDK'.
By default, only the Windows 11 SDK is ticked. If you are on Windows 10 (this installation procedure has not been tested by me on Windows 11), tick the latest Windows 10 version, indicated in the image above.
Search for 'C++ CMake' and check that C++ CMake tools for Windows is ticked.
This installation will take at least 13 GB of space.
Once Visual Studio has installed, it will attempt to run on your computer. Let it open fully. When Visual Studio's full-screen interface is finally visible, close the program.
3: Install Visual Studio 2019
Some of the subsequent packages for Musubi expect an older version of Microsoft Visual Studio, while others need a more recent one.
Therefore also download the free Community edition of Visual Studio 2019, either from Microsoft (https://visualstudio.microsoft.com/vs/older-downloads/ – account required) or Techspot (https://www.techspot.com/downloads/7241-visual-studio-2019.html).
Install it with the same options as for Visual Studio 2022 (see procedure above), except that Windows SDK is already ticked in the Visual Studio 2019 installer.
You will see that the Visual Studio 2019 installer is already aware of the newer version as it installs:
When installation is complete, and you have opened and closed the installed Visual Studio 2019 application, open a Windows command prompt (type CMD into Start Search) and type in and enter:
where cl
The result should be the identified locations of the two installed Visual Studio editions.
If you instead get INFO: Could not find files for the given pattern(s), see the Check Paths section of this article below, and use those instructions to add the relevant Visual Studio paths to the Windows environment.
Save any changes made per the Check Paths section below, and then try the where cl command again.
4: Install CUDA 11 + 12 Toolkits
The various packages installed for Musubi need different versions of NVIDIA CUDA, which accelerates and optimizes training on NVIDIA graphics cards.
The reason we installed the Visual Studio versions first is that the NVIDIA CUDA installers search for and integrate with any existing Visual Studio installations.
Download an 11+ series CUDA installation package from:
https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local (download 'exe (local)')
Download a 12+ series CUDA Toolkit installation package from:
https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64
The installation process is identical for both installers. Ignore any warnings about the existence or non-existence of installation paths in Windows Environment variables – we will attend to this manually later.
Install NVIDIA CUDA Toolkit V11+
Start the installer for the 11+ series CUDA Toolkit.
At Installation Options, choose Custom (Advanced) and proceed.
Uncheck the NVIDIA GeForce Experience option and click Next.
Leave Select Installation Location at defaults (this is important):
Click Next and let the installation conclude.
Ignore any warnings or notes that the installer gives about Nsight Visual Studio integration, which is not needed for our use case.
Install NVIDIA CUDA Toolkit V12+
Repeat the entire process for the separate 12+ NVIDIA Toolkit installer that you downloaded:
The installation process for this version is identical to the one listed above (the 11+ version), except for one warning about environment paths, which you can ignore:
When the 12+ CUDA version installation is complete, open a command prompt in Windows and type and enter:
nvcc --version
This should confirm information about the installed driver version:
To check that your card is recognized, type and enter:
nvidia-smi
5: Install GIT
GIT will be handling the installation of the Musubi repository on your local machine. Download the GIT installer at:
https://git-scm.com/downloads/win ('64-bit Git for Windows Setup')
Run the installer:
Use default settings for Select Components:
Leave the default editor at Vim:
Let GIT decide about branch names:
Use recommended settings for the Path Environment:
Use recommended settings for SSH:
Use recommended settings for the HTTPS Transport backend:
Use recommended settings for line-ending conversions:
Choose the Windows default console as the Terminal Emulator:
Use default settings (Fast-forward or merge) for Git Pull:
Use Git Credential Manager (the default setting) for the Credential Helper:
In Configuring extra options, leave Enable file system caching ticked, and Enable symbolic links unticked (unless you are an advanced user who is using hard links for a centralized model repository).
Conclude the installation and verify that Git is installed correctly by opening a CMD window and typing and entering:
git --version
GitHub Login
Later, when you attempt to clone GitHub repositories, you may be challenged for your GitHub credentials. To anticipate this, log into your GitHub account (create one, if necessary) in any browser installed on your Windows system. This way, the OAuth authentication method (a pop-up window) should take as little time as possible.
After that initial challenge, you should stay authenticated automatically.
6: Install CMake
CMake 3.21 or newer is required for parts of the Musubi installation process. CMake is a cross-platform development framework capable of orchestrating diverse compilers, and of compiling software from source code.
Download it at:
https://cmake.org/download/ ('Windows x64 Installer')
Launch the installer:
Ensure that Add CMake to the PATH environment variable is checked.
Press Next.
Type and enter this command in a Windows command prompt:
cmake --version
If CMake installed successfully, it will display something like:
cmake version 3.31.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).
7: Install Python 3.10
The Python interpreter is central to this project. Download the 3.10 version (the best compromise between the different demands of the Musubi packages) at:
https://www.python.org/downloads/release/python-3100/ ('Windows installer (64-bit)')
Run the downloaded installer, and leave it at default settings:
At the end of the installation process, click Disable path length limit (requires UAC admin confirmation):
In a Windows command prompt, type and enter:
python --version
This should result in Python 3.10.0
Check Paths
The cloning and installation of the Musubi framework, as well as its normal operation after installation, requires that its components know the path to several important external components in Windows, particularly CUDA.
So we need to open the path environment and check that all the requisites are there.
A quick way to get to the controls for the Windows environment is to type Edit the system environment variables into the Windows search bar.
Clicking this will open the System Properties control panel. In the lower right of System Properties, click the Environment Variables button, and a window called Environment Variables opens up. In the System Variables panel in the bottom half of this window, scroll down to Path and double-click it. This opens a window called Edit environment variables. Drag this window wider so that you can see the full path of the variables:
Here the important entries are:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
C:\Program Files\Git\cmd
C:\Program Files\CMake\bin
Typically, the correct path variables should already be present.
Add any paths that are missing by clicking New on the left of the Edit environment variable window and pasting in the correct path:
Do NOT just copy and paste from the paths listed above; check that each equivalent path exists in your own Windows installation.
If there are minor path variations (particularly with Visual Studio installations), use the paths listed above to find the correct target folders (i.e., x64 in Hostx64) in your own installation, and then paste those paths into the Edit environment variable window.
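If you prefer to verify this from a prompt rather than the dialog, the optional one-liner below (my own addition, not part of the original walkthrough) lists the PATH entries that matter here. Run it from a freshly-opened command prompt, so that it picks up any changes you have just saved:
python -c "import os; [print(p) for p in os.environ['PATH'].split(';') if any(k in p for k in ('CUDA', 'Visual Studio', 'Git', 'CMake'))]"
If any of the expected entries are missing from the output, add them as described above.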
After this, restart the computer.
Installing Musubi
Upgrade PIP
Using the latest version of the PIP installer can smooth some of the installation stages. In a Windows command prompt with administrator privileges (see Elevation, below), type and enter:
pip install --upgrade pip
Elevation
Some commands may require elevated privileges (i.e., to be run as an administrator). If you receive error messages about permissions in the following stages, close the command prompt window and reopen it in administrator mode by typing CMD into the Windows search box, right-clicking on Command Prompt and selecting Run as administrator:
For the next stages, we will use Windows PowerShell instead of the Windows command prompt. You can find this by entering Powershell into the Windows search box, and (as necessary) right-clicking on it to Run as administrator:
Install Torch
In PowerShell, type and enter:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Be patient while the many packages install.
When done, you can verify a GPU-enabled PyTorch installation by typing and entering:
python -c "import torch; print(torch.cuda.is_available())"
This should result in:
C:\WINDOWS\system32>python -c "import torch; print(torch.cuda.is_available())"
True
Install Triton for Windows
Next, install the Triton for Windows component. In elevated PowerShell, enter (on a single line):
pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.1.0-windows.post8/triton-3.1.0-cp310-cp310-win_amd64.whl
(The triton-3.1.0-cp310-cp310-win_amd64.whl installer works for both Intel and AMD CPUs, as long as the architecture is 64-bit and the environment matches the Python version)
After running, this should result in:
Successfully installed triton-3.1.0
We can check whether Triton is working by importing it in Python. Enter this command:
python -c "import triton; print('Triton is working')"
This should output:
Triton is working
To check that Triton is GPU-enabled, enter:
python -c "import torch; print(torch.cuda.is_available())"
This should result in True:
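For a stronger check than a bare import, the short script below – my own sketch, adapted from the standard Triton vector-add tutorial rather than taken from Musubi – compiles and runs a tiny GPU kernel. Save it under any name you like (triton_test.py is a hypothetical example) and run it with python; it should print True:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of elements
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(1024, device="cuda")
y = torch.rand(1024, device="cuda")
out = torch.empty_like(x)
add_kernel[(1,)](x, y, out, x.numel(), BLOCK_SIZE=1024)
print(torch.allclose(out, x + y))  # True means Triton compiled and ran a kernel on the GPU

If this fails, revisit the CUDA and Triton installation steps above before continuing.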
Create the Virtual Environment for Musubi
From now on, we will install any further software into a Python virtual environment (or venv). This means that all you will need to do to uninstall all of the following software is to drag the venv's installation folder to the trash.
Let's create that installation folder: make a folder called Musubi on your desktop. The following examples assume that this folder exists: C:\Users\[Your Profile Name]\Desktop\Musubi.
In PowerShell, navigate to that folder by entering:
cd C:\Users\[Your Profile Name]\Desktop\Musubi
We want the virtual environment to have access to what we have installed already (especially Triton), so we will use the --system-site-packages flag. Enter this:
python -m venv --system-site-packages musubi
Wait for the environment to be created, and then activate it by entering:
.\musubi\Scripts\activate
From this point on, you can tell that you are in the activated virtual environment by the fact that (musubi) appears at the beginning of all your prompts.
Clone the Repository
Navigate to the newly-created musubi folder (which is inside the Musubi folder on your desktop):
cd musubi
Now that we are in the right place, enter the following command:
git clone https://github.com/kohya-ss/musubi-tuner.git
Wait for the cloning to complete (it will not take long).
Installing Requirements
Navigate to the installation folder:
cd musubi-tuner
Enter:
pip install -r requirements.txt
Wait for the many installations to complete (this will take longer).
Automating Access to the Hunyuan Video Venv
To easily activate and access the new venv for future sessions, paste the following into Notepad and save it with the name activate.bat, using the All files option in the Save As dialog (see image below).
@echo off
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate
cd C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner
cmd
(Replace [Your Profile Name] with the real name of your Windows user profile)
It does not matter where you save this file.
From now on you can double-click activate.bat and start work immediately.
Using Musubi Tuner
Downloading the Models
The Hunyuan Video LoRA training process requires the downloading of at least seven models in order to support all the possible optimization options for pre-caching and training a Hunyuan Video LoRA. Together, these models weigh more than 60GB.
Current instructions for downloading them can be found at https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#model-download
However, these are the download instructions at the time of writing:
clip_l.safetensors, llava_llama3_fp16.safetensors and llava_llama3_fp8_scaled.safetensors can be downloaded at:
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders
mp_rank_00_model_states.pt, mp_rank_00_model_states_fp8.pt and mp_rank_00_model_states_fp8_map.pt can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/transformers
pytorch_model.pt can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/vae
Though you can place these in any directory you choose, for consistency with later scripting, let's put them in:
C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models
This is consistent with the directory arrangement up to this point. Any commands or instructions hereafter will assume that this is where the models are located; and don't forget to replace [Your Profile Name] with your real Windows profile folder name.
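If you would rather script the downloads than fetch each file through a browser, the optional Python sketch below uses the huggingface_hub package (pip install huggingface_hub) – my own addition rather than part of the Musubi instructions; check the repository and file names against the Musubi README before running it, as they can change. Note that hf_hub_download mirrors the repository's folder structure under local_dir, so move the downloaded files into the models folder itself afterwards, so that the later BAT scripts can find them:

# download_models.py - optional helper; verify repo and file names against the Musubi README first
from huggingface_hub import hf_hub_download

MODELS_DIR = r"C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models"

FILES = [
    ("Comfy-Org/HunyuanVideo_repackaged", "split_files/text_encoders/clip_l.safetensors"),
    ("Comfy-Org/HunyuanVideo_repackaged", "split_files/text_encoders/llava_llama3_fp16.safetensors"),
    ("Comfy-Org/HunyuanVideo_repackaged", "split_files/text_encoders/llava_llama3_fp8_scaled.safetensors"),
    ("tencent/HunyuanVideo", "hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt"),
    ("tencent/HunyuanVideo", "hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt"),
    ("tencent/HunyuanVideo", "hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt"),
    ("tencent/HunyuanVideo", "hunyuan-video-t2v-720p/vae/pytorch_model.pt"),
]

for repo_id, filename in FILES:
    # Files are saved under MODELS_DIR, preserving the repo's subfolder layout
    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=MODELS_DIR)
    print("Downloaded:", path)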
Dataset Preparation
Ignoring community controversy on the point, it is fair to say that you will need somewhere between 10-100 photos for a training dataset for your Hunyuan LoRA. Very good results can be obtained even with 15 images, so long as the images are well-balanced and of good quality.
A Hunyuan LoRA can be trained on images, on very short and low-res video clips, or on a mix of the two – though using video clips as training data is challenging, even for a 24GB card.
However, video clips are only really useful if your character moves in such an unusual way that the Hunyuan Video foundation model might not know about it, or be able to guess it.
Examples would include Roger Rabbit, a xenomorph, The Mask, Spider-Man, or other characters that possess unique signature movement.
Since Hunyuan Video already knows how ordinary men and women move, video clips are not necessary to obtain a convincing Hunyuan Video LoRA human-type character. So we will use static images.
Image Preparation
The Bucket List
The TLDR version:
It is best either to use images that are all the same size for your dataset, or to use a 50/50 split between two different sizes, i.e., 10 images that are 512x768px and 10 that are 768x512px.
The training might go well even if you don't do this – Hunyuan Video LoRAs can be surprisingly forgiving.
The Longer Version
As with Kohya-ss LoRAs for static generative systems such as Stable Diffusion, bucketing is used to distribute the workload across differently-sized images, allowing larger images to be used without causing out-of-memory errors at training time (i.e., bucketing 'cuts up' the images into chunks that the GPU can handle, while maintaining the semantic integrity of the whole image).
For each size of image you include in your training dataset (i.e., 512x768px), a bucket, or 'sub-task', will be created for that size. So if you have the following distribution of images, this is how the bucket attention becomes unbalanced, and risks that some photos will be given greater consideration in training than others:
2x 512x768px images
7x 768x512px images
1x 1000x600px image
3x 400x800px images
We can see that bucket attention is divided unequally among these images:
Therefore either stick to one format size, or try to keep the distribution of different sizes relatively equal.
In either case, avoid very large images, as this is likely to slow down training, to negligible benefit.
For simplicity, I have used 512x768px for all the photos in my dataset.
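If you want to sanity-check how your own images will be bucketed before training, the small optional script below (my own addition; it assumes Pillow is installed via pip install pillow) counts the images in a folder per resolution, so you can see at a glance whether the buckets will be balanced:

# bucket_check.py - counts dataset images per resolution; point DATASET_DIR at your own dataset folder
from collections import Counter
from pathlib import Path
from PIL import Image

DATASET_DIR = Path(r"C:\path\to\your\dataset")  # hypothetical placeholder - change this

sizes = Counter()
for img_path in DATASET_DIR.iterdir():
    if img_path.suffix.lower() in (".png", ".jpg", ".jpeg"):
        with Image.open(img_path) as img:
            sizes[img.size] += 1  # img.size is (width, height)

for (width, height), count in sizes.most_common():
    print(f"{width}x{height}: {count} image(s)")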
Disclaimer: The model (person) used in the dataset gave me full permission to use these photos for this purpose, and exercised approval of all AI-based output depicting her likeness featured in this article.
My dataset consists of 40 images, in PNG format (though JPG is fine too). My images were stored at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\examplewoman
You should create a cache folder inside the training image folder:
Now let's create a special file that will configure the training.
TOML Files
The training and pre-caching processes for Hunyuan Video LoRAs obtain the file paths from a flat text file with the .toml extension.
For my test, the TOML is located at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\training.toml
The contents of my training TOML look like this:
[general]
resolution = [512, 768]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
image_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman"
cache_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman\\cache"
num_repeats = 1
(The double backslashes for the image and cache directories are not always necessary, but they can help to avoid errors in cases where there is a space in the path. I have trained models with .toml files that used single forward and single backward slashes.)
We can see in the resolution setting that two resolutions will be considered – 512px and 768px. You can also leave this at 512 and still obtain good results.
Captions
Hunyuan Video is a text+vision foundation model, so we need descriptive captions for these images, which will be considered during training. The training process will fail without captions.
There are a multitude of open source captioning systems we could use for this task, but let's keep it simple and use the taggui system. Though it is stored at GitHub, and though it downloads some very heavy deep learning models on first run, it comes in the form of a simple Windows executable that loads Python libraries and a straightforward GUI.
After starting Taggui, use File > Load Directory to navigate to your image dataset, and optionally set a token identifier (in this case, examplewoman) that will be added to all the captions:
(Be sure to turn off Load in 4-bit when Taggui first opens – it will throw errors during captioning if this is left on)
Select an image in the left-hand preview column and press CTRL+A to select all the images. Then press the Start Auto-Captioning button on the right:
You will see Taggui downloading models in the small CLI in the right-hand column, but only if this is the first time you have run the captioner. Otherwise you will see a preview of the captions.
Now, each image has a corresponding .txt caption containing a description of its contents:
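As a purely illustrative example (not taken from my actual dataset), a caption file for one image might read something like:
examplewoman, a woman with long dark hair standing in a kitchen, wearing a blue sweater and smiling at the camera
The token identifier at the start is what you will later use in your prompts to trigger the trained concept.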
You can click Advanced Options in Taggui to increase the length and style of the captions, but this is beyond the scope of this run-through.
Quit Taggui and let's move on to…
Latent Pre-Caching
To avoid excessive GPU load at training time, it is necessary to create two types of pre-cached files – one to represent the latent image derived from the images themselves, and another to evaluate a text encoding relating to the caption content.
To simplify all three processes (2x cache + training), you can use interactive .BAT files that will ask you questions and undertake the processes once you have supplied the necessary information.
For the latent pre-caching, copy the following text into Notepad and save it as a .BAT file (i.e., name it something like latent-precache.bat), as before, ensuring that the file type in the drop-down menu of the Save As dialog is All Files (see image below):
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:
echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%
set /p CONFIRM=Do you want to proceed with latent pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Run the latent pre-caching script
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_latents.py --dataset_config %TOML_PATH% --vae C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\pytorch_model.pt --vae_chunk_size 32 --vae_tiling
) else (
echo Operation canceled.
)
REM Keep the window open
pause
(Make sure that you replace [Your Profile Name] with your real Windows profile folder name)
Now you can run the .BAT file for automatic latent caching:
When prompted by the various questions from the BAT file, paste or type in the paths to your dataset, cache folder and TOML file.
Text Pre-Caching
We will create a second BAT file, this time for the text pre-caching.
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:
echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%
set /p CONFIRM=Do you want to proceed with text encoder output pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Use the python executable from the virtual environment
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_text_encoder_outputs.py --dataset_config %TOML_PATH% --text_encoder1 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\llava_llama3_fp16.safetensors --text_encoder2 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\clip_l.safetensors --batch_size 16
) else (
echo Operation canceled.
)
REM Keep the window open
pause
Replace your Windows profile name and save this as text-cache.bat (or any other name you like), in any convenient location, as per the procedure for the previous BAT file.
Run this new BAT file, follow the instructions, and the necessary text-encoded files will appear in the cache folder:
Training the Hunyuan Video LoRA
Training the actual LoRA will take considerably longer than these two preparatory processes.
Though there are also multiple variables that we could worry about (such as batch size, repeats, epochs, and whether to use full or quantized models, among others), we will save these considerations for another day, and a deeper look at the intricacies of LoRA creation.
For now, let's narrow the choices a little and train a LoRA on 'median' settings.
We will create a third BAT file, this time to start training. Paste this into Notepad and save it as a BAT file, as before, naming it training.bat (or any name you please):
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
REM Get user input
set /p DATASET_CONFIG=Enter the path to the dataset configuration file:
set /p EPOCHS=Enter the number of epochs to train:
set /p OUTPUT_NAME=Enter the output model name (e.g., example0001):
set /p LEARNING_RATE=Choose learning rate (1 for 1e-3, 2 for 5e-3, default 1e-3):
if "%LEARNING_RATE%"=="1" set LR=1e-3
if "%LEARNING_RATE%"=="2" set LR=5e-3
if "%LEARNING_RATE%"=="" set LR=1e-3
set /p SAVE_STEPS=How often (in steps) to save preview images:
set /p SAMPLE_PROMPTS=What is the location of the text-prompt file for training previews?
echo You entered:
echo Dataset configuration file: %DATASET_CONFIG%
echo Number of epochs: %EPOCHS%
echo Output name: %OUTPUT_NAME%
echo Learning rate: %LR%
echo Save preview images every %SAVE_STEPS% steps.
echo Text-prompt file: %SAMPLE_PROMPTS%
REM Prepare the command
set CMD=accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 ^
C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\hv_train_network.py ^
--dit C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\mp_rank_00_model_states.pt ^
--dataset_config %DATASET_CONFIG% ^
--sdpa ^
--mixed_precision bf16 ^
--fp8_base ^
--optimizer_type adamw8bit ^
--learning_rate %LR% ^
--gradient_checkpointing ^
--max_data_loader_n_workers 2 ^
--persistent_data_loader_workers ^
--network_module=networks.lora ^
--network_dim=32 ^
--timestep_sampling sigmoid ^
--discrete_flow_shift 1.0 ^
--max_train_epochs %EPOCHS% ^
--save_every_n_epochs=1 ^
--seed 42 ^
--output_dir "C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models" ^
--output_name %OUTPUT_NAME% ^
--vae C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/pytorch_model.pt ^
--vae_chunk_size 32 ^
--vae_spatial_tile_sample_min_size 128 ^
--text_encoder1 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/llava_llama3_fp16.safetensors ^
--text_encoder2 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/clip_l.safetensors ^
--sample_prompts %SAMPLE_PROMPTS% ^
--sample_every_n_steps %SAVE_STEPS% ^
--sample_at_first
echo The following command will be executed:
echo %CMD%
set /p CONFIRM=Do you want to proceed with training (y/n)?
if /i "%CONFIRM%"=="y" (
%CMD%
) else (
echo Operation canceled.
)
REM Keep the window open
cmd /k
As usual, be sure to replace all instances of [Your Profile Name] with your correct Windows profile name.
Make sure that the directory C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models exists, and create it at that location if not.
Training Previews
There is a very basic training preview feature, recently enabled for the Musubi trainer, which allows you to force the training model to pause and generate images based on prompts you have saved. These are saved in an automatically-created folder called Sample, in the same directory where the trained models are saved.
To enable this, you will need to save at least one prompt in a text file. The training BAT we created will ask you to input the location of this file; you can therefore name the prompt file anything you like, and save it anywhere.
Here are some prompt examples for a file that will output three different images when requested by the training routine:
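Purely as an illustration (check the exact flag syntax against the Musubi documentation before relying on it), a prompt file of this kind might contain lines such as:
examplewoman is walking on a beach at golden hour --w 512 --h 512 --f 1 --d 42 --s 20
examplewoman sits at a cafe table, smiling at the camera --w 512 --h 512 --f 1 --d 42 --s 20
examplewoman stands in a park wearing a red coat --w 512 --h 512 --f 1 --d 42 --s 20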
As you can see in the example above, you can put flags at the end of the prompt that will affect the images:
--w is width (defaults to 256px if not set, according to the docs)
--h is height (defaults to 256px if not set)
--f is the number of frames. If set to 1, an image is produced; if more than 1, a video.
--d is the seed. If not set, it is random; but you should set it in order to see one prompt evolving.
--s is the number of steps in generation, defaulting to 20.
See the official documentation for further flags.
Though training previews can quickly reveal issues that might cause you to cancel the training and reconsider the data or the setup, thus saving time, do remember that every extra prompt slows down the training a little more.
Also, the greater the training preview image's width and height (as set in the flags listed above), the more it will slow training down.
Launch your training BAT file.
Question #1 is 'Enter the path to the dataset configuration file'. Paste or type in the correct path to your TOML file.
Question #2 is 'Enter the number of epochs to train'. This is a trial-and-error variable, since it is affected by the amount and quality of images, as well as the captions and other factors. In general, it is better to set it too high than too low, since you can always stop the training with Ctrl+C in the training window if you feel the model has advanced enough. Set it to 100 in the first instance, and see how it goes.
Question #3 is 'Enter the output model name'. Name your model! It is probably best to keep the name reasonably short and simple.
Question #4 is 'Choose learning rate', which defaults to 1e-3 (option 1). This is a good place to start, pending further experience.
Question #5 is 'How often (in steps) to save preview images'. If you set this too low, you will see little progress between preview image saves, and it will slow down the training.
Question #6 is 'What is the location of the text-prompt file for training previews?'. Paste or type in the path to your prompts text file.
The BAT then shows you the command it will send to the Hunyuan model, and asks you if you want to proceed, y/n.
Go ahead and begin training:
During this time, if you check the GPU section of the Performance tab in Windows Task Manager, you will see the process is taking around 16GB of VRAM.
This may not be an arbitrary figure, as this is the amount of VRAM available on quite a few NVIDIA graphics cards, and the upstream code may have been optimized to fit the tasks into 16GB for the benefit of those who own such cards.
That said, it is very easy to exceed this usage, by sending more exorbitant flags to the training command.
During training, you will see in the lower-right side of the CMD window a figure for how much time has passed since training began, and an estimate of total training time (which will vary heavily depending on the flags set, the number of training images, the number of training preview images, and several other factors).
A typical training time is around 3-4 hours on median settings, depending on the available hardware, number of images, flag settings, and other factors.
Using Your Trained LoRA Models in Hunyuan Video
Choosing Checkpoints
When training is concluded, you will have a model checkpoint for each epoch of training.
This saving frequency can be changed by the user to save more or less frequently, as desired, by amending the --save_every_n_epochs [N] number in the training BAT file. If you added a low figure for saves-per-steps when setting up training with the BAT, there will be a high number of saved checkpoint files.
Which Checkpoint to Choose?
As mentioned earlier, the earliest-trained models will be the most flexible, while the later checkpoints may offer the most detail. The only way to test for these factors is to run some of the LoRAs and generate a few videos. In this way you can get to know which checkpoints are most capable, and represent the best balance between flexibility and fidelity.
ComfyUI
The most popular (though not the only) environment for using Hunyuan Video LoRAs, currently, is ComfyUI, a node-based editor with an elaborate interface that runs in your web browser.
Source: https://github.com/comfyanonymous/ComfyUI
Installation instructions are straightforward and available at the official GitHub repository (additional models will need to be downloaded).
Converting Models for ComfyUI
Your trained models are saved in a (diffusers) format that is not compatible with most implementations of ComfyUI. Musubi is able to convert a model to a ComfyUI-compatible format. Let's set up a BAT file to implement this.
Before running this BAT, create the C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED folder that the script is expecting.
@echo off
REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat
:START
REM Get user input
set /p INPUT_PATH=Enter the path to the input Musubi safetensors file (or type "exit" to quit):
REM Exit if the user types "exit"
if /i "%INPUT_PATH%"=="exit" goto END
REM Extract the file name from the input path and append 'converted' to it
for %%F in ("%INPUT_PATH%") do set FILENAME=%%~nF
set OUTPUT_PATH=C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED\%FILENAME%_converted.safetensors
set TARGET=other
echo You entered:
echo Input file: %INPUT_PATH%
echo Output file: %OUTPUT_PATH%
echo Target format: %TARGET%
set /p CONFIRM=Do you want to proceed with the conversion (y/n)?
if /i "%CONFIRM%"=="y" (
REM Run the conversion script with correctly quoted paths
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\convert_lora.py --input "%INPUT_PATH%" --output "%OUTPUT_PATH%" --target %TARGET%
echo Conversion complete.
) else (
echo Operation canceled.
)
REM Return to start for another file
goto START
:END
REM Keep the window open
echo Exiting the script.
pause
As with the previous BAT files, save the script as 'All files' from Notepad, naming it convert.bat (or whatever you like).
Once saved, double-click the new BAT file, which will ask for the location of a file to convert.
Paste in or type the path to the trained file you want to convert, enter y, and press Enter.
After saving the converted LoRA to the CONVERTED folder, the script will ask if you would like to convert another file. If you want to test multiple checkpoints in ComfyUI, convert a selection of the models.
When you have converted enough checkpoints, close the BAT command window.
You can now copy your converted models into the models\loras folder in your ComfyUI installation.
Typically the correct location is something like:
C:\Users\[Your Profile Name]\Desktop\ComfyUI\models\loras
Generating Hunyuan Videos With LoRAs in ComfyUI
Though the node-based workflows of ComfyUI seem complex at first, the settings of other, more experienced users can be loaded by dragging an image (made with the other user's ComfyUI) directly into the ComfyUI window. Workflows can also be exported as JSON files, which can be imported manually, or dragged into a ComfyUI window.
Some imported workflows will have dependencies that may not exist in your installation. Therefore install ComfyUI-Manager, which can fetch missing modules automatically.
Source: https://github.com/ltdrdata/ComfyUI-Manager
To load one of the workflows used to generate videos from the models in this tutorial, download this JSON file and drag it into your ComfyUI window (though there are far better workflow examples available in the various Reddit and Discord communities that have adopted Hunyuan Video, and my own is adapted from one of these).
This is not the place for an extended tutorial on the use of ComfyUI, but it is worth mentioning a few of the crucial parameters that will affect your output if you download and use the JSON layout that I linked to above.
1) Width and Height
The larger your image, the longer the generation will take, and the higher the risk of an out-of-memory (OOM) error.
2) Length
This is the numerical value for the number of frames. How many seconds it adds up to depends on the frame rate (set to 30fps in this layout); for example, 90 frames at 30fps amounts to three seconds of video. You can convert seconds to frames based on fps at Omnicalculator.
3) Batch Size
The higher you set the batch size, the quicker the result may come, but the greater the burden on VRAM. Set this too high and you will get an OOM error.
4) Control After Generate
This controls the random seed. The options for this sub-node are fixed, increment, decrement and randomize. If you leave it at fixed and do not change the text prompt, you will get the same image every time. If you amend the text prompt, the image will change to a limited extent. The increment and decrement settings allow you to explore nearby seed values, while randomize gives you a completely new interpretation of the prompt.
5) Lora Name
You will need to select your own installed model here before attempting to generate.
6) Token
If you have trained your model to trigger the concept with a token (such as 'example-person'), put that trigger word in your prompt.
7) Steps
This represents how many steps the system will apply to the diffusion process. Higher step counts may obtain better detail, but there is a ceiling on how effective this approach is, and that threshold can be hard to find. The common range of steps is around 20-30.
8) Tile Size
This defines how much information is handled at one time during generation. It is set to 256 by default. Raising it can speed up generation, but raising it too high can lead to a particularly frustrating OOM error, since it comes at the very end of a long process.
9) Temporal Overlap
Hunyuan Video generation of people can lead to 'ghosting' or unconvincing movement if this is set too low. In general, the current wisdom is that this should be set to a higher value than the number of frames, to produce better movement.
Conclusion
Though further exploration of ComfyUI usage is beyond the scope of this article, community experience on Reddit and the various Discords can ease the learning curve, and there are several online guides that introduce the basics.
First published Thursday, January 23, 2025