Dreambooth Face Test 3 - Source Checkpoints: 1.4 vs 1.4-full-ema vs 1.5 vs 1.5-emaonly
Comparing 4 official models to see which one is the best base for face fine-tunes
followfox.ai is an AI exploratory initiative of the boutique marketing agency FollowFox.org.
Spoiler: Summary of findings
We think all base models did a fairly good job. If you are looking for a single recommendation: use SD 1.5-pruned at 3,000 steps. If you want to experiment a bit with steps or other settings, SD 1.5-pruned-emaonly seems like the best option. And in general, 1.5 weights did better than the 1.4 ones.
Overview and Setup
One of the first decisions you must make before doing a Dreambooth fine-tune is which model to use as a base. We decided to test a few official ones and compare the results. The main goal was to find the model with the best output quality, but we also monitored possible performance optimizations (file size, training time, etc.).
Overall, the setup was identical to the previous experiment (see the linked post for details). The main difference is that we tried four different source checkpoints at three different training step counts each (2k, 3k, 4k). Checkpoints tested: SD 1.4, SD 1.4-full-ema, SD 1.5-pruned, and SD 1.5-pruned-emaonly.
To standardize our judging criteria, we used the same three concepts for each model (see the linked post for details):
A realistic, high-quality photo
An avatar of the fine-tuned subject
Fine-tuned subject depicted as Superman
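As a rough illustration of the test grid described above (four source checkpoints, three step counts each), the runs can be enumerated with a few lines of Python. This is only a sketch: the checkpoint labels are the names used in this post, not actual file names, and `build_runs` is a hypothetical helper, not part of any Dreambooth tooling.

```python
from itertools import product

# Labels follow the names used in this post; they are NOT checkpoint file names.
CHECKPOINTS = ["1.4", "1.4-full-ema", "1.5-pruned", "1.5-pruned-emaonly"]
# Training step counts compared for every checkpoint (2k, 3k, 4k).
STEP_COUNTS = [2000, 3000, 4000]


def build_runs(checkpoints, step_counts):
    """Return one (checkpoint, steps) pair per fine-tuned model to evaluate."""
    return list(product(checkpoints, step_counts))


runs = build_runs(CHECKPOINTS, STEP_COUNTS)
for checkpoint, steps in runs:
    print(f"SD {checkpoint} @ {steps} steps")
print(f"Total fine-tuned models to compare: {len(runs)}")
```

Four checkpoints at three step counts give 12 fine-tuned models, each of which is then judged on the same set of concepts.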
Summary of Results
Overall, we got a lot of good fine-tuned checkpoints. The realistic photo was an easy task for all the models. With avatars, all did fine except for SD 1.4-full-ema. And finally, for Superman… we are not getting consistently good results, but if anything, SD 1.5-pruned-emaonly did the best.
Loss Graphs and Training Time
Interestingly, both 1.5 checkpoints took a bit less time (46 minutes to reach 4k steps) than the 1.4 ones (51 minutes to reach 4k steps).
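To put the timing difference in perspective, here is a small sketch that converts the two reported durations into training throughput and a relative time saving. The only inputs are the figures quoted above (46 and 51 minutes for 4k steps); the function names are illustrative.

```python
def steps_per_second(steps, minutes):
    """Average training throughput over the whole run."""
    return steps / (minutes * 60)


def relative_time_saving(fast_minutes, slow_minutes):
    """Fraction of wall-clock time saved by the faster run."""
    return (slow_minutes - fast_minutes) / slow_minutes


sd15_rate = steps_per_second(4000, 46)  # both 1.5 checkpoints
sd14_rate = steps_per_second(4000, 51)  # both 1.4 checkpoints
saving = relative_time_saving(46, 51)

print(f"SD 1.5: {sd15_rate:.2f} steps/s")  # SD 1.5: 1.45 steps/s
print(f"SD 1.4: {sd14_rate:.2f} steps/s")  # SD 1.4: 1.31 steps/s
print(f"Time saved: {saving:.1%}")         # Time saved: 9.8%
```

In other words, the 1.5 checkpoints trained roughly 10% faster in this setup, which is a modest gain from a single run per checkpoint.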
As for the loss graphs, surprisingly, all four checkpoints produced identical loss-versus-steps curves.