Benchmarking and Mixing SD Models: Vodka Based Cocktails
We are releasing our first mix, Bloody Mary, based on Vodka V2, along with the detailed methodology of how we made it
Hello, FollowFox community!
A few days ago, we launched an early Alpha of Distillery, our own Generative AI art service (invite link HERE).
It runs on the Bloody Mary mix, so you can try the model from this post for free by simply sending prompts on Discord. Give it a try, and let us know what you think!
Please note that this post was written a few weeks ago, as we wanted to time its release with the Distillery launch (link).
We recently released our V2 of the Vodka trained model (link to the post), and on Civitai, it has been gaining some traction (link).
However, it is well known that some of the top custom SD models are created by mixing some high-quality models. So we had to look into that and see what could be achieved.
As usual, we are using a methodical and experimental approach here. We benchmarked Vodka V2 vs. some of the top base models, compared images across different types of generations, and selected a mix to compensate for areas where Vodka was underperforming.
We called the resulting mix Bloody Mary and released it on Civitai. (link)
So let’s dive in.
Benchmarking and Analyzing Vodka V2
We have just released an automated script that allows us to generate a bunch of XYZ comparisons relatively efficiently (link).
Using that script, we compared Vodka V2 against five awesome base models that were also trained. We are open to suggestions on which ones should be added for future tests, but today’s lineup looks as follows:
NeverEnding Dream (link)
Counterfeit-V3.0 (link)
DreamShaper - V6 (link)
epiCRealism (link)
RPG V4 (link)
You can find all full-resolution XYZ grids on HuggingFace (link).
Let’s review each image and see what conclusions we can make.
Circles
It’s interesting how much we can say just by looking at the circles:
All models generate high-quality circles, which indicates more or less high-quality, well-preserved models.
Counterfeit is more ‘crazy,’ which makes sense, given that it is the most stylized model.
NeverEndingDream and DreamShaper have many similarities, which is not surprising given that they have the same author, and there is probably a good amount of overlap in training methodology and even datasets.
RPG has similarities with Vodka, likely meaning that it was also trained on MidJourney data.
Jennifer Lawrence
This is a great test to check whether the model is ‘forgetting’ certain subjects, plus a good test on photorealism.
The ‘burnt’ feel of Vodka quickly stands out. We have been getting this added darkness and shiny generations when training on MidJourney data.
Arguably, Vodka, along with RPG, has the best representation of the subject.
NeverEndingDream and DreamShaper seem to add a very consistent but specific style, and they show some signs of ‘forgetting.’
Counterfeit is doing its thing.
epiCRealism photos look very realistic, but we can see a strong bias toward Asian-looking faces, coupled with forgetting.
RPG again shows similarities with Vodka but has a much less burnt look.
Standing Woman
A test to see how well the model follows the essence of the prompt to show the full body; can highlight deformities and issues with photorealism.
We won’t keep repeating the observations from previous tests and will only highlight cases where something new was observed.
Counterfeit, NeverEndingDream, and DreamShaper did the best job showing a standing woman's full body. But all models did an ok job overall.
RPG seems to have struggled the most with crops, deformed faces, etc.
NSFW Generation
We won’t display the image or go into details here, but Vodka showed itself to be a Safe for Work model compared to the others, likely due to MidJourney’s dataset.
Stylized Girl
We liked almost all the generations (maybe except for RPG), and there are not many new conclusions to draw here:
We like the feel of Vodka; if we had to choose one, it would probably be Vodka.
epiCRealism looks very interesting when generating stylized images and could be a great addition to the mix.
Counterfeit looks burned here.
Cute Strawberry
Inspired by (link).
Most models struggled to follow this prompt.
Vodka, DreamShaper, and epiCRealism had some interesting and very cute results.
Given the lack of consistency, this could be due to RNG, with the other models simply getting unlucky seeds.
Cars
We didn’t notice anything particularly interesting. Let us know if there is something that we should pay attention to.
Etam Cru
Gorgeous! Not much to say here except that each model brings something unique and interesting to this generation. DreamShaper, epiCRealism, Counterfeit - all have range and uniqueness that we would love to have in the mix.
Playful Colorful Girl
This has been one of the great generations for V2, so we decided to compare it with other models.
Vodka stands out with diversity while still having consistently good outputs.
The biases of the models are once again highlighted.
Despite those biases, epiCRealism generations look stunning.
Walking Student
Like the colorful girl, we think Vodka again stands out with diversity and quality (overall, the accuracy of following the prompts, the feel). epiCRealism once again has a stunning quality.
Mountain Landscape
In this generation, our model struggles with the blurriness from MidJourney source images. So we did two sets to see if and how models react to negative “blurry” prompts.
From afar, Vodka generations look cool, but when we examine closer, there is that painful blurriness that we have been observing since V1.
A few models had significantly better generations. NeverEndingDream stood out here, unlike in other tests, and epiCRealism had good fidelity as usual.
We didn’t see much difference between generations with and without the negative prompts. There are some minor changes if examined closely.
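One way to keep this kind of with/without comparison reproducible is to fix the seed and change only the negative prompt. The sketch below only builds request bodies and sends nothing. The field names (`prompt`, `negative_prompt`, `seed`) follow the Automatic1111 WebUI txt2img API as we understand it, which is an assumption worth checking against your install. The prompt text, seeds, and helper name are hypothetical placeholders, not the settings used for the grids in this post.

```python
from typing import Dict, List


def paired_payloads(prompt: str, negative: str, seeds: List[int]) -> List[Dict[str, object]]:
    """For each seed, build one request without and one with the negative prompt."""
    payloads: List[Dict[str, object]] = []
    for seed in seeds:
        for neg in ("", negative):
            # Same prompt and seed in both requests; only the negative prompt differs.
            payloads.append({"prompt": prompt, "negative_prompt": neg, "seed": seed})
    return payloads


# Hypothetical prompt and seeds, for illustration only
requests_to_send = paired_payloads("mountain landscape", "blurry", [1, 2, 3])
```

Because each pair shares a seed, any difference between the two images can be attributed to the negative prompt rather than to RNG.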
Cute Robot
A lot of interesting generations. We think Vodka did pretty well here, as many models didn’t capture that cute part in the generations.
Sloth Downhill Skateboarding
We have yet to make a good generation of this one using a base model. Vodka has a few nice attempts, but none of them are great. Most models failed to generate a sloth and instead produced humans or human-like creatures. RPG and epiCRealism have a few interesting attempts.
Pillow, solo
In this generation, the main idea was to use the keyword “solo” in the prompt. This is one of the most popular booru tags when using the WD14 tagger. As you can see, NeverEndingDream, DreamShaper, and Counterfeit were trained on datasets tagged using this method. On the one hand, introducing this into the model can be good, since many popular prompts contain such tags, potentially making the model react to them better. On the other hand, it is a radically different prompting approach from what we have in Vodka and could make the overall performance worse. We are leaning towards including a small share of one such model.
Remaining Generations
We won’t discuss the rest individually. The patterns and observations were quite similar to what we discussed above. And a few of the later prompts were somewhat inconclusive and need to be substituted with something more interesting.
Designing and Making the Mix
After reviewing all the images, we counted how often we marked different models as potentially a good addition to Vodka. epiCRealism and DreamShaper had the most mentions, and it was roughly a tie between them. There were some mentions of RPG and others, too, but we decided to keep it simple. So our intended recipe is as follows:
60% Vodka. 20% DreamShaper. 20% epiCRealism.
And we are calling this mix: Bloody Mary. (civitai) (Distillery)
Making the Mix
The easiest tool we found for this task was the Automatic1111 WebUI. The problem is that it can only mix two models at a time. The other option it offers is to calculate the difference between two models and add it to a third, which doesn’t seem to do what we want here.
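To make the distinction between the two merge modes concrete, here is a minimal sketch on toy weights. It assumes the commonly described formulas for the WebUI's checkpoint merger (weighted sum: A * (1 - M) + B * M; add difference: A + (B - C) * M). The dictionaries, values, and function names are hypothetical stand-ins for real checkpoints and are not the WebUI's own code.

```python
from typing import Dict

Weights = Dict[str, float]  # hypothetical stand-in for a checkpoint's tensors


def weighted_sum(a: Weights, b: Weights, m: float) -> Weights:
    """Interpolate two models: result = A * (1 - m) + B * m."""
    return {k: a[k] * (1.0 - m) + b[k] * m for k in a.keys() & b.keys()}


def add_difference(a: Weights, b: Weights, c: Weights, m: float) -> Weights:
    """Add what B changed relative to C onto A: result = A + (B - C) * m."""
    return {k: a[k] + (b[k] - c[k]) * m for k in a.keys() & b.keys() & c.keys()}


# Toy single-parameter "models" with made-up values
vodka = {"w": 1.0}
dreamshaper = {"w": 3.0}
base = {"w": 2.0}

mixed = weighted_sum(vodka, dreamshaper, 0.25)            # 75% Vodka, 25% DreamShaper
shifted = add_difference(vodka, dreamshaper, base, 1.0)   # Vodka plus (DreamShaper - base)
```

Weighted sum is the mode that maps directly onto a percentage-style recipe, while add difference transplants a delta onto a model rather than averaging models together.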
So to achieve the 60/20/20 mix, we used a two-step process.
As step one, we mixed 75% Vodka with 25% DreamShaper.
As step two, we mixed 80% of that first mix with 20% epiCRealism, which should dilute Vodka to 60% (0.75 × 0.8) and DreamShaper to 20% (0.25 × 0.8), as planned.
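The arithmetic behind the two steps can be checked with a short sketch. The helper names below are hypothetical. The code only tracks each model's share through successive two-model weighted sums and is not tied to any particular merging tool.

```python
from typing import Dict, List, Tuple

Shares = Dict[str, float]


def blend(a: Shares, b: Shares, b_weight: float) -> Shares:
    """Shares after a two-model weighted sum: A * (1 - b_weight) + B * b_weight."""
    names = set(a) | set(b)
    return {n: a.get(n, 0.0) * (1.0 - b_weight) + b.get(n, 0.0) * b_weight for n in names}


def sequential_weights(target: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Weight to give each newly added model so the final shares match `target`.

    Assumes the target shares sum to 1.
    """
    steps = []
    running = target[0][1]
    for name, share in target[1:]:
        running += share
        steps.append((name, share / running))
    return steps


# Step one: 75% Vodka + 25% DreamShaper
step_one = blend({"Vodka": 1.0}, {"DreamShaper": 1.0}, 0.25)
# Step two: 80% of the first mix + 20% epiCRealism
bloody_mary = blend(step_one, {"epiCRealism": 1.0}, 0.20)

for name, share in sorted(bloody_mary.items()):
    print(f"{name}: {share:.0%}")

# Going the other way: derive the per-step weights from the intended recipe
plan = sequential_weights([("Vodka", 0.6), ("DreamShaper", 0.2), ("epiCRealism", 0.2)])
```

Here `plan` comes out as a 0.25 weight for DreamShaper in step one and a 0.20 weight for epiCRealism in step two, matching the process described above. The same helper could be used to plan a mix with more than three models.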
First Look at Bloody Mary
If you want to try it yourself, the easiest way is to use our Distillery Discord with daily free generations (link).
Alternatively, download the model and test it with your workflows from Civitai. (link)
Or take a look at our comparisons in this post. For that purpose, we decided to compare it to two other awesome models:
Juggernaut (link) has been popular recently, and the results are undoubtedly great.
And DreamShaper - V7 (link). A newer version of what we used in the mix and undoubtedly one of the strongest models out there. These two factors should make the comparisons interesting.
Some key observations based on the initial results:
Our first cocktail model looks very promising!
It defaults to a more artsy and stylized look over a realistic one, which can be both good and bad.
Compared to the other two, it is less prone to defaulting to a face regardless of the prompt and, in a way, feels more flexible.
Overall it feels like we tested three very strong and interesting models.
Here are some images for you to compare: