After reading your three articles carefully, I have the following questions:
It seems that data cleaning and labeling are the most important factors affecting the model. Can we adjust the dataset in the following way:
When scraping Midjourney v5.1 results, collect only the prompts and images that users have upscaled, until there are more than 10,000 of them.
Analyze the prompts to classify the images into categories such as indoor, outdoor, landscape, portrait, etc., and optimize the labels for each category accordingly. For example, if a user wants to generate an image in the style of Niji·Journey or in a portrait style, then optimize the image labels for that category (because in portrait photography, most lens blur is acceptable).
Consider filtering the data by specific keywords in the prompt during data cleaning in order to train a LoRA, which is also a quick way to reproduce the Midjourney style. For example, if Disney-style prompts are popular at a certain time, collect a dataset for that style keyword directly, which is also easier to clean. Finally, combine the datasets for the different popular styles to obtain the next generation of the large model Vodka.
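To make the keyword idea above a bit more concrete, here is a minimal sketch of how scraped prompts could be sorted into buckets. The keyword lists, the function names, and the (prompt, image path) record format are all my own assumptions for illustration; they are not part of the Vodka pipeline.

```python
from collections import defaultdict

# Hypothetical keyword lists; a real run would tune these against the scraped prompts.
CATEGORY_KEYWORDS = {
    "portrait": ("portrait", "headshot", "close-up"),
    "landscape": ("landscape", "mountain", "valley"),
    "indoor": ("indoor", "interior", "living room"),
    "disney": ("disney",),
}


def categorize(prompt):
    """Return every category whose keywords appear in the prompt (case-insensitive)."""
    text = prompt.lower()
    matched = [
        name
        for name, words in CATEGORY_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    return matched or ["uncategorized"]


def split_by_category(records):
    """Group (prompt, image_path) pairs into per-category buckets."""
    buckets = defaultdict(list)
    for prompt, image_path in records:
        for name in categorize(prompt):
            buckets[name].append((prompt, image_path))
    return dict(buckets)
```

Each bucket could then be cleaned and captioned with rules specific to its category, or used on its own as the training set for a style LoRA. Plain substring matching is crude, so a prompt can land in several buckets or in the wrong one.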
I am not particularly knowledgeable, just expressing some unprofessional ideas. Thank you again for your efforts and almost completely open process in this work. It's great. I have added you as a Discord friend, but I think leaving a message here is more formal.
However, as English is not my native language, there may be some grammar issues, and I apologize for any misunderstandings this may cause.
Have you tested how your model reacts to LoRAs from Civitai? In my testing they are completely unusable with the Vodka model. How does that factor into your modeling?
Hey, any chance you can tell me which LoRAs you tested? And what happened?
This is something I want to explore and write about, maybe even train LoRAs on top of Vodka and see what happens.
Which version of EveryDream2trainer are you using? I am using the latest version with your parameter settings and dataset, including the same data-cleaning steps, but my loss and validation curves do not match yours: my runs generally overfit early. I thought it might be a problem with the Python environment, so I created two environments, one with torch 1.12 and one with torch 2.0.1 (the versions of some libraries also differ), but the results are the same and both overfit prematurely. May I ask what the reason could be? Also, can you share the EveryDream2trainer code you use? Thanks.
Interesting, I was training V4 recently and noticed something similar. The main thing that has changed is that I updated ED2 to the latest which is built on PyTorch 2.0.
The first thing I noticed was the VRAM usage went down. And then the graph was showing much faster training.
My leading hypothesis right now is that optimizers are behaving differently with the latest update.
So what I did to get very close to my previous results was to switch the optimizer to AdamW instead of adam8bit. This is much slower and uses much more VRAM, but the result is almost identical to my previous run.
Yes, I also tested the difference between the two. With the latter, training time and VRAM usage are greatly reduced. I trained 100 epochs on an RTX 4090: total training time was 1234.25 minutes and total steps were 114400. But with the same data and parameters, the results from both, which I tested several times, seem to be the same, and the performance improves a lot.
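For reference, the figures quoted above can be turned into per-step and per-epoch numbers with a few lines of arithmetic. This assumes the 114400 steps cover all 100 epochs and that 1234.25 minutes is the wall-clock time for the whole run; the comment does not say which optimizer the run used.

```python
# Figures as quoted in the comment above.
total_minutes = 1234.25
total_steps = 114_400
epochs = 100

# Assumes every epoch has the same number of steps.
steps_per_epoch = total_steps // epochs
seconds_per_step = total_minutes * 60 / total_steps
minutes_per_epoch = total_minutes / epochs

print(f"steps per epoch:   {steps_per_epoch}")
print(f"seconds per step:  {seconds_per_step:.3f}")
print(f"minutes per epoch: {minutes_per_epoch:.2f}")
```

That works out to 1144 steps per epoch at roughly 0.65 seconds per step, which gives a baseline for comparing runs with a different optimizer.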
On the latest PyTorch 2 version of ED2? Because I got very different models between the two optimizers while every single thing was the same. I literally just changed the optimizer line
This is also on my backlog now: comparing learning rate parameters across optimizers to get some conversion values.
This problem seems to exist from Vodka V1 through Vodka V3. My loss and validation loss are very different from your test results.
Not sure I follow this, can you elaborate? One thing that got updated in ED2 is that instead of plotting the relative change in loss (the difference from epoch to epoch), they started using the absolute value for that given epoch. So it doesn't start from 0 any more.
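To illustrate why the two kinds of curve look so different, here is a small sketch of the distinction. It is not ED2's actual logging code, and the function names are made up for this example.

```python
def relative_changes(losses):
    """Epoch-to-epoch difference in loss.

    The first epoch has nothing to compare against, so the curve starts at 0.
    """
    if not losses:
        return []
    return [0.0] + [curr - prev for prev, curr in zip(losses, losses[1:])]


def absolute_values(losses):
    """The loss of each epoch as-is, so the curve starts at the first epoch's loss."""
    return list(losses)


# Made-up per-epoch losses, for illustration only.
example = [0.5, 0.25, 0.375]
print(relative_changes(example))
print(absolute_values(example))
```

A graph of the first kind always begins at 0 and hovers around it, while a graph of the second kind begins at whatever the first epoch's loss was. Curves logged by versions on either side of that change are therefore not directly comparable, even for identical training runs.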