Discussion about this post

Jack

After reading your three articles carefully, I have the following questions:

It seems that data cleaning and labeling are the most important factors affecting the model's quality. Could the dataset be adjusted in the following ways?

When scraping Midjourney v5.1 results, collect only the prompts and images that users have upscaled, until there are more than 10,000 of them.

Analyze the prompts to classify the images into categories such as indoor, outdoor, landscape, and portrait, and optimize the labels for each category accordingly. For example, if a user wants an image in the Niji·Journey style or a portrait style, optimize the image labels for that category (in portrait photography, for instance, lens blur is mostly acceptable).

During data cleaning, consider selecting data only for certain keywords in the prompt in order to train a LoRA, which is also a quick way to reproduce the Midjourney style. For example, if Disney-style prompts are popular at a certain time, collect a dataset for that style keyword directly, which is also easier to clean. Finally, combine the datasets for the different popular styles to obtain the next generation of the large-scale Vodka model.
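The three steps above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the record fields (`prompt`, `image_path`, `upscaled`), the keyword lists, and all function names are hypothetical and are not taken from the articles or from any Midjourney API.

```python
from collections import defaultdict

# Hypothetical keyword lists. Plain substring matching is crude (for example,
# "room" also matches "mushroom"); a real pipeline would need better rules.
CATEGORY_KEYWORDS = {
    "portrait": ("portrait", "headshot", "close-up"),
    "landscape": ("landscape", "mountain", "valley"),
    "indoor": ("indoor", "interior", "room"),
    "outdoor": ("outdoor", "street", "garden"),
}


def keep_upscaled(records, limit=10_000):
    """Step 1: keep only records flagged as upscaled, stopping at `limit`."""
    kept = []
    for rec in records:
        if rec.get("upscaled"):
            kept.append(rec)
            if len(kept) >= limit:
                break
    return kept


def categorize(prompt):
    """Step 2: return the first category whose keywords appear in the prompt."""
    text = prompt.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "other"


def group_by_style(records, style_keywords):
    """Step 3: bucket records by style keyword, one bucket per candidate LoRA set."""
    groups = defaultdict(list)
    for rec in records:
        text = rec["prompt"].lower()
        for style in style_keywords:
            if style.lower() in text:
                groups[style].append(rec)
    return dict(groups)


# Made-up sample records, for illustration only.
records = [
    {"prompt": "Portrait of a woman, Disney style", "image_path": "a.png", "upscaled": True},
    {"prompt": "Mountain landscape at dawn", "image_path": "b.png", "upscaled": False},
    {"prompt": "Cozy interior, niji style", "image_path": "c.png", "upscaled": True},
]

kept = keep_upscaled(records)
categories = [categorize(rec["prompt"]) for rec in kept]
style_sets = group_by_style(kept, ["disney", "niji"])
```

Each per-style bucket could then be cleaned and labeled on its own before the buckets are merged into one larger training set.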

I am not particularly knowledgeable, just expressing some unprofessional ideas. Thank you again for your efforts and almost completely open process in this work. It's great. I have added you as a Discord friend, but I think leaving a message here is more formal.

However, as English is not my native language, there may be some grammar issues, and I apologize for any misunderstandings this may cause.

Chimko

Have you tested how your model reacts to LoRAs from Civitai? In my testing they're completely unusable with the Vodka model. How does that factor into your modeling?
