Now Available for True Supporters and I'm All In subscribers
How it works
We've fine-tuned a custom AI model for each bot that supports this new feature (over 50,000 public bots). This fine-tuning has allowed the AI to learn how to generate similar images. However, because it is done on only one image, the AI might also attempt to replicate the style or other details of that image.
When you request an image within a conversation, we query a Large Language Model (LLM) with the bot's definition and the last turn of your conversation, and ask it to build an image generation prompt. We then query a diffusion model to generate an image from that prompt.
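The two-step flow described above can be sketched roughly as follows. This is only an illustration, not our actual implementation: the function names, the wording of the LLM instruction, and the stand-in `fake_llm` and `fake_diffusion` callables are all hypothetical and exist only so the example runs on its own.

```python
from typing import Callable


def build_llm_request(bot_definition: str, last_turn: str) -> str:
    """Combine the bot definition and the last turn into one LLM instruction.

    The exact wording here is an assumption made for illustration.
    """
    return (
        "Write a short image generation prompt for the scene below.\n"
        f"Character definition: {bot_definition}\n"
        f"Last turn: {last_turn}"
    )


def generate_image(
    bot_definition: str,
    last_turn: str,
    llm: Callable[[str], str],
    diffusion: Callable[[str], bytes],
) -> bytes:
    """Step 1: ask the LLM for an image prompt. Step 2: pass it to the diffusion model."""
    request = build_llm_request(bot_definition, last_turn)
    image_prompt = llm(request).strip()
    return diffusion(image_prompt)


# Hypothetical stand-ins for the real models, so the sketch is runnable.
def fake_llm(request: str) -> str:
    return "  a red-haired knight standing in the rain  "


def fake_diffusion(prompt: str) -> bytes:
    return f"<image for: {prompt}>".encode("utf-8")


image = generate_image(
    "Aria, a knight with red hair and green eyes",
    "Aria steps out into the rain.",
    fake_llm,
    fake_diffusion,
)
```

Note that only the bot definition and the last turn reach the LLM in this sketch, which is why details from earlier in a conversation may not show up in the generated image.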
The idea of being able to visualize your conversation is quite exciting. However, it's important to set expectations properly by understanding the current limitations:
Model Training Limitations
Because we have trained the model on only one image, the generated image might look very different from your character. This is especially true when the reference image is of poor quality or shows your character at a strange angle or with strange colors.
Large Language Model Limitation
We are not using a specialized LLM to generate our image prompt, which means that the generated prompt might not always include important elements of your bot's physical attributes, and might not even take the context of your conversation into account.
Diffusion Model Limitation
We are using the same diffusion model as our Avatar Generation tool, which is DreamShaper-8. Because of how it was trained, that model has a bias toward generating images of Caucasian women, so it might turn a man into a woman or fail to respect skin tones.
How to improve results
The feature works better for characters that have some physical attributes defined in the character definition (hair color, eye color, etc.).
Making sure the bot has the Female or Male tag defined will also help avoid confusion about gender.
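As a rough illustration of the two tips above, here is a small check you could run on your own character definition before requesting images. It is a hypothetical helper, not part of our product: the keyword list and the function name `missing_hints` are assumptions, and simple keyword matching will miss plenty of valid descriptions.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed keywords for a few physical attributes; extend as needed.
ATTRIBUTE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "hair": ("hair",),
    "eyes": ("eye",),
    "skin": ("skin",),
}
GENDER_TAGS = {"female", "male"}


def missing_hints(definition: str, tags: Sequence[str]) -> List[str]:
    """Return the attributes and tags that the definition does not seem to mention."""
    text = definition.lower()
    missing = [
        name
        for name, words in ATTRIBUTE_KEYWORDS.items()
        if not any(word in text for word in words)
    ]
    # The Female or Male tag helps the image model avoid gender confusion.
    if not GENDER_TAGS & {tag.lower() for tag in tags}:
        missing.append("gender tag")
    return missing


result = missing_hints("A tall knight with red hair and green eyes.", ["Fantasy"])
print(result)  # ['skin', 'gender tag']
```

An empty list means the definition mentions every attribute in the keyword list and carries a Female or Male tag.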
Training Private Bots
The I'm All In tier now allows you to train your private chatbot. The training process can take up to a few hours.
The current limitations also mean there's a lot of room for improvement. While our focus will remain primarily on improving our text generation, we might introduce new features such as:
Ability to add more than 1 image to a bot to perform better model training
Ability to customize the generated image generation prompt
Adding a field in the Bot definition to define physical attributes that would normally not be necessary when chatting.
We're also planning to use feedback data to eventually train a custom LLM and a custom diffusion model to improve image generation.