Text-to-image generation has progressed significantly in recent years, yet rendering legible, well-placed text within generated images remains a challenge. This study introduces PosterAI, a model designed to generate high-quality images with accurately placed, legible text, and shows that existing AI technologies can be combined to efficiently produce visually coherent images with embedded text for applications such as book covers, movie and theater posters, and advertisements.
PosterAI is trained on a custom dataset of over 250 book covers and can also accept a reference image as input. Image generation is a two-step process. In the first stage, the model passes the user-provided image through GLM-4V-9B to generate a detailed text prompt; a layout-generation step then predicts bounding boxes and character-level segmentation masks that determine text position, size, and alignment. In the second stage, a latent diffusion model synthesizes the image by progressively refining noisy latent representations, guided by text-aware denoising techniques and a character-aware loss.
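The abstract describes the pipeline only at a high level. The following is a minimal sketch of the two-stage flow, assuming the publicly released THUDM/glm-4v-9b checkpoint for stage one and a stock Stable Diffusion backbone for stage two; the layout module and the text-aware denoiser are PosterAI's own components, so they appear here only as hypothetical placeholders.

```python
# Sketch of the two-stage flow. GLM-4V-9B matches the public Hugging Face
# release; the layout helper and the Stable Diffusion backbone are
# illustrative stand-ins, not PosterAI's actual components.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
from diffusers import StableDiffusionPipeline

device = "cuda"

# Stage 1a: caption the reference image with GLM-4V-9B to get a detailed prompt.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
vlm = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()

image = Image.open("reference_cover.png").convert("RGB")
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image,
      "content": "Describe this book cover in detail for an image generator."}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(device)
with torch.no_grad():
    out = vlm.generate(**inputs, max_new_tokens=256)
prompt = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)

# Stage 1b: layout generation (hypothetical module) predicts bounding boxes and
# character-level segmentation masks for the embedded text.
# boxes, char_masks = layout_model(prompt, title="Example Title")

# Stage 2: a latent diffusion model refines noisy latents into the final image.
# A stock SD pipeline stands in for PosterAI's text-aware, character-aware denoiser.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to(device)
cover = pipe(prompt, num_inference_steps=30).images[0]
cover.save("generated_cover.png")
```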
Results show that PosterAI can generate images with text that is sharp, legible, and contextually integrated. The model successfully aligns text within predefined regions, ensuring coherence with visual elements and maintaining text clarity throughout the denoising process. The use of OCR-based segmentation masks improved text placement accuracy, and the character-aware loss function enhanced legibility.
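The character-aware loss is likewise described only at a high level. A minimal sketch follows, assuming a formulation in which a frozen character-segmentation network scores the decoded image against the OCR-derived per-character masks; `char_segmenter`, the mask layout, and the `0.01` weight are all assumptions for illustration, not details taken from the paper.

```python
import torch.nn.functional as F

def character_aware_loss(decoded_image, char_masks, char_segmenter, weight=0.01):
    """Cross-entropy between predicted and ground-truth character maps.

    decoded_image:  (B, 3, H, W) image decoded from the denoised latents.
    char_masks:     (B, H, W) integer map derived from OCR, one class per
                    character (0 = background).
    char_segmenter: frozen network mapping images to per-pixel character
                    logits of shape (B, num_classes, H, W).
    """
    logits = char_segmenter(decoded_image)
    return weight * F.cross_entropy(logits, char_masks)

# During training the character term would be added to the usual diffusion
# objective, e.g.:
#   loss = F.mse_loss(noise_pred, noise) + character_aware_loss(img, masks, seg)
```

Penalizing misclassified character pixels in this way pushes the denoiser to keep glyph shapes crisp through the later denoising steps, which is consistent with the legibility gains reported above.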
In conclusion, PosterAI offers a robust solution for generating high-quality, text-embedded images. By combining vision-language prompting, layout prediction, and character-aware diffusion, it addresses the text-rendering limitations of existing generators and supports efficient, budget-friendly image design for self-publishing authors and marketing applications.
See the full research paper here.