The texture generation network takes an untextured 3D mesh $(\mathbf{M})$, a reference texture image $(\mathbf{I_{tex}})$, and a descriptive text prompt as input, and generates a textured view of the mesh as output.
Apart from text, we condition the generation process on edges describing the mesh, through ControlNet, and on the input texture image, through IP-Adapter. Edge conditioning respects the "identity" of the mesh better than depth or normal conditioning, and IP-Adapter lets us use a single image as the prompt without any additional training or optimization.
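To make the conditioning concrete, below is a minimal sketch (not our released code) of how Canny-edge ControlNet conditioning and IP-Adapter image prompting can be combined in Hugging Face diffusers. The checkpoint names, file paths, prompt text, and adapter scale are illustrative assumptions, and the calls follow recent diffusers releases.

```python
# Hedged sketch: edge conditioning (ControlNet) + image prompting (IP-Adapter)
# with a standard Stable Diffusion pipeline from Hugging Face diffusers.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# ControlNet trained on Canny edges; checkpoint names are illustrative defaults.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# IP-Adapter turns a single reference image into a prompt, with no extra training.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.8)  # how strongly the texture image steers generation

# Edge map of a rendered, untextured mesh view (file names are placeholders).
render = np.array(Image.open("mesh_render.png").convert("L"))
edges = cv2.Canny(render, 100, 200)
edge_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

texture_image = Image.open("reference_texture.png").convert("RGB")

out = pipe(
    prompt="a wooden chair with carved ornaments",  # descriptive text prompt
    image=edge_image,                               # ControlNet edge condition
    ip_adapter_image=texture_image,                 # IP-Adapter image prompt
    num_inference_steps=30,
).images[0]
out.save("textured_view.png")
```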
We also introduce Image Inversion, a novel technique to quickly personalize the diffusion model for a single concept using a single image. It is an optional step, used when the pre-trained IP-Adapter falls short of faithfully capturing all the details of the input texture image. It fine-tunes parts of our network, namely Stable Diffusion's U-Net and IP-Adapter's projection network (indicated by 🔥), for a few iterations using the single image $\mathbf{I_{tex}}$.
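The sketch below illustrates what such a brief single-image fine-tuning loop could look like in PyTorch. The function name, hyperparameters, loss target, and the way IP-Adapter image embeddings are routed into the U-Net (`added_cond_kwargs` / `encoder_hid_proj`) are assumptions based on recent diffusers internals, not our released implementation.

```python
# Hedged sketch of the Image Inversion idea: briefly fine-tune the U-Net and the
# IP-Adapter projection layers on the single texture image I_tex.
import torch
import torch.nn.functional as F

def image_inversion(pipe, tex_latents, tex_image_embeds, text_embeds, steps=200, lr=1e-5):
    """pipe: a Stable Diffusion pipeline with IP-Adapter loaded (float32).
    tex_latents: VAE latents of I_tex; tex_image_embeds: IP-Adapter image embeddings
    of I_tex; text_embeds: encoded text prompt for the concept."""
    unet = pipe.unet
    scheduler = pipe.scheduler

    # Trainable parts (the 🔥 modules): U-Net and the IP-Adapter projection network,
    # which diffusers stores as unet.encoder_hid_proj once the adapter is loaded.
    params = list(unet.parameters()) + list(unet.encoder_hid_proj.parameters())
    opt = torch.optim.AdamW(params, lr=lr)

    unet.train()
    for _ in range(steps):  # only a few iterations on the single image
        noise = torch.randn_like(tex_latents)
        t = torch.randint(
            0, scheduler.config.num_train_timesteps, (1,), device=tex_latents.device
        )
        noisy = scheduler.add_noise(tex_latents, noise, t)

        # Standard denoising objective: predict the noise added to I_tex's latents.
        pred = unet(
            noisy, t,
            encoder_hidden_states=text_embeds,
            added_cond_kwargs={"image_embeds": tex_image_embeds},
        ).sample
        loss = F.mse_loss(pred, noise)

        opt.zero_grad()
        loss.backward()
        opt.step()
    unet.eval()
```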
Check out the paper to learn more. 🙂
@article{perla2024easitex,
  title     = {EASI-Tex: Edge-Aware Mesh Texturing from Single Image},
  author    = {Perla, Sai Raj Kishore and Wang, Yizhi and Mahdavi-Amiri, Ali and Zhang, Hao},
  journal   = {ACM Transactions on Graphics (Proceedings of SIGGRAPH)},
  publisher = {ACM New York, NY, USA},
  year      = {2024},
  volume    = {43},
  number    = {4},
  articleno = {40},
  doi       = {10.1145/3658222},
  url       = {https://github.com/sairajk/easi-tex},
}