References:
1. X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics, ACM Multimedia 2021.
2. ViDA-Man: Visual Dialog with Digital Human, ACM Multimedia 2021.
3. Unsupervised Person Image Generation with Semantic Parsing Transformation, CVPR 2019.
4. Unpaired Person Image Generation with Semantic Parsing Transformation, TPAMI 2020.
5. Down to the Last Detail: Virtual Try-on with Fine-grained Details, ACM Multimedia 2020.
6. Boosting Image Captioning with Attributes, ICCV 2017.
7. Exploring Visual Relationship for Image Captioning, ECCV 2018.
8. X-Linear Attention Networks for Image Captioning, CVPR 2020.