Image-Text Matching

CCMB: A Large-scale Chinese Cross-modal Benchmark
Vision-language pre-training (VLP) on large-scale datasets has shown premier performance on various downstream tasks. In contrast to …