Self-supervised Learning

Prototypical Contrastive Language Image Pretraining

In this paper, we show a representation grouping effect during contrastive language image pretraining: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerging within-modal anchors. We introduce Prototypical Contrastive Language Image Pretraining (ProtoCLIP) to enhance this grouping by boosting its efficiency and increasing its robustness against the modality gap.

Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Yixiang Huang, Yiping Bao, Erjin Zhou

Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation

This paper proposes the first deep-learning-based method for music-driven conducting motion generation and presents ConductorMotion100, a large-scale music-motion dataset with an unprecedented length of 100 hours.

Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, Feng Xu

VirtualConductor: Music-driven Conducting Video Generation System

In this demo, we present VirtualConductor, a system that can generate a conducting video from a given piece of music and a single user’s image. This demo won the ICME 2021 Best Demo award.

Delong Chen, Fan Liu, Zewen Li, Feng Xu
