Meta-learning without data via Wasserstein distributionally-robust model fusion


Existing meta-learning work assumes that training and testing data are available for each task. In practice, however, many pre-trained models are available without access to their training data. A single model that solves different tasks simultaneously is often preferable because it is much more convenient to deploy. Our work aims to meta-learn a model initialization from these pre-trained models without using the corresponding training data. We call this challenging problem setting Data-Free Learning To Learn (DFL2L). To address it, we propose a distributionally robust optimization (DRO) framework that learns a black-box model to fuse and compress all the pre-trained models into a single network. To encourage good generalization to unseen new tasks, the proposed DRO framework diversifies the learned task embedding associated with each pre-trained model to cover the diversity of the underlying training task distributions. During meta-testing, a model initialization is sampled from the black-box network and used as the meta-learned initialization. Extensive experiments in offline and online DFL2L settings on several real image datasets demonstrate the effectiveness of the proposed methods.

In Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
Kaiqiang Song
Senior Research Scientist

Kaiqiang Song (宋凯强) is a Senior Research Scientist at Tencent AI Lab, Seattle, specializing in Natural Language Processing. His research focuses on advancing artificial intelligence through machine learning, NLP, and large language models. He is dedicated to optimizing AI model architectures for practical applications like text summarization and text generation, bridging the gap between foundational AI research and real-world impact.