Efficient Speech Translation with Dynamic Latent Perceivers
Jun 4, 2023
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-Jussà

Abstract
Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and the Transformer has quadratic complexity in sequence length, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference, and can be flexibly deployed with various computational budgets, without significant drops in translation quality.
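As a rough illustration of the idea (not the paper's implementation), the sketch below shows a Perceiver-style speech encoder in PyTorch: a set of learnable latent vectors cross-attends to the long speech feature sequence, producing a fixed-length representation, and a hypothetical `dla_size` argument mimics Dynamic Latent Access by sampling a random subset of a larger latent space at each step. All module and parameter names here are assumptions for illustration.

```python
# Minimal sketch, assuming filterbank inputs and a random-subset reading of DLA;
# this is not the authors' released code.
import torch
import torch.nn as nn


class PerceiverSpeechEncoder(nn.Module):
    def __init__(self, input_dim=80, latent_dim=512, num_latents=1024,
                 num_heads=8, num_self_layers=6):
        super().__init__()
        # Large learnable latent space; under DLA only a subset is accessed per step.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim) * 0.02)
        self.input_proj = nn.Linear(input_dim, latent_dim)
        # Cross-attention: latents attend to the (long) speech sequence once,
        # so the cost is O(num_latents * seq_len) rather than O(seq_len ** 2).
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.self_blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(latent_dim, num_heads, batch_first=True),
            num_layers=num_self_layers,
        )

    def forward(self, speech_feats, dla_size=None):
        # speech_feats: (batch, seq_len, input_dim), e.g. log-Mel filterbank frames.
        x = self.input_proj(speech_feats)
        latents = self.latents
        if dla_size is not None and dla_size < latents.size(0):
            # Illustrative Dynamic Latent Access: sample a random subset of latents,
            # keeping per-step compute equal to that of a smaller latent space.
            idx = torch.randperm(latents.size(0), device=x.device)[:dla_size]
            latents = latents[idx]
        q = latents.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.cross_attn(q, x, x)  # fixed-length output, independent of seq_len
        return self.self_blocks(out)


# Usage: ~3 seconds of 10 ms frames (300 steps) compressed to 256 accessed latents.
enc = PerceiverSpeechEncoder()
feats = torch.randn(2, 300, 80)
print(enc(feats, dla_size=256).shape)  # torch.Size([2, 256, 512])
```

The point of the sketch is that the encoder's cost scales with the number of accessed latents rather than quadratically with the speech length, which is why a DLA-trained model can be deployed under different computational budgets at inference by varying how many latents it reads.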
Type
Publication
In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)