Consecutive Decoding for Speech-to-text Translation
Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li
AAAI, 2021
Abstract
Speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral framework for speech-to-text translation. The key idea is to generate the source transcript and the target translation consecutively with a single decoder. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the TED English-German dataset, and the TED English-Chinese dataset. Experiments show that COSTT outperforms previous state-of-the-art methods. Our code is available at https://github.com/dqqcasia/neurst.
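The "consecutive" in COSTT refers to transcription and translation being produced one after the other rather than by separate models. The sketch below is a minimal illustration of that idea on the data side, assuming a single decoder is trained on one target sequence that holds the source transcript, a separator token, and then the target translation. The token names `<sep>` and `</s>` and the helper functions are hypothetical and are not taken from the NeurST code base.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical separator between transcript and translation
EOS = "</s>"   # hypothetical end-of-sequence token


def build_consecutive_target(transcript: List[str], translation: List[str]) -> List[str]:
    """Concatenate transcript and translation into one decoder target sequence."""
    return transcript + [SEP] + translation + [EOS]


def split_consecutive_output(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split a decoded sequence into (transcript, translation) at the first separator."""
    # Drop everything from the end-of-sequence token onward, if present.
    if EOS in tokens:
        tokens = tokens[: tokens.index(EOS)]
    # No separator decoded: treat everything as transcript, translation empty.
    if SEP not in tokens:
        return tokens, []
    i = tokens.index(SEP)
    return tokens[:i], tokens[i + 1:]


target = build_consecutive_target(["hello", "world"], ["bonjour", "le", "monde"])
print(target)  # ['hello', 'world', '<sep>', 'bonjour', 'le', 'monde', '</s>']
transcript, translation = split_consecutive_output(target)
print(transcript, translation)  # ['hello', 'world'] ['bonjour', 'le', 'monde']
```

Because both outputs live in one sequence, the translation tokens can condition on the already-decoded transcript, which is one plausible reading of how this setup eases the cross-modal, cross-lingual mapping.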
Please cite as:
@inproceedings{dong2021consecutive,
title={Consecutive Decoding for Speech-to-text Translation},
author={Dong, Qianqian and Wang, Mingxuan and Zhou, Hao and Xu, Shuang and Xu, Bo and Li, Lei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2021}
}