
AudioPaLM: A Large Language Model That Can Speak and Listen

유로파물고기 2023. 6. 23. 10:52

https://arxiv.org/abs/2306.12925

 


https://google-research.github.io/seanet/audiopalm/examples/

 

AudioPaLM: A Large Language Model That Can Speak and Listen. Authors (list truncated in the source): Paul Rubenstein*, Chulayuth Asawaroengchai*, Duc Dung Nguyen*, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov,

 

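1. This paper introduces AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses a text-based language model, PaLM-2, and a speech-based language model, AudioLM, into a unified multimodal architecture that can process and generate both text and speech, with applications including speech recognition and speech-to-speech translation.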

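2. AudioPaLM inherits from AudioLM the ability to preserve paralinguistic information such as speaker identity and intonation, and from text large language models such as PaLM-2 the linguistic knowledge found only in such models. The paper shows that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, so the larger quantity of text data used in pretraining can be put to work on speech tasks.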
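To make the idea of starting from text-only weights more concrete, here is a minimal sketch of one way a pretrained text embedding table could be extended with rows for audio tokens. This summary does not describe the actual mechanism, so everything here is an assumption for illustration: the function names `extend_embeddings` and `audio_token_id`, the toy table sizes, and the Gaussian initialization of the new rows are all hypothetical.

```python
import random


def extend_embeddings(text_embeddings, num_audio_tokens, seed=0, scale=0.02):
    """Return a new table: pretrained text rows first, then fresh audio rows.

    Hypothetical helper: the text rows are copied unchanged (so knowledge
    from text pretraining is kept), and the audio rows are drawn from a
    small Gaussian. The initialization scheme is an assumption.
    """
    if not text_embeddings:
        raise ValueError("text_embeddings must contain at least one row")
    dim = len(text_embeddings[0])
    rng = random.Random(seed)
    audio_rows = [
        [rng.gauss(0.0, scale) for _ in range(dim)]
        for _ in range(num_audio_tokens)
    ]
    # Copy the text rows so the original table is not mutated later.
    return [list(row) for row in text_embeddings] + audio_rows


def audio_token_id(audio_code, text_vocab_size):
    """Map an audio code index to its id in the combined vocabulary."""
    return text_vocab_size + audio_code


# Toy "pretrained" text embeddings: 2 text tokens, dimension 3.
text_table = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
combined = extend_embeddings(text_table, num_audio_tokens=4)
```

With a table like this, text tokens keep their original ids and embeddings, while audio tokens occupy the ids after the text vocabulary, so one model can read and emit both kinds of token in a single sequence.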

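3. As a result, the model significantly outperforms existing systems on speech translation tasks. It can also perform zero-shot speech-to-text translation for many languages whose input/target language combinations were not seen during training. AudioPaLM additionally exhibits features of audio language models, such as transferring a voice across languages based on a short spoken prompt.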