
Block-State Transformer

유로파물고기 2023. 6. 19. 10:00

https://arxiv.org/abs/2306.09539

 


1. State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies, and they scale efficiently to long sequences thanks to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown excellent performance on vision and audio tasks, but they still lag behind Transformers on language modeling tasks.
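As background on why SSMs scale subquadratically, here is a minimal sketch (not from the paper) of a discretized linear SSM: the same layer can be computed either as a sequential recurrence or as a convolution with an unrolled kernel, and the convolution view is what fast, FFT-based implementations exploit.

```python
# Discretized linear SSM:  x_t = A x_{t-1} + B u_t,   y_t = C x_t
# Unrolled, the output is a convolution with kernel K = (CB, CAB, CA^2B, ...).
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the SSM as a sequential recurrence over a scalar input sequence u."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t          # state update
        ys.append(C @ x)             # readout
    return np.array(ys)

def ssm_conv(A, B, C, u):
    """Same output via convolution with the unrolled kernel K.
    The kernel is materialized naively here for illustration; real SSM layers
    compute this convolution with FFTs in O(T log T) rather than O(T^2)."""
    T = len(u)
    K = np.array([C @ np.linalg.matrix_power(A, t) @ B for t in range(T)])
    return np.convolve(u, K)[:T]     # keep the causal prefix of length T

rng = np.random.default_rng(0)
n = 4                                # state dimension (illustrative)
A = 0.5 * np.eye(n)                  # stable state matrix
B, C = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=16)
assert np.allclose(ssm_scan(A, B, C, u), ssm_conv(A, B, C, u))
```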

2. This work proposes the Block-State Transformer (BST), a hybrid layer that internally combines an SSM sublayer for long-range contextualization with a Block Transformer sublayer for short-range sequence representation. Three different, fully parallelizable variants of integrating SSMs with block-wise attention are studied.
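A minimal, hypothetical PyTorch sketch of this hybrid-layer idea follows. The class and parameter names are my own, not the paper's code; a GRU stands in for the SSM sublayer as a generic long-range mixer, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class BlockStateLayer(nn.Module):
    """Sketch: a long-range sublayer summarizes the full sequence into per-block
    context states; a block Transformer sublayer attends within each block and
    cross-attends to that context."""
    def __init__(self, d_model=256, block_len=64, n_heads=4):
        super().__init__()
        self.block_len = block_len
        # Stand-in for the SSM sublayer (any subquadratic long-range mixer).
        self.long_range = nn.GRU(d_model, d_model, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        b, t, d = x.shape                        # assumes seq_len % block_len == 0
        context, _ = self.long_range(x)          # long-range contextualization
        blocks = x.reshape(b, t // self.block_len, self.block_len, d)
        ctx = context.reshape(b, t // self.block_len, self.block_len, d)
        outs = []
        for i in range(blocks.shape[1]):
            q = blocks[:, i]                     # short-range block
            c = ctx[:, i]                        # context states for this block
            h, _ = self.self_attn(q, q, q)       # attention within the block
            h, _ = self.cross_attn(self.norm1(h), c, c)  # read long-range context
            outs.append(self.norm2(h + q))       # residual connection
        return torch.cat(outs, dim=1)

layer = BlockStateLayer()
y = layer(torch.randn(2, 256, 256))              # (batch, seq_len, d_model)
print(y.shape)                                   # torch.Size([2, 256, 256])
```

In the actual BST, the per-block loop is fully parallelizable because the context states come from the SSM rather than from a recurrence over blocks, which is where the layer-level speedup over Block-Recurrent Transformers comes from.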

3. The model outperforms comparable Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, when model parallelization is applied, the Block-State Transformer is more than ten times faster at the layer level than the Block-Recurrent Transformer.