
Blockwise Parallel Transformer for Long Context Large Models

유로파물고기 2023. 6. 1. 09:43

abs: https://arxiv.org/abs/2305.19370

 


1. Transformer models have become the backbone of state-of-the-art natural language processing across a wide range of AI applications, but the large memory demands of the self-attention mechanism and the large feedforward networks limit their ability to handle long sequences.

2. To address this, the authors present a new approach called the Blockwise Parallel Transformer (BPT). BPT computes self-attention and fuses the feedforward network block by block, minimizing memory cost (see the sketch after this list).

3. While staying memory-efficient on long input sequences, BPT can train on sequences up to 32x longer than vanilla Transformers and 2-4x longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate its effectiveness in reducing memory requirements and improving performance.
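Below is a minimal JAX sketch of the blockwise idea from point 2, not the paper's actual implementation: each query block streams over the key/value blocks with an online softmax, and the feedforward network is applied to that block's attention output right away, so neither the full attention matrix nor the full FFN activation is ever materialized. All function names, shapes, and the toy weights here are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def feedforward(x, w1, w2):
    # Simple two-layer FFN applied to a single query block.
    return jnp.maximum(x @ w1, 0.0) @ w2

def blockwise_parallel_layer(q, k, v, w1, w2, block_size):
    seq_len, d = q.shape
    n_blocks = seq_len // block_size
    outputs = []
    for i in range(n_blocks):
        q_blk = q[i * block_size:(i + 1) * block_size]           # (B, d)
        # Online-softmax accumulators for this query block.
        acc = jnp.zeros((block_size, d))
        row_sum = jnp.zeros((block_size, 1))
        row_max = jnp.full((block_size, 1), -jnp.inf)
        for j in range(n_blocks):
            k_blk = k[j * block_size:(j + 1) * block_size]
            v_blk = v[j * block_size:(j + 1) * block_size]
            scores = q_blk @ k_blk.T / jnp.sqrt(d)               # (B, B)
            new_max = jnp.maximum(row_max, scores.max(axis=-1, keepdims=True))
            # Rescale previous accumulators to the new running max.
            scale = jnp.exp(row_max - new_max)
            p = jnp.exp(scores - new_max)
            acc = acc * scale + p @ v_blk
            row_sum = row_sum * scale + p.sum(axis=-1, keepdims=True)
            row_max = new_max
        attn_out = acc / row_sum
        # Fuse the FFN at the block level: apply it to this block now,
        # so the (seq_len, d_ff) activation for the whole sequence is
        # never held in memory at once.
        outputs.append(attn_out + feedforward(attn_out, w1, w2))
    return jnp.concatenate(outputs, axis=0)

# Toy usage with random inputs and weights.
key = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(key, (1024, 64))
w1 = jax.random.normal(key, (64, 256)) * 0.02
w2 = jax.random.normal(key, (256, 64)) * 0.02
out = blockwise_parallel_layer(q, k, v, w1, w2, block_size=128)
print(out.shape)  # (1024, 64)
```

The point of applying the FFN inside the per-block loop, rather than after the full attention output is assembled, is that peak memory is set by one block's activations instead of the whole sequence's, which is where the longer trainable context lengths come from.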