Panda-GPT: One Model to Instruction-Follow Them All

유로파물고기 2023. 5. 29. 10:30

https://panda-gpt.github.io/

https://arxiv.org/abs/2305.16355

Explanation: https://twitter.com/yixuan_su/status/1661064018868551691?s=20

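1. We present PandaGPT, an approach that gives large language models visual and auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as generating detailed image descriptions, writing stories inspired by videos, and answering questions about audio.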

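2. Even more interestingly, PandaGPT can take multimodal inputs at the same time and naturally compose their semantics. For example, PandaGPT can connect how objects look in an image or video with how they sound in audio. To do this, PandaGPT combines the multimodal encoders from ImageBind with the large language model from Vicuna.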

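3. Training PandaGPT requires only aligned image-text pairs. Thanks to ImageBind's strong capabilities, PandaGPT shows emergent, i.e. zero-shot, cross-modal behaviors on data other than images and text (e.g. video, audio, depth, thermal, and IMU). We hope PandaGPT serves as a first step toward building artificial general intelligence (AGI) that, like humans, perceives and understands inputs in different modalities holistically.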
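Point 2 says PandaGPT joins ImageBind's multimodal encoders to Vicuna but does not say how the two are wired together. The sketch below assumes, for illustration only, that a single linear projection maps each ImageBind embedding into the language model's token-embedding space and that the projected vectors are prepended to the embedded text prompt as "soft" tokens. The helper names (`project`, `build_llm_input`), the dimensions, and the random weights are all hypothetical stand-ins, not PandaGPT's code.

```python
import random

# Hypothetical toy sizes; the post does not state the real embedding widths.
EMBED_DIM = 4   # stand-in for the ImageBind embedding size
HIDDEN_DIM = 6  # stand-in for the language model's token-embedding size


def project(embedding, weight, bias):
    """Linear map from the shared multimodal space into the LLM's token-embedding space."""
    return [sum(w * x for w, x in zip(row, embedding)) + b for row, b in zip(weight, bias)]


def build_llm_input(modality_embeddings, text_token_embeddings, weight, bias):
    """Project every modality embedding with the same weights and prepend each
    one as a soft token in front of the embedded text prompt (assumed layout)."""
    prefix = [project(e, weight, bias) for e in modality_embeddings]
    return prefix + text_token_embeddings


rng = random.Random(0)
weight = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
bias = [0.0] * HIDDEN_DIM

image_emb = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]  # pretend ImageBind image output
audio_emb = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]  # pretend ImageBind audio output
prompt = [[0.0] * HIDDEN_DIM for _ in range(3)]               # pretend embedded text tokens

sequence = build_llm_input([image_emb, audio_emb], prompt, weight, bias)
print(len(sequence), len(sequence[0]))  # 5 6
```

Because every modality is assumed to land in one shared embedding space, the same projection weights serve image and audio alike; giving each modality its own prefix vector is just one plausible way to feed several inputs at once.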
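Point 3 hinges on ImageBind placing every modality in one shared embedding space: a bridge fitted only on image embeddings can then be reused, unchanged, for audio. The toy below illustrates that idea under clearly stated assumptions: made-up 2-D embeddings, a projection fitted by solving a tiny linear system (a stand-in for actual training), and a nearest-target lookup (`nearest`) standing in for the language model. None of these numbers or helpers come from PandaGPT or ImageBind.

```python
import math

# Toy 2-D "shared embedding space" with made-up numbers (not real ImageBind outputs).
# Assumption: each modality encoder places the same concept at nearby points.
IMAGE = {"dog": [1.0, 0.1], "rain": [0.1, 1.0]}     # image embeddings, used for fitting
AUDIO = {"dog": [0.95, 0.05], "rain": [0.05, 0.9]}  # audio embeddings, never used for fitting

# Hypothetical target vectors in the language model's input space.
TARGET = {"dog": [2.0, 0.0], "rain": [0.0, 2.0]}


def fit_projection(inputs, targets):
    """Solve W @ x_i = t_i exactly for two 2-D pairs, i.e. W = T X^-1.
    A closed-form stand-in for training the bridge on image-text data."""
    (a, c), (b, d) = inputs  # X = [[a, b], [c, d]]; the inputs are its columns
    det = a * d - b * c
    x_inv = [[d / det, -b / det], [-c / det, a / det]]
    (t11, t21), (t12, t22) = targets  # T = [[t11, t12], [t21, t22]]; targets are its columns
    t = [[t11, t12], [t21, t22]]
    return [[sum(t[i][k] * x_inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def project(w, v):
    """Apply the 2x2 projection matrix w to the vector v."""
    return [w[0][0] * v[0] + w[0][1] * v[1], w[1][0] * v[0] + w[1][1] * v[1]]


def nearest(v):
    """Stand-in for the language model: name the target vector closest to v."""
    return min(TARGET, key=lambda name: math.dist(v, TARGET[name]))


# "Training" sees only image embeddings paired with their targets.
W = fit_projection([IMAGE["dog"], IMAGE["rain"]], [TARGET["dog"], TARGET["rain"]])

# Zero-shot transfer: audio reuses the same W without any audio-text fitting.
for name, emb in AUDIO.items():
    print(name, "->", nearest(project(W, emb)))
# dog -> dog
# rain -> rain
```

The audio vectors were never used to fit `W`, yet they map to the right targets because they sit near the image vectors of the same concept; if the audio embeddings drifted far from the image ones, this transfer would break down.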