AI/etc 48

Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning

abs: https://arxiv.org/abs/2305.17256 "Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning". Large language models (LLMs) have recently shown great potential for in-context learning, where LLMs learn a new task simply by conditioning on a few input-label pairs (prompts). Despite their potential, our understanding of the factors influencing end-task… 1. Large language models…
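For context, a minimal sketch of the in-context learning setup the abstract describes: the model gets no gradient updates, only a prompt of concatenated input-label pairs. The paper's concern is that a "lazy" model may latch onto shortcuts in such prompts (e.g., a spurious token that co-occurs with one label) instead of the task itself. The demo data and field names below are made up for illustration, not from the paper.

```python
# In-context learning: the "training" signal is just a few input-label
# pairs concatenated into the prompt, followed by an unlabeled query.
demos = [
    ("The movie was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
query = "I would happily watch it again."

prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in demos)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)  # feed this string to any LLM completion endpoint
```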

AI/etc 2023.05.30

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

abs: https://arxiv.org/abs/2305.18290 "Direct Preference Optimization: Your Language Model is Secretly a Reward Model". While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining s… Explanation: https://tw…
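The paper's core trick fits in a few lines: instead of training a separate reward model, DPO optimizes the policy directly on preference pairs through a logistic loss over implicit rewards. A minimal PyTorch sketch, assuming the summed token log-probs of the chosen (y_w) and rejected (y_l) responses have been precomputed under both the policy and a frozen reference model; tensor names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_policy_w, logp_policy_l, logp_ref_w, logp_ref_l, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs."""
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (logp_policy_w - logp_ref_w)
    rejected_rewards = beta * (logp_policy_l - logp_ref_l)
    # Maximize the margin between chosen and rejected via a logistic loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```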

AI/etc 2023.05.30

Plug-and-Play Knowledge Injection for Pre-trained Language Models

abs: https://arxiv.org/abs/2305.17691 "Plug-and-Play Knowledge Injection for Pre-trained Language Models". Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In th… 1. Injecting external knowledge can improve performance on various downstream NLP…
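A hedged sketch of the general plug-and-play recipe the abstract hints at: keep the downstream PLM frozen and train only a small mapping module that projects external knowledge embeddings (e.g., KB entity vectors) into the PLM's input space, so a new knowledge base can be swapped in without retraining. The module name and shapes below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KnowledgePlug(nn.Module):
    def __init__(self, kb_dim: int, plm_dim: int):
        super().__init__()
        self.map = nn.Linear(kb_dim, plm_dim)  # the only trainable part

    def forward(self, token_embeds, kb_vectors):
        # Project KB vectors and prepend them as extra "soft tokens"
        # in front of the frozen PLM's ordinary token embeddings.
        injected = self.map(kb_vectors)                    # (B, K, plm_dim)
        return torch.cat([injected, token_embeds], dim=1)  # (B, K+T, plm_dim)
```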

AI/etc 2023.05.30

Emergent Agentic Transformer and Hindsight Experience

https://arxiv.org/abs/2305.16554 "Emergent Agentic Transformer from Chain of Hindsight Experience". Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts into transformer-based policies… 1. Large transformer models leveraging diverse data and model scale…
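A rough sketch of what "chain of hindsight experience" relabeling could look like: sampled trajectories are sorted by achieved return and concatenated in ascending order, so an autoregressive transformer is trained on sequences that improve over the chain. The data layout and the RETURN marker below are simplifying assumptions, not the paper's exact format.

```python
def chain_of_hindsight(trajectories):
    """trajectories: list of (return, [(obs, action), ...]) tuples."""
    ordered = sorted(trajectories, key=lambda t: t[0])  # worst -> best
    chain = []
    for ret, steps in ordered:
        chain.append(("RETURN", ret))  # hindsight token marking the outcome
        chain.extend(steps)
    return chain  # one long sequence for autoregressive training
```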

AI/etc 2023.05.30

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

https://arxiv.org/abs/2305.16381 "DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models". Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relat… 1. Learning from human feedback…
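The objective the abstract describes can be stated compactly: maximize the learned human-preference reward on generated images while regularizing the fine-tuned model toward the pretrained one (DPOK adds a KL term for this). A hypothetical sketch; the variable names and the way the KL is estimated are assumptions, not the paper's implementation.

```python
def rl_finetune_loss(reward, logp_new, logp_pretrained, beta=0.01):
    # reward: learned preference reward r(image, prompt) per sample
    # logp_new / logp_pretrained: log-likelihoods of the sampled images
    # under the fine-tuned and the frozen pretrained diffusion model
    kl = logp_new - logp_pretrained        # per-sample KL estimate
    return -(reward - beta * kl).mean()    # minimize the negative objective
```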

AI/etc 2023.05.30

OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities

https://huggingface.co/papers/2305.16334 "OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities" (Yuanzhen Xie, Tao Xie, Mingxiong Lin, WenTao Wei, Chenglin Li, Beibei Kong, Lei Chen, Chengxiang Zhuo, Bo Hu, Zang Li; published on May 23). 1. In most recent work, large language models (LLMs) can perform reasoning tasks by generating chains of thought following the instructions in a specific prompt. However, their ability to solve complex reasoning problems and humans'…
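For reference, the chain-of-thought prompting the snippet refers to amounts to a prompt that asks for intermediate reasoning before the answer. Below is a generic zero-shot template, not OlaGPT's own prompt.

```python
# Chain-of-thought prompting: the instruction nudges the model to write
# out intermediate reasoning steps before committing to an answer.
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step."
)
# An LLM completion would ideally produce something like:
# "12 pens is 4 groups of 3; 4 * $2 = $8. The answer is $8."
```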

AI/etc 2023.05.29