AI (89)

Block-State Transformer

https://arxiv.org/abs/2306.09539
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. …

AI/Google&DeepMind 2023.06.19

Inverse Scaling: When Bigger Isn't Better

https://arxiv.org/abs/2306.09479
Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse performance with increased scale. …

AI/etc 2023.06.19
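The contrast the abstract draws can be illustrated numerically. The power-law form below is the standard shape used in the scaling-law literature, but every constant here is made up for illustration, and `inverse_scaling_score` is a hypothetical toy task, not one of the tasks from the paper:

```python
import math

def scaling_loss(n_params: float, c: float = 1e3, alpha: float = 0.076,
                 floor: float = 1.69) -> float:
    """Typical scaling-law shape: L(N) = c * N^(-alpha) + irreducible floor.
    Loss decreases predictably as model size N grows (constants are illustrative)."""
    return c * n_params ** -alpha + floor

def inverse_scaling_score(n_params: float, base: float = 0.9, k: float = 0.05) -> float:
    """A task exhibits inverse scaling when performance *drops* as N grows.
    Here accuracy falls with log10 of model size (a made-up toy, not real data)."""
    return base - k * math.log10(n_params)

for n in (1e8, 1e9, 1e10):
    print(f"N={n:.0e}  loss={scaling_loss(n):.3f}  "
          f"inverse-task accuracy={inverse_scaling_score(n):.3f}")
```

The point of the sketch: overall loss is monotonically improving in scale, yet a specific task score can still move in the opposite direction, which is exactly the phenomenon the paper names inverse scaling.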

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind

https://arxiv.org/abs/2306.09299
Large Language Models (LLMs) perform complex reasoning by generating explanations for their predictions. However, a complementary goal of explanations is to also communicate useful knowledge that improves weaker agents. …

AI/etc 2023.06.18

Image Captioners Are Scalable Vision Learners Too

https://arxiv.org/abs/2306.07915
Contrastive pretraining on image-text pairs from the web is one of the most popular large-scale pretraining strategies for vision backbones, especially in the context of large multimodal models. …

AI/Google&DeepMind 2023.06.14

Certified Reasoning with Language Models

https://arxiv.org/abs/2306.04031
Language models often achieve higher accuracy when reasoning step-by-step in complex tasks. However, their reasoning can be unsound, inconsistent, or rely on undesirable prior assumptions. …

AI/etc 2023.06.08

DeepMind AlphaDev: Discovering Faster Sorting Algorithms with a New Approach

https://www.deepmind.com/blog/alphadev-discovers-faster-sorting-algorithms
In our paper published today in Nature, we introduce AlphaDev, an artificial intelligence (AI) system that uses reinforcement learning to discover enhanced computer science algorithms, surpassing those honed by scientists and engineers over decades. …

AI/Google&DeepMind 2023.06.08