โ† Back

Agent-as-Judge for Factual Summarization of Long Narratives

Multi-Agent, arXiv:2501.09993
summarization, factual, judge, long, narratives, agent, metrics, demonstrated
Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-
5-10 minutes. Title → abstract → intro → section headers → figures → conclusion only.
Decide: what problem does it solve / what is the core idea / is it relevant to my work?
~1 hour. Go through the figures and tables carefully. Skip the details of proofs and equations.
Output: one paragraph on "what they did and why it works."
Read as if reproducing the work. Question the assumptions. Only for papers you will directly cite or rebut.
Lens: "If I measured this on my own fleet, what could I show that the authors could not?"