Retrieval Model Study

Created
5/11/2021, 9:23:00 AM

Retrieval Model

Retrieve-then-rank models

Cross Encoder: Ranker

A Cross Encoder has to run its computation for every combination of Query and Candidate, so it is too slow to be usable when there are many candidates. It should be used in a two-stage setup instead: a retriever narrows the full candidate set down to roughly 100, or even 10, candidates, and the ranker then picks the best answer among them.
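
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not an actual Cross Encoder: `retrieve_then_rank`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers only stand in for a cheap retriever and an expensive ranker.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rank(
    query: str,
    candidates: Sequence[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    k: int = 10,
) -> Tuple[str, List[str]]:
    """Return (best candidate, shortlist). Both scorers are assumed stand-ins."""
    # Stage 1 (retriever): score every candidate cheaply and keep the top k.
    shortlist = sorted(
        candidates, key=lambda c: cheap_score(query, c), reverse=True
    )[:k]
    # Stage 2 (ranker): run the expensive scorer on the k survivors only.
    best = max(shortlist, key=lambda c: expensive_score(query, c))
    return best, shortlist


def word_overlap(query: str, candidate: str) -> float:
    """Toy retriever score: number of shared words."""
    return float(len(set(query.lower().split()) & set(candidate.lower().split())))


def overlap_ratio(query: str, candidate: str) -> float:
    """Toy ranker score: shared words divided by all distinct words."""
    q_words = set(query.lower().split())
    c_words = set(candidate.lower().split())
    union = q_words | c_words
    return len(q_words & c_words) / len(union) if union else 0.0
```

With N candidates, the expensive scorer runs k times instead of N times, which is the point of putting a retriever in front of the ranker.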

Poly Encoder: Fast Retriever

The Poly Encoder extracts features from the query and the candidate separately; only the similarity computation uses a slightly more complex attention architecture, so it is relatively fast.
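
As a rough sketch of that similarity step, assume the query has already been encoded into a small set of vectors and the candidate into a single vector. The candidate vector attends over the query-side vectors, and the final score is a dot product. The function names and plain-list vectors here are illustrative assumptions, not taken from any library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def poly_encoder_score(context_codes: Sequence[Vector], candidate: Vector) -> float:
    """Attention-based similarity between m query-side vectors and one candidate."""
    # The candidate vector acts as the attention query over the m context codes.
    weights = softmax([dot(code, candidate) for code in context_codes])
    # Weighted sum of the context codes gives one candidate-aware query vector.
    attended = [
        sum(w * code[i] for w, code in zip(weights, context_codes))
        for i in range(len(candidate))
    ]
    # Final score is a plain dot product, as in a Bi-Encoder.
    return dot(attended, candidate)
```

Only this light attention step depends on both inputs, so candidate vectors can still be computed ahead of time.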

Bi-Encoder / ColBERT: Super Fast Retriever

These likewise extract the features of Q and C separately in advance and then compute similarity. That step is based on MIPS (Maximum Inner Product Search), which is extremely fast, so they can be used even faster than the Poly Encoder. The difference is that ColBERT, to get somewhat better performance than the Bi-Encoder, uses the sum of token-wise MaxSim values as its score.
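
The two scoring rules can be compared with a small sketch. It assumes embeddings are already computed and stored as plain lists, and `mips_brute_force` is a hypothetical exhaustive search used only for illustration; a real system would typically use an approximate index for this step.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bi_encoder_score(query_vec: Vector, cand_vec: Vector) -> float:
    """Bi-Encoder: one vector per text, similarity is a single inner product."""
    return dot(query_vec, cand_vec)


def colbert_score(query_tokens: Sequence[Vector], cand_tokens: Sequence[Vector]) -> float:
    """ColBERT-style score: for each query token take the max similarity over
    candidate tokens (MaxSim), then sum over the query tokens."""
    return sum(max(dot(q, c) for c in cand_tokens) for q in query_tokens)


def mips_brute_force(query_vec: Vector, cand_vecs: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k candidates with the largest inner product (exhaustive)."""
    order = sorted(
        range(len(cand_vecs)),
        key=lambda i: dot(query_vec, cand_vecs[i]),
        reverse=True,
    )
    return order[:k]
```

The Bi-Encoder stores one vector per candidate, while the ColBERT-style score needs one vector per candidate token, trading more storage for a finer-grained match.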

Evaluating Retriever Model performance

Even with test data in hand, a candidate more appropriate than the GT (ground-truth) sentence may exist, so Top-1 Accuracy is not a suitable way to evaluate model performance.

Hits@1of10

๋”ฐ๋ผ์„œ 1๊ฐœ์˜ GT sentence์™€ ๋žœ๋คํ•˜๊ฒŒ ๋ฝ‘์€ 9๊ฐœ์˜ sentence๋ฅผ ์„ž์–ด์„œ batch๋ฅผ ๊ตฌ์„ฑํ•˜๊ณ , ์ด ์ค‘์—์„œ ์ •๋‹ต์„ ๋งž์ถœ ํ™•๋ฅ ์„ ์ธก์ •ํ•˜๋Š” Hits@1of10 ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ด์„œ ๊ฐ„์ ‘์ ์œผ๋กœ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ธก์ •ํ•ด์•ผํ•œ๋‹ค.