Information Retrieval Terminology

corpus - a body of text collected in machine-readable form for language research

corpora - plural of corpus

ranked retrieval - retrieval that ranks its results, returning documents ordered by how well they match the query
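
As a minimal sketch of ranked retrieval, the code below scores each document by how many distinct query terms it contains and sorts by that score. The whitespace tokenizer, the overlap score, and the sample documents are illustrative assumptions. Real systems typically use weighted scores such as TF-IDF or BM25 instead.

```python
def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Score each document by the number of distinct query terms it contains,
    then return (doc_id, score) pairs, highest score first."""
    query_terms = set(tokenize(query))
    scores = []
    for doc_id, text in docs.items():
        overlap = query_terms & set(tokenize(text))
        scores.append((doc_id, len(overlap)))
    # Highest score first; ties broken by doc_id so the order is deterministic
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Hypothetical sample collection
docs = {
    "d1": "information retrieval ranks documents",
    "d2": "database systems use SQL",
    "d3": "ranked retrieval of text documents",
}
print(rank("ranked retrieval documents", docs))  # [('d3', 3), ('d1', 2), ('d2', 0)]
```

Unlike a plain yes/no match, every document receives a score, so even partial matches such as d1 appear in the ranking.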

relevant - satisfies the search condition (the user's information need)

non-relevant - does not satisfy the search condition

query - a set of keywords
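
The sketch below models a query as a set of keywords and splits a collection into relevant and non-relevant documents. It assumes a simple Boolean AND rule for illustration: a document counts as relevant only if it contains every keyword. This is one common simplification, not the only notion of relevance. The function names and sample documents are hypothetical.

```python
def query_terms(query: str) -> set[str]:
    # A query as a set of keywords: lowercase, split on whitespace, duplicates removed
    return set(query.lower().split())


def split_by_relevance(query: str, docs: dict[str, str]) -> tuple[set[str], set[str]]:
    """Assumed Boolean AND model: a document is relevant if it contains every query keyword."""
    terms = query_terms(query)
    relevant, non_relevant = set(), set()
    for doc_id, text in docs.items():
        if terms <= set(text.lower().split()):  # subset test: all keywords present
            relevant.add(doc_id)
        else:
            non_relevant.add(doc_id)
    return relevant, non_relevant


# Hypothetical sample collection
docs = {
    "d1": "SQL queries structured data",
    "d2": "retrieval of unstructured text data",
    "d3": "text retrieval and ranking",
}
print(query_terms("Text text retrieval"))  # duplicates collapse into one keyword
print(split_by_relevance("text retrieval", docs))
```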

term - in information retrieval, a word that serves as the basic unit of search

collection - a set of documents

At any given moment, the number of documents in the collection is assumed to be static.

Goal: retrieve documents (not produce a direct answer to the query) that contain information relevant to the user's information need and that help the user complete a task.

indexing - recording which terms appear in each document
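
A minimal indexing sketch for a small, static collection follows. It builds two structures. The forward index maps each document to its terms, which is the record described above. The inverted index maps each term to the sorted list of documents that contain it, which is the structure search engines commonly use for lookup. The naive tokenizer and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def build_indexes(docs: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, list[str]]]:
    """Return (forward index, inverted index) for a small, static collection."""
    forward: dict[str, set[str]] = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        terms = set(text.lower().split())  # naive tokenization (assumption)
        forward[doc_id] = terms            # which terms appear in this document
        for term in terms:
            inverted[term].add(doc_id)     # which documents contain this term
    # Sorted posting lists make lookups and list intersections easier
    postings = {term: sorted(ids) for term, ids in inverted.items()}
    return forward, postings


# Hypothetical sample collection
docs = {"d1": "text retrieval", "d2": "SQL database", "d3": "text database"}
forward, postings = build_indexes(docs)
print(postings["text"])      # ['d1', 'd3']
print(postings["database"])  # ['d2', 'd3']
```

With the inverted index, answering a keyword query only requires reading the posting lists of its terms rather than scanning every document.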

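precision - how much of the retrieved information is what the user wanted, i.e. the fraction of retrieved documents that are relevant

recall - of the information that should have been found, how much was actually found, i.e. the fraction of relevant documents that were retrieved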
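
Precision and recall can be computed directly from two sets: the documents the system retrieved and the documents that are actually relevant. In formula form, precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. The sketch below is a minimal version of this. Returning 0.0 for an empty set is a convention assumed here, and the document IDs are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for a single query."""
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    # Guard against division by zero (assumed convention: 0.0 for an empty set)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"d1", "d2", "d3", "d4"}  # what the system returned
relevant = {"d1", "d3", "d5"}         # what should have been returned
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Two of the four retrieved documents are relevant, so precision is 2/4. Two of the three relevant documents were found, so recall is 2/3.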

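Evolution of information systems (in order, left to right):

Database -> Information Retrieval -> Question Answering System -> Dialog System

Database - SQL; structured data (what the information represents is clear)

Information Retrieval - text; unstructured data (what it represents is unclear)

Question Answering System - knowledge; (subject, property, object) triples

Dialog System - dialog; text messages, email, blogs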