Article Summary for Lecture #9 - Mai

In the article “Analysis in indexing: document and domain centered approaches,” Mai compares and contrasts document-centered and domain-centered approaches to indexing. The domain-centered approach requires different actions from the indexer, but the author suggests that its advantages offset the effort needed.

Indexing breaks down subject matter into single terms to make retrieval easier. Typically, the indexer analyzes the document for subject matter, then translates the subject matter into index terms. A general problem in indexing is failing to complete the first step: assigning subject headings to the document, from which the index terms are translated. Some say to simply find the main topic of the paper by examining parts of the document (e.g., table of contents, introduction, chapter headings), assuming the indexer can determine the subject matter from document attributes. Others point to controlled vocabulary, choosing the pre-existing term that best matches the document.

Four approaches to indexing are mentioned in this article. The document-oriented approach analyzes the document’s attributes to determine the subject heading and index terms. The document-centered approach has the indexer select terms, other than those the author has given, that users may search with or that put the document into context for the user. Essentially, the indexer guesses at which search terms the reader would choose. A third approach is the domain-analytic approach, or domain analysis. Context is defined by the activities surrounding a term, and those surroundings are its domain. A study of this domain gives the indexer insight into which subject heading to assign. The fourth approach is the domain-centered approach, where the subject matter is determined strictly by an understanding of the domain.

The author seems to favor the domain-centered approach to indexing, downplaying the search for the subject heading in the document itself and playing up the idea that the indexer will know what every user will search for in any given document, based on the general topical area the author is writing on. This is pure guesswork on the librarian’s part and makes librarians into the content experts; we are experts only in organization, not in content. Librarians are very knowledgeable, but we do not know all terms for all things that a user would search for, so we should stick with what the content expert, the author, thinks is the subject heading. Everyone is different, and what we would guess as someone’s search terms would not be correct for all users. Also, for the majority of documents, it is highly unlikely that the subject heading cannot be determined from document attributes. Analyzing the domain and reading all attributes to make a guess is also much more time consuming than simply reading what the author has written on the subject and using the author’s words as the subject heading. I personally like comparing what I think is the subject heading to controlled vocabulary sets.