Technologies for Information Management (WXBG 6105): Text Retrieval System

A text retrieval system (TRS) is an automated system that manages the storage and subsequent retrieval of structured non-numerical databases. TRSs therefore handle databases that consist mainly of text. A TRS is not a word processing package, as it is not meant for large chunks of free-flowing text (such as an article). Text is entered into the database in a more structured way: chunks of text are broken down into identifiable data elements, these data elements are entered into the database through predetermined fields and subfields, and a number of them together make up a record in the database.

Text information that calls for a text retrieval system usually has a common, identifiable pattern that characterizes each record in the database. For example, a record in a bibliographic database would usually have the following elements: author(s), title, imprint (place of publication, publisher, year) or source (the journal title, if the document is an article), and information about the physical nature of the document (volume, pagination, etc.).
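The record structure described above can be sketched in code. The following Python fragment is a minimal illustration only, not the data model of any particular package: the class names `Field` and `Record`, the tag names and the sample values are all assumptions made for this example. It shows a record built from fields, where a field may repeat (two authors) and may be split into subfields (the imprint).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """One data element of a record (hypothetical model for illustration)."""
    tag: str                    # field name, e.g. "author"
    value: str = ""             # text of a field that has no subfields
    subfields: Dict[str, str] = field(default_factory=dict)


@dataclass
class Record:
    """A record is simply an ordered collection of fields."""
    fields: List[Field] = field(default_factory=list)

    def add(self, tag: str, value: str = "", **subfields: str) -> None:
        # Adding the same tag twice models a repeatable field.
        self.fields.append(Field(tag, value, dict(subfields)))

    def get(self, tag: str) -> List[Field]:
        # Return every occurrence of the field, in entry order.
        return [f for f in self.fields if f.tag == tag]


# Placeholder bibliographic record (all values are made up).
record = Record()
record.add("author", "First Author")
record.add("author", "Second Author")          # repeatable field
record.add("title", "An Example Title")
record.add("imprint", place="Example City",    # field with subfields
           publisher="Example Press", year="2011")

authors = [f.value for f in record.get("author")]
year = record.get("imprint")[0].subfields["year"]
print(authors, year)
```

Keeping each element in its own field, rather than in one block of free text, is what lets a retrieval system search or sort on a single element such as the author or the year.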

However, text retrieval systems can handle a variety of databases other than bibliographic ones, such as product information, company information, personnel information, expertise databases, indexes of all sorts, directories of a society's membership, databases on criminals, biographical databases, etc.

One example of a text retrieval system is CDS/ISIS, a software package distributed by UNESCO. The major features of the CDS/ISIS software are:

The handling of variable-length records, fields and subfields, thus saving disk space and making it possible to store greater amounts of information;
The handling of repeatable fields;
A data base definition component allowing the user to define the data to be processed for a particular application;
A data entry component for entering and modifying data through user-created data base specific worksheets;
An information retrieval component using a powerful search language providing for field-level and proximity search operators, in addition to the traditional and/or/not operators, as well as free-text searching;
A powerful sort and report generation facility allowing the user to easily create any desired printed products, such as catalogues, indexes, directories, etc.;
A data interchange function based on the ISO 2709 international standard used by leading data base producers;
An integrated application programming language (CDS/ISIS Pascal and the ISIS_DLL), allowing the user to tailor the software to specific needs;
Functions allowing the user to build relational data bases, though CDS/ISIS is not based on a relational model;
Powerful hypertext functions for designing complex user interfaces.
A Windows interface between CDS/ISIS and IDAMS, the UNESCO software for statistical analysis, has also been developed.
From the outset, CDS/ISIS was created as multi-lingual software, providing integrated facilities for the development of local linguistic versions. Thus, although UNESCO distributes only the English, French and Spanish versions of the package, user-developed versions exist in virtually all languages, including special versions for Arabic, Chinese and Korean, which UNESCO helped to develop.
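The and/or/not search operators mentioned in the feature list can be illustrated with a small inverted index. This is a generic sketch in Python, not CDS/ISIS code and not its search language: the function names `build_index` and `postings` and the three sample records are assumptions made for this example. Each word is mapped to the set of record numbers that contain it, and the Boolean operators then become set operations.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(records: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased word to the set of record ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


# Made-up sample records: record number -> searchable text.
records = {
    1: "text retrieval systems",
    2: "relational database systems",
    3: "text processing",
}
index = build_index(records)


def postings(term: str) -> Set[int]:
    """Records containing the term (empty set if the term is unknown)."""
    return index.get(term.lower(), set())


and_hits = postings("text") & postings("systems")   # text AND systems
or_hits = postings("text") | postings("systems")    # text OR systems
not_hits = postings("text") - postings("systems")   # text AND NOT systems

print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

Field-level and proximity searching would need more information in the index, for example the field and word position of each occurrence, which this sketch leaves out.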

Tuesday, February 15, 2011