Information Studies 277 -- Information Retrieval Systems: User-Centered Designs

Spring 2006

Course Web page:
http://polaris.gseis.ucla.edu/pagre/is277.html

Assignment.

Write a paper about XML-based semantic web technologies in a given domain. The domain could be an industry (e.g., finance or media), a profession (e.g., architecture or engineering), or an academic field (e.g., classics or geography). You should probably choose a domain that includes several types of textual documents or relational databases; semantic web technologies for non-textual documents such as maps, audio files, art works, and blueprints scarcely exist. The paper should be a Web page in XHTML format and should be about 4000 words. The papers will be publicly available and might be assigned as readings in future IS 277 classes.

A one-page proposal for your paper is due on your Web site during Week 5. The paper itself should be on the Web for grading by Thursday of finals week.

Your paper should be heavily hyperlinked to online resources relating to all of the aspects of your domain that involve ontology standards and the semantic web, including research projects that use semantic web technologies in the domain and documents that are marked up using the domain's semantic web standards. Some links, for example to journal articles, will only work for readers within a university or similar institution, and these links should be followed by "(requires subscription)" or something of the sort.

Include all of the types of ontologies -- document, metadata, domain, and service -- that exist (or could usefully be invented) in your domain.
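
As a concrete reference point, here is a minimal sketch in Python of how a very small domain ontology might be written down in RDF/XML using RDF Schema classes. The taxonomy terms and the example.org namespace are invented for illustration and do not come from any real standard; only the rdf and rdfs namespace URIs are the actual W3C ones. A real ontology in OWL would carry much more (properties, constraints, documentation), so treat this as a starting point and not as a model.

```python
import xml.etree.ElementTree as ET

# The two namespace URIs below are the standard W3C ones.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical namespace for the example terms (an assumption, not a real vocabulary).
BASE = "http://example.org/genres#"

# Hypothetical taxonomy: each term maps to its parent term (None for the root).
TAXONOMY = {
    "Document": None,
    "FinancialReport": "Document",
    "Invoice": "Document",
    "AnnualReport": "FinancialReport",
}


def taxonomy_to_rdfxml(taxonomy):
    """Serialize a parent-child taxonomy as RDF/XML with one rdfs:Class per term."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for term, parent in taxonomy.items():
        # Each term becomes a class identified by a URI in the example namespace.
        cls = ET.SubElement(root, f"{{{RDFS}}}Class", {f"{{{RDF}}}about": BASE + term})
        label = ET.SubElement(cls, f"{{{RDFS}}}label")
        label.text = term
        if parent is not None:
            # The parent-child link becomes an rdfs:subClassOf statement.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


rdfxml = taxonomy_to_rdfxml(TAXONOMY)
print(rdfxml)
```

Even this toy case forces decisions that an informal taxonomy leaves open, such as what URI identifies each term and whether a parent-child link really means "is a kind of".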

Here are some things to consider.

(1) In some domains, existing ontologies and semantic web technologies will be numerous and advanced. In others, they will hardly exist. You may have to expand, reduce, or shift your domain in order to make your project manageable.

(2) What ontologies in your domain predate the semantic web? These might include classification systems, taxonomies, cataloguing standards, controlled vocabularies, or data models for relational or object-oriented databases. In what ways have these existing ontologies been standardized? Are they all compatible with one another? What are the prospects for translating these existing ontologies onto the semantic web? Or should they be replaced by new ontologies?

(3) What are the distinctive document genres in your domain? Examples of document genres include scientific articles, invoices, conference announcements, product catalogs, airline tickets, and corporate financial reports. The characteristic components and attributes of each document genre will probably correspond to rich document and metadata ontologies. The characteristic contents and uses of each genre will likewise probably correspond to domain and service ontologies. In many cases, these ontologies will not be formalized (that is, articulated, standardized, and written down) until it comes time to implement them on the semantic web. In other cases, the ontologies will already be implicit in existing domain-specific computer systems, but will not be codified in a computer-readable format such as XML. Should semantic web technologies simply proceed by translating these existing ontologies into XML (or RDF or OWL), or should the ontologies be revised or even done over from scratch?

(4) How should ontology engineering for the semantic web proceed in your domain? For example, is your domain really so homogeneous that single standardized ontologies are feasible? Business transactions, for instance, are extremely diverse in their structure, and each industry has its own practices. Some business ontologies may be universal, whereas others may be specific to a single subdomain. Domain-independent "upper ontologies" for things like time and space may be useful as well. Some domains are already highly standardized, so that adopting the semantic web is largely a matter of translating existing practices into a new technological vocabulary. Other domains are not standardized, so that use of the semantic web will require a major overhaul of existing practices. Will some of the ontologies in your domain be very large? What kind of work will it take to maintain them, modify them, extend them, and repair them?

(5) In many domains, for example business and the humanities, XML-based markup standards are already well advanced. Some are actually being used in daily professional practice. Yet most of these standards are obsolete, in the sense that they ought to be recast using OWL. So where should semantic web standardization in your domain be going, and what applications will full-blown use of OWL and OWL-S make possible? On the other hand, many markup standards, whether XML or OWL or whatever, will probably turn out to be more trouble than they are worth. After all, the markup is often cumbersome, and someone somewhere has to sit down and write it. In many cases, a markup standard is not useful unless hundreds or thousands of people all consistently use it, for example by all using a standardized software package that presupposes it. And Google and other search engines built on statistical methods work well with no standardized document structures and almost no metadata. So which semantic web technologies in your domain will actually get used? By applying ideas from the course, you should be able to write about these matters in an analytical way.

(6) What might semantic web technologies actually be good for in your domain? What would they do, who would use them, and for what? What difference might information retrieval with highly structured documents and metadata make, compared to information retrieval using conventional methods? What is known about the uses that people make of information and information retrieval systems in your domain, and what parts of this knowledge will still apply if and when the domain has been remade through ontology standards and semantic web technologies? To analyze such things, you will have to close the gap between ontologies, which are very abstract, and the concrete work practices, real and potential, of the people in your domain. Assuming that ontology standards and semantic web technologies are actually made and applied in your domain, what institutional changes may result? That is, how will people in your industry, profession, or academic field do things differently when they are routinely using these standards and technologies? An ontological standard is an institutional fact as well as a technical one, and new technologies generally do not have important effects unless institutions change.
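
The contrast raised in (5) and (6) between conventional retrieval and retrieval over structured metadata can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the three records are invented, the field names (genre, year) are hypothetical, and plain substring matching stands in for conventional methods, which in practice are statistical and considerably more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    genre: str  # a document-genre label, e.g. "invoice"
    year: int
    text: str


# Hypothetical records, invented for illustration only.
RECORDS = [
    Record("Acme invoice 1047", "invoice", 2005,
           "Payment due for consulting services."),
    Record("Acme annual report", "financial report", 2005,
           "Each unpaid invoice is listed in appendix B."),
    Record("Call for papers", "conference announcement", 2006,
           "Submissions on invoice processing are welcome."),
]


def keyword_search(records, word):
    """Unstructured retrieval: match a word anywhere in the title or text."""
    word = word.lower()
    return [r for r in records if word in (r.title + " " + r.text).lower()]


def fielded_search(records, **criteria):
    """Structured retrieval: match exact values of named metadata fields."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]


keyword_hits = [r.title for r in keyword_search(RECORDS, "invoice")]
fielded_hits = [r.title for r in fielded_search(RECORDS, genre="invoice", year=2005)]

print(keyword_hits)  # documents that merely mention invoices
print(fielded_hits)  # documents that are invoices, according to their metadata
```

The keyword query returns every document that mentions an invoice, while the fielded query returns only the document that is one. But the fielded query works only because someone assigned the genre labels, and assigned them consistently, which is exactly the institutional question your paper should address.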