Information Studies 277 -- Information Retrieval Systems: User-Centered Designs

Phil Agre
Office: 229 GSE&IS Building
Phone: (310) 825-7154
Email: pagre@ucla.edu
Home: http://polaris.gseis.ucla.edu/pagre/

Fall 2007
Wednesdays from 9am to 12:30pm, GSE&IS room 245

DRAFT

This is a course on the semantic web, an important new document-centered computing technology in which web pages and other online resources are provided with metadata that can be automatically processed by computers.

The course prerequisites are IS 245 (Information Access) and IS 260 (Information Structures). The course complements several other courses in the program, including IS 240 (Management of Digital Records), IS 270 (Introduction to Information Technology), IS 272 (Human/Computer Interaction), IS 274 (Database Management Systems), IS 276 (Information Retrieval Systems: Structures and Algorithms), and IS 464 (Metadata).

The main idea of the semantic web is the use of machine-readable ontology standards. An "ontology" is a theory of the categories of things that make up a given domain. Familiar examples of ontologies include taxonomies and controlled vocabularies. Every computer system uses an ontology. Computer systems that are built independently of one another, however, often cannot interoperate because they use different ontologies. Now that computer systems are heavily networked, numerous user groups have begun standardizing their ontologies. The semantic web consists of mechanisms for "marking up" ontologies and then processing them.
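As a rough illustration of the interoperability problem just described, here is a minimal Python sketch. Everything in it is hypothetical: the two record formats, the field names, and the shared vocabulary are invented for this example and do not come from any real standard. It shows two independently built systems that describe the same book under different category names, and how agreeing on one standard vocabulary lets their records be compared.

```python
# Two hypothetical systems describe the same book using different
# category names, that is, different ontologies.
record_a = {"author": "A. Example", "title": "An Example Book", "year": "2007"}
record_b = {"writer": "A. Example", "name": "An Example Book", "published": "2007"}

# Hypothetical mappings from each system's private field names onto one
# agreed-upon standard vocabulary (creator, title, date).
MAP_A = {"author": "creator", "title": "title", "year": "date"}
MAP_B = {"writer": "creator", "name": "title", "published": "date"}


def naive_match(a, b):
    """Compare two records using only the field names they share."""
    shared = set(a) & set(b)
    return bool(shared) and all(a[key] == b[key] for key in shared)


def to_standard(record, mapping):
    """Rename a record's fields into the shared standard vocabulary."""
    return {mapping[field]: value for field, value in record.items()}


# Without a shared ontology the systems have no field names in common,
# so they cannot tell that the records describe the same book.
print(naive_match(record_a, record_b))

# After both records are translated into the standard vocabulary,
# the match is found.
print(naive_match(to_standard(record_a, MAP_A), to_standard(record_b, MAP_B)))
```

The first comparison fails and the second succeeds. Real ontology standards are of course far richer than a renaming table, but the sketch captures why user groups bother to standardize.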

There are roughly four kinds of ontologies: document ontologies (e.g., the chapters of a book or the footnotes of a paper), metadata ontologies (e.g., the format of a file or the copyright status of a document), domain ontologies (e.g., the components of an automobile or the entries of a schedule), and service ontologies (e.g., the inputs of a software module, the steps of a transaction, or the formats of messages that are passed back and forth between a client and server). Because its topic is information retrieval and not web services generally, this course is mainly about document and metadata ontologies. In practice, however, document and metadata ontologies often include elements of domain ontologies, and many services use documents and metadata. For completeness, therefore, weeks 9 and 10 will cover domain and service ontologies, respectively.

The semantic web includes a layered series of markup languages starting with XML (itself derived from SGML). The most distinctive aspect of XML is that user groups can use it to define specialized sublanguages to mark up the ontologies that are meaningful for their own work. "User-centered design" in a semantic web context means precisely the codification of a user group's ontologies. This is important because document collections that have been marked up within a standard XML-based language can be stored and retrieved in much more sophisticated ways than unstructured plain text documents, or documents whose structures have not been marked up in a standardized way. Although it is too soon to be certain, the result may be a revolution in the technology of information retrieval.

In the past, this course has generally applied ideas from user interface design to more traditional information retrieval technologies. Here, for example, is the IS 277 syllabus from Winter 2002 (in Word format):

http://polaris.gseis.ucla.edu/pagre/is277-winter-2002.doc
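To make concrete the earlier claim that marked-up document collections can be retrieved in more sophisticated ways than plain text, here is a small Python sketch using the standard library's xml.etree.ElementTree module. The XML sublanguage in it (catalog, paper, title, author, footnote) and the two sample papers are invented for illustration and are not taken from any real document type.

```python
import xml.etree.ElementTree as ET

# A tiny collection marked up in a hypothetical XML sublanguage.
CATALOG = """\
<catalog>
  <paper>
    <title>Networks in Libraries</title>
    <author>A. Example</author>
    <footnote>See the discussion of metadata.</footnote>
  </paper>
  <paper>
    <title>Metadata in Practice</title>
    <author>B. Example</author>
    <footnote>Based on a pilot study.</footnote>
  </paper>
</catalog>
"""

root = ET.fromstring(CATALOG)


def full_text_search(root, word):
    """Plain-text style search: match the word anywhere in a paper."""
    return [
        paper.findtext("title")
        for paper in root.findall("paper")
        if word.lower() in "".join(paper.itertext()).lower()
    ]


def field_search(root, field, word):
    """Structured search: match the word only inside one named element."""
    return [
        paper.findtext("title")
        for paper in root.findall("paper")
        if word.lower() in paper.findtext(field, "").lower()
    ]


# The unstructured search cannot tell a title from a footnote.
print(full_text_search(root, "metadata"))
# The structured searches can.
print(field_search(root, "title", "metadata"))
print(field_search(root, "footnote", "metadata"))
```

The full-text search returns both papers, while the two field searches each return a different single paper. That distinction is only available because the structure of the documents has been marked up in an agreed way.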

Unfortunately, serious user interfaces for semantic web technologies hardly exist. Indeed, it is not even clear what semantic web technologies would do. Nor has much research been done on how large collections of structured documents that use the four kinds of machine-readable ontologies are used in practice. This is truly an opportunity to get in on the ground floor of an important new field. We will analyze how these new technologies apply to the reinvention of information retrieval in particular domains. And, both in class and in the course assignment, we will attempt to anticipate what kinds of interfaces will best integrate the new technologies into the work practices of those domains.

Although the semantic web is highly technical in nature, this course does not presuppose any technical background beyond that of the program in general and the course prerequisites. In general, the course will emphasize ontology markup more than the programs that use it, ontologies for documents more than for services, reading the marked-up documents more than writing them, and real examples of semantic web documents more than synthetic examples. Each week, students will be required to find and bring in a real-life example of a web document that uses the markup technology of the week, and much class time will be spent reading, line by line, the marked-up documents that students have brought in.

Unfortunately, these markup languages were never meant for human beings to read, and even their inventors regard them as obnoxiously obscure. However, the user-friendly software tools that are supposed to stand between human beings and marked-up documents largely do not exist, and will not for the time being substitute for an ability to read the markup. Nor do there exist useful textbooks or manuals of semantic web technologies that are written for anyone except computer scientists. Accordingly, large amounts of computer science will be explained in plain English as we read the markup.

The most familiar markup language is HTML, a simple language that provides features for common document formatting conventions and for hyperlinks to other documents on the web. Although it is not a prerequisite for the course, students who take a few days to teach themselves HTML before the course begins will be happier than those who do not. Several introductions to HTML are freely available on the Web, for example:

Burke's How to Write HTML
http://www.speech.cs.cmu.edu/~sburke/html/

Introduction to HTML
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/htmlindex.html

HTML Code Tutorial
http://www.htmlcodetutorial.com/
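For students who want a preview of the two HTML features mentioned above, formatting conventions and hyperlinks, here is a short Python sketch using the standard library's html.parser module. The sample page is invented for this example (the two link targets are sites that appear later in this syllabus). The sketch reads the markup and pulls out the hyperlink targets, which is roughly what a program does when it processes a web page.

```python
from html.parser import HTMLParser

# A small invented HTML page: a heading (formatting) and two hyperlinks.
PAGE = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Readings</h1>
    <p>See the <a href="http://www.w3.org/">W3C</a> and
       <a href="http://www.oasis-open.org/">OASIS</a> sites.</p>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that is read."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
# prints ['http://www.w3.org/', 'http://www.oasis-open.org/']
```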

Students will also be happier if they have used the "Source" menu entry in Internet Explorer (or "Page Source" in Netscape) to retrieve and read the marked-up document source for several simple HTML pages. HTML, however, is being replaced by a very similar XML-based markup language called XHTML. Students in this course will learn to write simple XHTML web pages, and will use XHTML format to write a heavily hyperlinked online paper that describes how the semantic web is being used, and can be used, in a particular industry (e.g., finance or media), profession (e.g., architecture or engineering), or academic field (e.g., classics or geography). The assignment for the paper is here:

http://polaris.gseis.ucla.edu/pagre/is277-assignment.html

The assignment for bringing in some marked-up pages each week is here:

http://polaris.gseis.ucla.edu/pagre/is277-markup.html
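Because XHTML is XML-based, as noted above, an XHTML page can be read by any general-purpose XML parser, whereas loosely written HTML usually cannot. The following Python sketch, using the standard library's xml.etree.ElementTree module, is one rough way to check this for yourself. Both sample pages are invented for illustration, and the XHTML sample omits the DOCTYPE declaration that a complete XHTML page would carry; the helper names is_well_formed and hrefs are likewise just names chosen for this example.

```python
import xml.etree.ElementTree as ET

# The namespace that XHTML elements belong to.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# An invented XHTML page: every element is closed and properly nested.
XHTML_PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample paper</title></head>
  <body>
    <p>First paragraph.<br /></p>
    <p>See <a href="http://www.w3.org/TR/xhtml1/">the XHTML 1.0 spec</a>.</p>
  </body>
</html>
"""

# Loosely written HTML of a kind browsers tolerate: <br> and <p> are
# never closed, so it is not well-formed XML.
SLOPPY_HTML = "<html><body><p>First paragraph.<br><p>Second paragraph.</body></html>"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def hrefs(text):
    """List the hyperlink targets in a well-formed XHTML page."""
    root = ET.fromstring(text)
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]


print(is_well_formed(XHTML_PAGE))
print(is_well_formed(SLOPPY_HTML))
print(hrefs(XHTML_PAGE))
```

A check of this kind only tests well-formedness. It does not confirm that a page is valid XHTML in the full sense of the specification.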

The online paper will be 75% of the grade and the weekly collection of real-life examples of marked-up web documents will be 25%.

In general, this will be a paperless course. Students will "hand in" all of the weekly assignments, the paper proposal, and the paper itself by linking to them from a Web page that they maintain themselves. Here is an example of what such a Web page might be like:

http://polaris.gseis.ucla.edu/pagre/is277-example.html

Once you create your page, please send Phil the URL for it. Here is the directory of IS 277 students' Web pages:

http://polaris.gseis.ucla.edu/pagre/is277-pages.html

All of the course readings will be on the web. Students who wish to purchase a (relatively) introductory book on the semantic web might use an online bookseller to buy a copy of Grigoris Antoniou and Frank van Harmelen, A Semantic Web Primer, MIT Press, 2004. This book is also online:

http://thoth.ilit.umbc.edu/CMSC-771/semantic%20web%20primer.pdf

Each class session will include a document reading session and a lecture. Each lecture will introduce the week's material using theory and simple examples, and the corresponding readings should be done after the lecture. The readings will typically be too technical, but students should read them as best they can. Students should use the readings in an attempt to read the real-life marked-up documents that they collect from the web, and should come to class prepared to explain their documents line by line, again as best they can. We will have an Internet connection and projector in class to read online materials.

Week 1. Ontology standards

slides for this week's lecture (in PowerPoint)
http://polaris.gseis.ucla.edu/pagre/ontologies.ppt

Spinning the Semantic Web (introduction)
http://w5.cs.uni-sb.de/ss03/SemanticWebHTML/Vorlesung%20SemanticWebSS03/Introduction.pdf

Business to Consumer Markets on the Semantic Web
http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/otm2003_Semmarkets.pdf

Working towards MetaUtopia: A Survey of Current Metadata Research
http://archive.dstc.edu.au/RDU/staff/jane-hunter/LibTrends_paper.pdf

Using Ontologies: Enabling Knowledge Sharing and Reuse on the Semantic Web
http://www.deri.org/fileadmin/documents/DERI-TR-2003-10-29.pdf

Semantic Web Portals
(read through page 17)
http://sw-portal.deri.org/papers/publications/SemanticWebPortalSurvey.pdf

Sorting Things Out: Classification and Its Consequences
(read the introduction)
http://epl.scu.edu:16080/~gbowker/classification/

The Cascade of Interactions in the Digital Library Interface
http://www.gseis.ucla.edu/faculty/bates/articles/cascade.html

Recommended reading:

Anatomy of the Grid
http://hpc.sagepub.com/cgi/content/short/15/3/200

Physiology of the Grid
http://www.globus.org/alliance/publications/papers/ogsa.pdf

Information Technology and the Transformation of Research
http://www7.nationalacademies.org/itru/Transforming%20Research.pdf

A Global Grid-Enabled Collaboratory for Scientific Research
http://pcbunn.cithep.caltech.edu/GECSR_Final.pdf

Social Theoretical Issues in the Design of Collaboratories
http://epl.scu.edu:16080/~gbowker/collab.pdf

Towards Institutional Infrastructures for E-Science
http://www.oii.ox.ac.uk/resources/publications/OIIRR2_200309.pdf

A Practical Guide to Federal Enterprise Architecture
http://www.cio.gov/archive/bpeaguide.pdf

Week 2. XML

A Gentle Introduction to XML
(read through section 2.3)
http://www.tei-c.org/P4X/SG.html

a TEI manual including examples of XML markup
http://etext.lib.virginia.edu/tei/uvatei.html

introduction to XML syntax
http://www.zvon.org/xxl/XMLTutorial/General/book.html

some examples of XML pages
(use "Page Source")
http://www.w3schools.com/xml/plant_catalog.xml
http://www.ibiblio.org/xml/examples/shakespeare/win_tale.xml
http://www.ibiblio.org/xml/examples/4-2.xml
http://www.scc.rutgers.edu/ceth/intromat/xml/samples3/poem/Converted_WordsworthPoem_xml.htm
http://www.brics.dk/~amoeller/XML/xml/recipes2.xml
http://www.ise.gmu.edu/faculty/ofut/classes/642/Examples/XML/stamps.xml
http://clerk.house.gov/evs/2006/roll004.xml

an example of an XML application
http://www.recordare.com/xml.html

Understanding ebXML
http://www-106.ibm.com/developerworks/xml/library/x-ebxml/

ebXML: A Critical Analysis
http://www.rawlinsecconsulting.com/ebXML/

Swoogle
http://swoogle.umbc.edu/

the Protege ontology editor
http://protege.stanford.edu/

Piggy Bank semantic web extension for Firefox
http://simile.mit.edu/piggy-bank/

Also, read several pages on these sites to get the general idea:

World Wide Web Consortium
http://www.w3.org/

Organization for the Advancement of Structured Information Standards
http://www.oasis-open.org/

OpenDocument
http://www.google.com/search?hl=en&q=%22Open+Document+Format%22&btnG=Google+Search

An act to add Section 11541.1 to the Government Code, relating to information technology
http://www.leginfo.ca.gov/pub/07-08/bill/asm/ab_1651-1700/ab_1668_bill_20070223_introduced.html

Asynchronous JavaScript and XML
http://www.ajaxmatters.com/

The Cover Pages
http://xml.coverpages.org/

O'Reilly xml.com
http://www.xml.com/

xml.gov
http://www.xml.gov/index.asp

European survey of semantic Web applications
http://www.w3.org/2001/sw/Europe/reports/chosen_demos_rationale_report/hp-applications-survey.html

"Thinking XML" column at the IBM developers' site
http://www.ibm.com/developerworks/views/xml/libraryview.jsp?search_by=thinking+xml:

IBM XML Technical Library
http://www-128.ibm.com/developerworks/views/xml/library.jsp

Week 3. XML document types

A Gentle Introduction to XML
(read section 2.4 onward)
http://www.tei-c.org/P4X/SG.html

some examples of XML pages with document type definitions
http://www.w3schools.com/xml/node_in_dtd.xml
http://www.npac.syr.edu/projects/tutorials/XML/example_files/booksIntSub.xml
http://www.cs.rpi.edu/~puninj/XMLJ/classes/class3/slide11-0.html

some more DTDs
http://www.brics.dk/~amoeller/XML/schemas/dtd-example.html
http://www.fly.faa.gov/AirportStatus.dtd
http://support.sciencedirect.com/xml/sd_holdings_01.dtd

an example of a large-scale XML document type
http://www.oreilly.com/catalog/docbook/chapter/book/docbook.html
http://www.docbook.org/

Comparative Analysis of Standardization of Vertical Industry Languages
(scroll down to page 210)
http://www.si.umich.edu/misq-stds/proceedings/ICIS2003-misq-stds.pdf

Standards Fragmentation in Electronic Markets
http://wareham.eci.gsu.edu/Resume/Papers/WarehamRaiXML.pdf

if you want to learn XML Schema, an alternative to DTDs:

XML Schema examples and tutorial
http://www.xfront.com/

Open Archives Initiative
http://www.openarchives.org/

Recommended reading:

Standardization of XML-Based e-Business Frameworks
http://www.si.umich.edu/misq-stds/proceedings/137_135-146.pdf

A Web-Based Negotiation System
http://www.ists.dartmouth.edu/library/odi1103.pdf

A System for the Mediated Sharing of Sensitive Data
http://www.ists.dartmouth.edu/library/sce0503.pdf

Porting a Rich-Media Collection to a Mobile Platform
http://www.mlearn.org.za/CD/papers/arias-%20reichenbach-pasch.pdf

On Engineering Design Generation with XML-Based Knowledge-Enhanced Grammars
http://www.ifi.unizh.ch/~noser/BIBLIO/rudolph00.pdf

XML-Based Modeling of Corporate Memory
http://www.icaen.uiowa.edu/~ankusiak/Journal-papers/Bill_IEEE.pdf

Third Workshop on Legislative XML
http://www.cnipa.gov.it/site/_files/Quaderno%2018.pdf

Week 4. XHTML

XHTML 1.0: The Extensible HyperText Markup Language
http://www.w3.org/TR/xhtml1/

XHTML W3C Recommendation Summary
http://train.msu.edu/classinfo/downloads/xhtml.pdf

Index of HTML Elements
http://www.w3.org/TR/html401/index/elements.html

an example of an XHTML page
(use "Page Source")
http://www.w3.org/

Week 5. RDF

An Introduction to the Resource Description Framework
http://www.dlib.org/dlib/may98/miller/05miller.html

RDF Primer
(skip section 5 on RDF Schema)
http://www.w3.org/TR/rdf-primer/

Collaborative Mapping with RDF
http://www.idealliance.org/papers/dx_xmle03/papers/03-03-03/03-03-03.pdf

conference proceedings with extensive RDF markup
http://dc2003.ischool.washington.edu/

an example of an RDF database application
(click on "data" for the RDF files)
http://chefmoz.org/

An RDF Model for Multi-Level Hypertext in Digital Libraries
http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fischer_Fuhr:02.pdf

Barriers to Real World Adoption of Semantic Web Technologies
http://www.cems.uwe.ac.uk/~mhbutler/papers/barriersToRealWorldAdoptRDF.pdf

some examples of RDF files
http://www.zvon.org/xxl/RDFTutorial/Examples/example1.html
http://www.ontoknowledge.org/oil/case-studies/KA-facts.rdf
http://www.ukoln.ac.uk/metadata/resources/rdf/examples/2/

Recommended reading:

Introducing SPARQL (RDF database language)
http://www.xml.com/lpt/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html

Week 6. RDF Schema

RDF Primer
(read section 5 on RDF Schema)
http://www.w3.org/TR/rdf-primer/

RDF Vocabulary Description Language 1.0: RDF Schema
http://www.w3.org/TR/rdf-schema/

RDF Schema Directory
http://139.91.183.30:9090/RDF/Examples.html

an RDF Schema markup language and some instances of it
http://139.91.183.30:9090/RDF/VRP/Examples/gml.rdfs
http://139.91.183.30:9090/RDF/VRP/Examples/example_profile3.rdf

more examples of RDF Schema pages
http://www.csd.abdn.ac.uk/~yzhang/test.rdfs
http://swws.semanticweb.org/data/swws_web_site_kb.rdfs
http://www.csc.fi/kielipankki/puhe/schemas/official/recording.rdfs
http://www.ilrt.bris.ac.uk/discovery/2001/09/rdf-schema-tests/rdf-schema.rdfs
http://lsdis.cs.uga.edu/~farshad/events/EventSchema.rdf
http://www.metadata.net/harmony/ABCSchemaV5Commented.rdf

Recommended reading:

papers from Semantics 2006
http://www.semantics2006.net/

Week 7. RSS 1.0

An Introduction to RSS for Educational Designers
http://www.downes.ca/files/RSS_Educ.htm

RDF Site Summary (RSS) 1.0
http://web.resource.org/rss/1.0/spec

RDF Schema for RSS 1.0
(use "Page Source" and scroll down)
http://web.resource.org/rss/1.0/

Semantic Blogging
http://dijest.com/aka/2003/08/23.html#a2584

an example of an RSS interface
http://bloglines.com/

some examples of RSS 1.0 blog markup
(use "Page Source")
http://www.pocketsoap.com/weblog/rss.xml
http://www.techbargains.com/rss.xml
http://boingboing.net/rss.xml
http://www.ilrt.bris.ac.uk/discovery/rdf/resources/rss.rdf

Recommended reading:

Why Choose RSS 1.0?
http://www.xml.com/lpt/a/2003/07/23/rssone.html

"a universal publishing standard for personal content and weblogs"
http://www.atomenabled.org/

Week 8. Learning Object Metadata

Instructional Planning with Learning Objects
(scroll down to page 52)
http://www.uni-koblenz.de/fb4/publikationen/gelbereihe/RR-16-2003.pdf

Semantic Web Meta-data for e-Learning
http://kmr.nada.kth.se/papers/SemanticWeb/p744-nilsson.pdf

Interoperability between Library Information Services and Learning Environments
http://www.imsproject.org/digitalrepositories/CNIandIMS_2004.pdf

How RDF Will Change Learning Technology Standards
http://www.cetis.ac.uk/content/20010927172953/viewArticle

The Next Wave: CETIS Interviews Mikael Nilsson about the Edutella Project
http://www.cetis.ac.uk/content/20010927163232

EDUTELLA: A P2P Networking Infrastructure Based on RDF
http://edutella.jxta.org/reports/edutella-whitepaper.pdf

Learning Object Metadata
http://ltsc.ieee.org/wg12/

XML Knowledge Management Flourishes in Learning Technology Initiatives
http://www-106.ibm.com/developerworks/xml/library/x-think21.html

DTD for learning object metadata
http://www.ema.fr/~mcrampes/Cours_%20semantic_web/TPXML02/LOM%20DTD%20imsmd_rootv1p2.dtd

an example of learning object metadata
(scroll down below the tables)
http://www.rdn.ac.uk/publications/rdn-ltsn/ap/

more examples
(use "Page Source")
http://www.imsglobal.org/metadata/mdv1p2p2/samples/merlot/MERLOTexample1_schema.xml
http://www.imsglobal.org/metadata/mdv1p2p2/samples/ims/imsmdexample_schema.xml
http://www.imsglobal.org/metadata/mdv1p3pd/xslt/samples-LOM/test_schema_LOM.xml

another example
(scroll down)
http://math.unipa.it/~grim/SiDonley.PDF

IEEE Learning Object Metadata RDF Binding
http://kmr.nada.kth.se/el/ims/md-lomrdf.html

some of the RDF files
http://kmr.nada.kth.se/el/ims/schemas/lom-general
http://kmr.nada.kth.se/el/ims/schemas/lom-educational
http://kmr.nada.kth.se/el/ims/schemas/lom-lifecycle
http://kmr.nada.kth.se/el/ims/schemas/lom-rights
http://kmr.nada.kth.se/el/ims/schemas/lom-metametadata
http://kmr.nada.kth.se/el/ims/schemas/lom-classification

an example
http://kmr.nada.kth.se/el/ims/examples/lom-rdf1.rdf

IMS Resource Description Framework RDF Bindings
http://www.imsproject.org/rdf/

Recommended reading:

Business Process Management Technology in e-Learning Systems
http://coronet.iicm.edu/denis/pubs/elearn2005a.pdf

For the really dedicated:

IMS Global Learning Consortium
http://www.imsglobal.org/

SCORM (Sharable Content Object Reference Model)
http://www.adlnet.gov/scorm/index.cfm

Week 9. OWL

Web Ontology Language Guide
http://www.w3.org/TR/owl-guide/

Web Ontology Language: OWL
http://www.cs.vu.nl/~frankh/postscript/OntoHandbook03OWL.pdf

OWL Use Cases and Requirements
http://www.w3.org/TR/2004/REC-webont-req-20040210/

Semantic Web in a Pervasive Context-Aware Architecture
http://w5.cs.uni-sb.de/~krueger/aims2003/camera-ready/chen-8.pdf

some examples of OWL
http://www.cs.vu.nl/~frankh/spool/wildlife.owl
http://www.w3.org/TR/2002/WD-owl-guide-20021104/food.owl
http://www.aiai.ed.ac.uk/resources/go/obo.owl
http://osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl

Recommended reading:

eClassOWL: A Fully-Fledged Products and Services Ontology in OWL
http://www.heppnetz.de/files/eclassOWL-finalPoster-shortA4.pdf

Standard Ontology for Ubiquitous and Pervasive Applications
http://ebiquity.umbc.edu/_file_directory_/papers/105.pdf

Semantic Web Technologies for Context-Aware Museum Tour Guide Applications
http://www.cs.cmu.edu/~sadeh/Publications/MCommerce/WAMIS05%20Submission_Final.pdf

Semantic Web for Research Communities
http://www.aifb.uni-karlsruhe.de/WBS/ysu/publications/2005_swrc_baosw.pdf

syllabus for a course about OWL (with numerous readings)
http://www.cse.lehigh.edu/~heflin/courses/sw-2006/

Week 10. OWL-S

Semantic Web Services: A Communication Infrastructure for eWork and eCommerce
http://www.springerlink.com/index/CJHJQLML7JKQ8RVE.pdf

The Semantic Grid: A Future e-Science Infrastructure
http://www.semanticgrid.org/documents/semgrid-journal/semgrid-journal.pdf

OWL-S: Semantic Markup for Web Services
http://www.daml.org/services/owl-s/1.0/owl-s.html

some examples of OWL-S services
http://www.mindswap.org/2004/owl-s/services.shtml

Ontology-Enabled Pervasive Computing Applications
http://www.flacp.fujitsulabs.com/~rmasuoka/papers/20030915-Task-Computing-IEEE-Intelligent-Systems-September-October-2003.pdf

Recommended reading:

"web services" versions of established distributed computing ideas
http://www-128.ibm.com/developerworks/webservices/library/ws-comproto/

Service-Oriented Computing - ICSOC 2005
http://www.springer.com/west/home?SGWID=4-102-22-107952204-0

Customized Delivery of E-Government Web Services
http://www-personal.engin.umd.umich.edu/~brahim/mypublications/medjahed-IS.pdf

Planning for Semantic Web Services
http://www.ai.sri.com/SWS2004/final-versions/SWS2004-Sirin-Final.pdf

Interleaving Semantic Web Reasoning and Service Discovery
http://www.cs.cmu.edu/~sadeh/Publications/More%20Complete%20List/techreport%20%20july%2027%202005.pdf

A Framework for Dynamic Semantic Web Services Management
http://eceb.gmu.edu/pubs/IJCIS_Howard_Kerschberg.pdf

A System for Dynamically Composing and Intelligently Executing Web Services
http://dblab.usc.edu/Users/shkim/papers/proteus.pdf

Pitfalls of OWL-S
http://www.informatik.uni-ulm.de/ki/Liebig/papers/icsoc04.html

Conflicts in the Internet Standards Process
http://www.stevens-tech.edu/jnickerson/SpiritOfTheWeb.pdf