IS 240 -- Information Systems Analysis and Design

Assignment for week 10: Service architecture.

The main order of business this week is iterating your design, getting all of the pieces to make sense and work together. But we'll also take up a final substantive issue: the rudiments of information service architecture. So far, the information service you've been designing is just a couple of diagrams and some sketches for an interface, with little sense of how any of the stuff is implemented. Writing the actual code is someone else's job, at least according to the traditional division of labor. But as a designer, you must decide some deep structural questions: how the service is organized into modules, what platforms they're all built on, and what protocols they use to communicate with one another. We have addressed such questions before, back in week 4 when we looked at some of the platforms that are available as standards in the real world -- TCP/IP and WAP, for example -- and in week 5 when we explored what it means to build a service on top of a service layer -- the hypothetical Suitcase Layer. This week we will consider such matters more systematically.

It will help to read the sections in Messerschmitt on the client-server model, the peer-to-peer model, layered services, middleware, network protocols, and transaction processing. I will not repeat this material, but I will explain bits and pieces as part of the assignment.

Here is an intuitive way to understand the issue. Your service will probably include several distinct devices -- such as desktop and handheld computers -- that the various parties use to interact with one another. The question is, how is the computing activity divided among the various devices? One option is to have a central computer -- a server -- that does all the hard work. The server, which is operated by some central authority somewhere out on the Internet, stores all the data and performs all of the hard computation. Then each individual has a device -- a client -- that does little work besides displaying images on the screen and relaying commands back to the server. The client-server model can be contrasted with the peer-to-peer model, which has no server and indeed no asymmetrical relationship among the devices. Each device stores its own data, and the devices communicate with one another directly over the network when they need to interact. Both the client-server model and the peer-to-peer model provide coherent ways to divide the computational effort among several machines. And many other options are possible. Napster, for example, has a small server that tells the clients about one another, but most of the action happens peer-to-peer, with the data flowing directly among the individuals' machines.
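If it helps to see the contrast in code rather than prose, here is a minimal sketch in Python -- all of the names are made up for illustration, and none of this is part of your assignment -- of the two divisions of labor:

    # Client-server: the client only displays and relays; the server does the work.
    class Server:
        def __init__(self):
            self.data = {}              # the server stores all the data

        def handle(self, command, payload):
            # all of the hard computation happens here
            if command == "put":
                key, value = payload
                self.data[key] = value
                return "ok"
            if command == "get":
                return self.data.get(payload)

    class ThinClient:
        def __init__(self, server):
            self.server = server        # every request is relayed to the server

        def show(self, key):
            print(self.server.handle("get", key))    # just display what comes back

    # Peer-to-peer: every device stores its own data and talks to the others directly.
    class Peer:
        def __init__(self, name):
            self.name = name
            self.data = {}              # each peer keeps its own records
            self.neighbors = []         # the other peers it can contact directly

        def ask(self, key):
            if key in self.data:
                return self.data[key]
            for peer in self.neighbors:               # no central authority involved
                if key in peer.data:
                    return peer.data[key]

    server = Server()
    client = ThinClient(server)
    server.handle("put", ("greeting", "hello"))
    client.show("greeting")             # prints: hello

The only point is where the data and the computation live: in the first case everything interesting happens on the server, and in the second case each peer carries its own share.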

The great success of the Web has accustomed us to thinking in client-server terms. And indeed, the rational designer today hesitates before building an information service on any platform except the Web. But that will surely change as service models multiply, and so you should look at the service architecture problem in the more general way that people looked at it before the Web: what is the best way to distribute the data and computational effort among the various machines that are involved in your service?

You may wonder how this question might be answered. Surely the world is full of advanced methodologies for deriving optimal answers to it. Not so. You can read a book like Whitten and Bentley, and it will give you an elaborate design methodology, but that methodology will not actually constrain your architectural choices very much. Some answers are better than others, but if you comprehend the question then you are qualified to evaluate the answers.

More formally, you are starting with two diagrams: a data model and a dataflow diagram. Every entity in your data model corresponds to a relational database -- in practice, a table of records -- on some computer. Every process in your dataflow diagram corresponds to a piece of code running on some computer. If a process requires information from a certain database, then it will be convenient for the process and the database to be located on the same computer. But it won't be obligatory: if a process on computer A needs data from a database on computer B, it can always retrieve it over the network. But this can be slow. What is the designer to do?
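To make the trade-off concrete, here is a small Python sketch -- the names and the timing numbers are invented for illustration, not measured from anything -- of a process reading from a table that may or may not live on its own machine:

    import time

    LOCAL_READ_COST  = 0.0001   # same machine: effectively instantaneous
    REMOTE_READ_COST = 0.05     # across the network: orders of magnitude slower

    class Table:
        """One entity from the data model, stored as records on some machine."""
        def __init__(self, machine, records):
            self.machine = machine
            self.records = records

    def read(process_machine, table, key):
        # The process can always get the data, but distance costs time.
        cost = LOCAL_READ_COST if table.machine == process_machine else REMOTE_READ_COST
        time.sleep(cost)
        return table.records.get(key)

    customers = Table("central server", {"alice": {"city": "Seattle"}})
    print(read("alice's PC", customers, "alice"))    # works, but pays the network cost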

Much of this complexity can be suppressed with middleware. For example, a distributed database service layer might be designed to hide the boundaries between the different computers, so that every database looks like it is immediately ready-to-hand whenever any process on any computer wants to read from it or write to it. (The assignment for week 5 mentioned distributed object management; this is a distributed database for object-oriented data, as opposed to the relational model of data that we are using.) The distributed database middleware layer might replicate the various databases on several machines, using subtle protocols to keep the replicated copies consistent, or it might simply send a request over the network whenever a process on one machine requires data on another machine.
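For concreteness, here is a rough Python sketch of what such a middleware layer might look like from a process's point of view -- purely an illustration of "hiding the boundaries", and, as the next paragraph explains, not something we are going to assume:

    class Middleware:
        def __init__(self):
            self.location = {}      # database name -> the machine that actually holds it
            self.machines = {}      # machine name -> {database name -> records}

        def register(self, machine, database):
            self.location[database] = machine
            self.machines.setdefault(machine, {})[database] = {}

        def read(self, database, key):
            # To the calling process every database looks local; the middleware
            # decides whether the request stays on this machine or crosses the network.
            machine = self.location[database]
            return self.machines[machine][database].get(key)

        def write(self, database, key, value):
            machine = self.location[database]
            self.machines[machine][database][key] = value

    mw = Middleware()
    mw.register("central server", "accounts")
    mw.write("accounts", "alice", {"balance": 10})
    print(mw.read("accounts", "alice"))    # the caller never says which machine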

We are *not* going to assume the existence of a distributed database in this assignment. A distributed database is more generality than we need, and it hides the very architectural issues we want to examine. Instead, we are going to face these issues squarely. So the question is: which databases do you want to store on which computers, and which processes do you want to implement on which computers? Even though you don't have a general-purpose distributed database, you will certainly have a middleware layer that allows any process to interact with a database on a different machine. (You don't have to worry about the mechanics of this interaction. It involves simple packets sent over the Internet.) So you have a wide range of choices, and the only real consideration is efficiency: it's a lot slower for a process to interact with a database on a machine that's 1000 miles away than for a process to interact with a database on the same machine. So the rules of thumb for design are obvious: try to group processes together with the databases they will use, and try to put processes and data close to the people who most want to use them. You can easily divide a database across several machines, for example with the records pertaining to each individual customer being located on the particular machine that the customer uses. (Dividing databases in this way puts an extra burden on the middleware layer that interacts with remote databases, but don't worry about that.) You can also place duplicate copies of the same process on several different machines if that's useful. But don't try to replicate the same data on different machines; that's too hard.
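Here is a small Python sketch, again with invented names, of dividing one database across machines so that each customer's records live on the machine that customer uses:

    class CustomerRecords:
        def __init__(self):
            self.home_machine = {}  # customer id -> the machine that customer uses
            self.stores = {}        # machine -> {customer id -> record}

        def assign(self, customer_id, machine):
            self.home_machine[customer_id] = machine
            self.stores.setdefault(machine, {})

        def save(self, customer_id, record):
            # Each customer's records live on that customer's own machine; the
            # middleware layer is what keeps this routing invisible to the processes.
            machine = self.home_machine[customer_id]
            self.stores[machine][customer_id] = record

        def load(self, customer_id):
            machine = self.home_machine[customer_id]
            return self.stores[machine].get(customer_id)

Notice that nothing here is replicated: each record has exactly one home, which is what keeps the scheme manageable.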

If this all seems abstract, think of it this way. Ignore your data model for the moment. Look at your dataflow diagram, and draw lines through it that partition the processes and databases into different regions. Each region is a different machine. If you draw a big central region that contains most of the databases and processes then you're probably imagining a client-server architecture. If you draw a moderate-sized region around each individual person and his or her own local data and processes then you're probably imagining a peer-to-peer architecture. Your regions can overlap, indicating that you are dividing up some of the databases between different machines and duplicating some of the processes accordingly. This makes more sense when you see it drawn on the whiteboard, and I'll do just that in class.
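Until the whiteboard version, here is one way to think of the exercise in code -- a Python sketch with made-up process, database, and machine names -- in which a partition is just an assignment of each process and database to a machine, and the thing to keep small is the number of dataflows that cross machine boundaries:

    assignment = {
        "order entry process": "customer's PC",
        "customer database":   "customer's PC",
        "billing process":     "central server",
        "accounts database":   "central server",
    }

    dataflows = [
        ("order entry process", "customer database"),
        ("order entry process", "billing process"),
        ("billing process", "accounts database"),
    ]

    def boundary_crossings(assignment, dataflows):
        # Every flow between two different machines is a trip over the network;
        # the rules of thumb above amount to keeping this number small.
        return sum(1 for a, b in dataflows if assignment[a] != assignment[b])

    print(boundary_crossings(assignment, dataflows))    # 1 in this example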

Dividing the labor among different computers is still an abstract approach to service architecture. Along the way, you will have to make some more concrete decisions. First of all, what platforms is your service built on? This will include hardware platforms such as IBM-PC compatible personal computers, the Palm Pilot, WAP-enabled cell phones, and big mainframes operating as servers. You are welcome to posit new hardware platforms, much the way we did when designing for the Suitcase Layer in week 5. Remember that we are assuming the technological capabilities of ten years from now. Other platforms will be implemented in software, such as the World Wide Web, the Internet Protocol, and the Secure Sockets Layer. If you want to posit new software platforms then that's probably fine, but check with us first. Last week you invested much effort designing the visual appearance and physical form of people's interactions with your service, and now it is time to specify the guts of the devices that the people interact with.