WebDB'98: On-Line Papers
 

PostScript versions of extended abstracts of the papers accepted for presentation at the workshop can be downloaded from this page. Extended versions of these papers are expected to appear in a volume of the Springer-Verlag LNCS series.

  Title and Abstract 
Interactive Query and Search in Semistructured Databases
R. Goldman, J. Widom

Semistructured graph-based databases have been proposed as well-suited stores for World-Wide Web data. Yet so far, languages for querying such data are too complex for casual Web users. Further, proposed query approaches do not take advantage of the interactive nature of typical Web sessions, in which users are proficient at iteratively refining their Web explorations. In this paper we propose a new model for interactively querying and searching semistructured databases. Users can begin with a simple keyword search, dynamically browse the structure of the result, and then submit further refining queries. Enabling this model exposes new requirements of a semistructured database management system that are not apparent under traditional database uses. We demonstrate the importance of efficient keyword search, structural summaries of query results, and support for inverse pointers. We also describe some preliminary solutions to these technical issues.

Schema-Based Data Translation
T. Milo, S. Zohar

A broad spectrum of data is available on the Web in distinct heterogeneous sources, stored under different formats. As the number of systems that wish to utilize this heterogeneous data grows, the importance of data translation mechanisms increases greatly. We examine this problem from the schema point of view. We observe that in many cases the schema of the data in the source system is very similar to that of the target system. In such cases, most of the translation work can be done automatically, based on the similarity of the schemas. This can save the user considerable effort by limiting the amount of programming needed. We define a common schema model and a common data model, in which schemas and data (respectively) from many common models can be represented. Using a rule-based method, the source schema is compared with the target one, and each component in the source schema is matched with a corresponding component in the target schema. Then, based on the matching achieved, data instances of the source schema can be translated to instances of the target schema. We built a prototype system, accessible through the Web, which implements the above ideas.

WUM - A tool for WWW Utilization Analysis
M. Spiliopoulou, L. C. Faulstich

We describe the Web Utilization Miner (WUM), designed to evaluate how well the information offered in a web domain meets users' demands. To this end, WWW access logs are aggregated into a specialized data warehouse, over which a new data mining technique is applied. In this paper, we focus on the MINT query language, which the data mining expert uses to specify the structural and statistical properties of access patterns of potential interest. MINT thus offers a generic way of guiding the mining process over the data. WUM is being implemented in the framework of the WIND (Warehouse for INternet Data) architecture.

Bringing Database Functionality to the WWW
D. Konopnicki, O. Shmueli

Database Management Systems excel at managing large quantities of data, primarily enterprise data. The WWW is a huge, heterogeneous, distributed database. To support advanced, robust and reliable applications, such as efficient and powerful querying, groupware and electronic commerce, database functionality needs to be added to the WWW. A major difficulty is that database techniques were targeted at a single enterprise environment, providing centralized control over data and meta-data, statistics for query processing, and the ability to use monolithic mechanisms for concurrency control and recovery. We introduce new ideas and mechanisms for "importing" database techniques and functionality to the WWW. Specifically, we propose a hierarchical, object-oriented, abstract data model for the WWW that would enable the definition of a powerful and optimizable standard WWW query language. Query processing techniques designed for the WWW are a crucial element in harnessing the WWW. Other traditional facilities, such as a notion of "data stability" and atomicity, present promising research directions.

Finding near-replicas of documents and servers on the Web
N. Shivakumar, H. Garcia-Molina

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers, web archivers, and the presentation of search results, among other applications. We report statistics on how common replication is on the web, and on the cost of computing the above information for a relatively large subset of the web: about 24 million web pages, corresponding to about 150 gigabytes of textual information.

Incremental Maintenance of Hypertext Views
G. Sindoni

A materialized hypertext view is a hypertext containing data coming from a database, whose pages are stored in HTML files. This paper deals with the maintenance required by such derived hypertexts to enforce consistency between page content and database state. Here, hypertext views are defined as nested oid-based views over the set of base relations. An algebra is proposed, based on a specific data model, that allows views and view updates to be defined. A language is also defined whose instructions update the hypertext to reflect the current database state. Incremental maintenance is performed by a simple algorithm that takes as input a set of changes on the database and automatically produces the update instructions. The motivation of this study is the development of the Araneus Web-Base Management System, a system that provides both database and Web site management.

The Fenarete System: a Web based clinical protocol management system
S. Puglia, G. Rumolo

We present the architecture of Fenarete, a novel clinical protocol management system. Fenarete is an integrated software tool for designing and distributing clinical protocols in an intranet framework. We consider a medical protocol as a formally and clearly defined clinical behaviour scheme (a clinical workflow). Our work allows the knowledge content of any clinical protocol to be fully represented both in a graphic, presentation-oriented style and in a logical, DB-oriented style. The Fenarete system works as an interface between clinicians and the health care information system. Fenarete has been under development since 1996, using Java and Araneus technologies.

Transactional services for the Internet
D. Billard

This paper investigates a new paradigm in transactional services, specifically tailored for Internet purposes. This paradigm considers transactions (called I-Transactions, I standing for "Internet") as a user's atomic actions that run on multiple databases that are unaware of each other's existence. Classic transactions are designed to cope with multiple users accessing a particular DBMS, or a federation of well-known DBMSs. In contrast, I-Transactions are not bound to a particular DBMS or federation of DBMSs, but are related to a single user (who may be either a person or a computer application). Therefore, I-Transactions are not managed by a transaction manager, since they are unique in the sense that they provide an atomic action over a set of DBMSs that do not know each other and that may not be simultaneously accessed again by another I-Transaction. Roughly speaking, I-Transactions are self-managed. They are tailored for the Internet environment, which currently offers few transactional facilities, and are designed to be easily integrated into existing Internet applications. This paper describes I-Transactions and the constraints related to their use. It also outlines the differences between the CORBA Transaction Service specification and I-Transactions, and how the implementation of I-Transactions can benefit from the use of mobile agents.

Extracting Patterns and Relations from the World Wide Web
S. Brin

The World Wide Web is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of automatically extracting a relation for such a data type from all of these sources. We present a technique that exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the World Wide Web.

Fixpoint Path Queries
N. Bidoit, M. Ykhlef

The paper proposes two fixpoint path query languages, Path-Fixpoint and Path-While, for unstructured data, whose expressive power is that of Fixpoint and While queries, respectively. These languages are multi-sorted, logic-like languages integrating fixpoint path expressions.

A language for publishing virtual documents on the Web
F. Paradis, A. M. Vercoustre

The following issues are becoming increasingly important for Web publishers: publishing and integrating non-HTML information, extracting fragments of HTML pages, updating and maintaining existing pages, etc. In this paper we present a descriptive language for Web publishing that allows users to write virtual documents, in which dynamic information can be retrieved from various sources, transformed, and included along with static information in HTML documents. The language uses a tree-like structure for the representation of information, and defines a database-like query language for extracting and combining information without complete knowledge of the structure or the types of information. The data structures and the syntax of the language are presented along with examples.

On the unification of persistent programming and the World Wide Web
R. Connor, K. Sibson, P. Manghi

In its infancy, the World Wide Web consisted of a web of simple hypertext documents transmitted on request by simple servers. Over time it has been evolving into a domain that supports almost arbitrary networked computations. Central to its successful operation, however, is agreement on simple standards such as HTML and HTTP, which provide inter-node communication via the medium of text files. Our hypothesis is that, as application sophistication increases, this text-based interface will present programmers with the same problems as traditional text-based file and database system interfaces present within programming languages. Persistent programming systems were designed to overcome these problems in the traditional domains; our investigation reapplies that research to the new domain of the Web. The result is the ability to pass typed data layered on top of the existing standards, in a manner that is fully integrated with them. The significance for Web databases is that a typed object protocol layered over HTTP allows the Web to host a global persistent address space, thus making the whole Web a potential data repository for a generation of database programming languages.

WebSuite -- A tool suite for harnessing Web data
C. Beeri, G. Elber, T. Milo, Y. Sagiv, O. Shmueli, N. Tishby, Y. Kogan, D. Konopnicki, P. Mogilevski, N. Slonim

We present a system for searching, collecting, integrating and managing Web-resident data. The system consists of tools, each providing a specific functionality aimed at solving one aspect of the complex task of using and managing Web data. Each tool can be used in a stand-alone mode, in combination with the other tools, or even in conjunction with other systems. Together, the tools offer a range of capabilities that overcome many of the limitations in existing systems for harnessing Web data. The paper describes each tool, and possible ways of combining the tools.   

Language and tools to specify hypertext views on databases
G. Falquet, J. Guyot, L. Nerima

We present a declarative language for the construction of hypertext views on databases. The language is based on an object-oriented data model and a simple hypertext model with reference and inclusion links. A hypertext view specification consists of a collection of parameterized node schemas, which specify how to construct node and link instances from the database contents. We show how this language can address different issues in hypertext view design. These include: the direct mapping of objects to nodes; the construction of complex nodes based on sets of objects; the representation of polymorphic sets of objects; and the representation of tree and graph structures. We have defined sublanguages corresponding to particular database models (relational, semantic, object-oriented) and implemented tools to generate Web views for these database models.

Using YAT to build a Web server
G. Simeon, S. Cluet

Integration of heterogeneous data sources in a Web environment has become a major concern of the database community. Architectures, data models and query languages have been proposed, but the complementary problem of data conversion has been less studied. The YAT system provides a means to build software components based on data conversion, such as wrappers or mediators, in a simple and declarative way. We show that the YAT system can also be used to create integrated Web views over heterogeneous data sources very easily. Only minor changes were required for YAT to provide data integration (as opposed to data conversion) in a Web environment. Finally, we report on our experience building the Verso Web site using YAT.

A Unified Algorithm for Cache Replacement and Consistency in Web Proxy Servers
J. Shim, P. Scheuermann, R. Vingralek

Caching of Web documents improves the response time perceived by clients. Cache replacement algorithms play a central role in reducing response time by selecting a subset of documents for caching so that an appropriate performance metric is maximized. At the same time, the cache must take extra steps to guarantee some form of consistency of the cached data. Cache consistency algorithms enforce appropriate guarantees about the staleness of the documents the cache stores. Most of the published work on Web cache design either considers cache consistency algorithms separately from cache replacement algorithms or concentrates on only one of the two. We argue that cache performance can be improved by integrating cache replacement and consistency algorithms. We present a unified algorithm, LNC-R-W3-U. Using trace-based experiments, we demonstrate that LNC-R-W3-U achieves performance comparable (and often superior) to most of the published cache replacement algorithms while significantly reducing the staleness of the cached documents.
