VLDB 2001: The Specificity Factor

The Specificity Factor (Spf)

VLDB'2001 Program Committee Chairs and General Chair
(Stefano Ceri, Peter Apers, Kotagiri Ramamohanarao, Richard Snodgrass, and Paolo Atzeni)

August, 2000

Many have noted that the papers appearing in recent database conferences such as VLDB and SIGMOD are getting more and more specific. Ten years ago, there were papers introducing new data models, query languages, and conceptual notations; one encounters few such papers today. This is in some ways inevitable, as the field matures and as a literature and a set of accepted concepts and paradigms become established. Therein also lies the danger that the field is becoming ossified, that papers counter to the prevailing wisdom are rejected in favor of thorough studies of narrow topics of interest to only a few people.

The Asilomar report labels these latter papers "delta-X" papers, and recommends a radical approach of going to poster sessions and invited papers for conferences. We do not agree with such a disruptive strategy, preferring instead a more evolutionary approach that encourages broader papers.

The Spf is a rating that is a single digit, with a larger Spf indicating the paper is more specific, and thus appeals to a smaller portion of the community. While more specific papers should not be rejected out of hand, they need to be particularly compelling to be selected over less specific alternative papers.

The Spf is designed to be determined quickly, from a paper's title and abstract. Studies will be needed to determine whether the Spf is a well-defined metric; for this reason, we will not include it in the overall ranking computed for each paper, but will instead just have it available as one of the many considerations kept in mind when the paper is evaluated, both by the individual program committee members and during the final discussion at the program committee meeting.

Very generally, 1 (one) is added to the Spf for each significant reduction in the paper's domain of applicability. To provide an absolute scale, we list some rough guidelines. The guidelines are incomplete, and are intended only to be illustrative. Each successive level assumes everything in the previous levels fixed and known.

Spf Discussion

0. The paper introduces a new benefit to humanity (after all, that is ultimately why we are in this business).

1. The paper introduces a new means to effect a known benefit to humanity.

2. The paper introduces a new class of software to implement a known means to help humanity. This software manages data in some way, but the paper itself is not particular to a data model or query language.

3. The paper introduces a new or altered data model to support an existing class of software.

4. The paper introduces a new or improved query language or design notation or conceptual modeling technique for an existing data model.

5. The paper introduces a new or improved query optimization or evaluation technique to support an existing query language, or a new construct for an existing conceptual modeling technique.

6. The paper introduces a new or improved input to an existing query optimization or evaluation technique, or a new or improved way to determine the configuration of an existing conceptual modeling construct.

7. The paper introduces a new or improved way to calculate or estimate or tune an existing input to an existing query optimization technique.

One determines the Spf by first identifying where the paper falls in this range from 0 to 7, then adding one unit for each significant restriction on applicability (such as a paper applying to only one or a few operators of a query language) and for each major part of the space that is left out (such as working only on select-project-join queries).

As examples, here are nonsensical (we hope!) titles at some of the Spf levels.

3	The "Hysterical" Data Model to Support Time-varying but Space-constant Data
4	The Hyperbolical Query Language to Support the Hysterical Data Model
5	New Space-constant Optimizations for the Hyperbolical Query Language
6	Circular Histograms for Use in Space-constant Optimizers
7	Fast Reconstruction of Circular Histograms
5+1=6	New Space-constant Optimizations for Aggregates in the Hyperbolical Query Language
5+1+1=7	New Space-constant Optimizations for Correlated Subqueries in the Hyperbolical Query Language on Shared-Nothing Multiprocessors
7+1+1=9	Tuning Circular Histograms for Use with Non-Materialized Views in a Low-Memory Environment

This last paper has the following significant restrictions in its domain of applicability:

Applies only to applications managing time-varying but space-constant data
Assumes the hysterical data model used for such applications; unclear whether it would apply to other data models
Assumes the hyperbolical query language for that data model
Assumes space-constant optimizers for this language; unclear whether it would apply to other optimizer approaches for this language
Assumes circular histograms for such optimizers
Considers only the tuning of such histograms
Considers tuning only for use with non-materialized views
Applies only in a low-memory environment; unclear whether it would apply when memory is plentiful

Given that this paper has a quite high Spf, it had better be pretty exciting to be preferred over, say, a paper introducing new optimizations for the hyperbolical query language. The reason that the Spf is only a single digit is that it is difficult to imagine a paper with a multi-digit Spf that anyone would want to read, though we're sure we'll be proven wrong some day.
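The scoring rule described above (a base level from 0 to 7, plus one unit per significant restriction) can be sketched in a few lines of Python. This is only an illustration, not a tool used by the program committee: the names `Paper`, `BASE_LEVELS`, and `spf` are hypothetical, the level labels are abbreviated paraphrases of the guidelines, and deciding what counts as a "significant" restriction remains a human judgment that the code simply takes as input.

```python
from dataclasses import dataclass, field
from typing import List

# Base levels 0-7, paraphrased and abbreviated from the guidelines.
# (Hypothetical labels, for illustration only.)
BASE_LEVELS = {
    0: "new benefit to humanity",
    1: "new means to effect a known benefit",
    2: "new class of data-managing software",
    3: "new or altered data model",
    4: "new or improved query language, design notation, or modeling technique",
    5: "new or improved optimization or evaluation technique, or modeling construct",
    6: "new or improved input to an existing technique or construct",
    7: "new or improved way to calculate, estimate, or tune an existing input",
}


@dataclass
class Paper:
    """A paper as judged from its title and abstract (hypothetical type)."""
    title: str
    base_level: int                      # position on the 0-7 scale
    restrictions: List[str] = field(default_factory=list)  # judged by a reader


def spf(paper: Paper) -> int:
    """Base level plus one unit per significant restriction.

    No upper bound is enforced: the text expects a single digit in
    practice but does not rule out larger values.
    """
    if paper.base_level not in BASE_LEVELS:
        raise ValueError(f"base level must be 0-7, got {paper.base_level}")
    return paper.base_level + len(paper.restrictions)


# The three composite examples from the list of nonsensical titles.
examples = [
    Paper("New Space-constant Optimizations for Aggregates in the "
          "Hyperbolical Query Language",
          base_level=5, restrictions=["aggregates only"]),
    Paper("New Space-constant Optimizations for Correlated Subqueries in the "
          "Hyperbolical Query Language on Shared-Nothing Multiprocessors",
          base_level=5,
          restrictions=["correlated subqueries only",
                        "shared-nothing multiprocessors only"]),
    Paper("Tuning Circular Histograms for Use with Non-Materialized Views "
          "in a Low-Memory Environment",
          base_level=7,
          restrictions=["non-materialized views only",
                        "low-memory environment only"]),
]

for paper in examples:
    print(spf(paper), paper.title)
```

Run as written, the loop reproduces the values 6, 7, and 9 given for these three titles. Keeping the restrictions as an explicit list, rather than a bare count, mirrors the suggestion that each restriction should be stated in the title or abstract.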

Note also that the number of possible papers goes up exponentially with Spf. There are probably only a few dozen papers legitimately at Spf 2, and perhaps a few hundred papers at Spf 3. But at Spf 7, the number of possible papers, most of which are uninteresting, is mind-boggling.

Our informal experience with papers submitted to previous VLDBs (and SIGMODs) is that most papers have an Spf between 4 and 7, with those at 7 being quite narrow. Also, the title often doesn't reveal major restrictions on applicability, but the abstract generally does (and should). Sometimes the Spf increased by a unit as we read the paper, because a major restriction became apparent that wasn't mentioned in the title or abstract; this indicates that the abstract, and perhaps the title, should be changed to make the restriction explicit.

Our experience also is that many papers with a high Spf are excellent papers according to the accepted criteria. Their proposed approaches are often fully described and the empirical studies quite thorough, with a wide range of parameters varied. This makes sense, as it is easier to be thorough in a narrow domain than in one that is broader and more varied. This is one of the reasons that prototypical papers in VLDB and other high-quality conferences have over time evolved into detailed studies of highly specific and well-defined questions, of interest to only a few people.

Last update: September 5, 2000