WEIR Dataset

This dataset is a collection of web pages used for research on the automatic extraction of data from the Web.
The dataset includes detail pages downloaded from 40 web sites. The pages cover four domains: video games, soccer players, stock quotes, and books. Pages for the video games and soccer players domains were gathered by a crawler based on a set expansion technique (see paper). Stock quote and book pages were collected by querying the forms of 10 finance sites and 10 bookstore sites.