SOFTWARE

Graph Compression

This software allows to compress graphs. In particular it is suitable for the Web Graph since it is able to exploit all the redundacies tipical of this graph. The compression format is based on the scheme we have proposed in Graph Compression by BFS (2009) by Alberto Apostolico and me. This implementation (in Java) presents some minor differences from the one (in C) used for tests shown in the paper cited above, however it almost achieves the same compression results and performance.

Changes (07/2012):

new command 'x': mixed parsing and compression phase (no huge temporary file)
new option '-s' suggested by Susana Ladra González: less memory used in the compression phase, no exploitation of the BFS it uses orginial indexes
new option '-sim': does not write the compressed file, only gives bits/link
bug fix

Graph Compression - Version 0.3.2

Jar (with BV extension) - 54 KB

Dependencies: WebGraph

Jar - 51 KB

Dependencies: none

API (in progress)

Graph Compression - Version 0.2.1

Jar (with BV extension) - 47 KB

Dependencies: WebGraph

Graph Compression - Version 0.1

Jar (with BV extension) - 43 KB

Dependencies: WebGraph

Jar - 40 KB
API

Please use the following BibTeX entry if you would like to cite this software.

@Article{ad-gcbfs-09,
    AUTHOR  = {Apostolico, Alberto and Drovandi, Guido},
    TITLE   = {Graph Compression by BFS},
    JOURNAL = {Algorithms},
    VOLUME  = {2},
    YEAR    = {2009},
    NUMBER  = {3},
    PAGES   = {1031--1044},
    URL     = {http://www.mdpi.com/1999-4893/2/3/1031},
    ISSN    = {1999-4893},
    DOI     = {10.3390/a2031031}
}

Here you can find some compression results on datasets from http://law.dsi.unimi.it/. The compression level is set to 1000 for all the tests. The BFS Root (randomly chosen) is the index with respect to the original graph downloaded from the previous URL.

Graph	Nodes	Links	Bits/Link	BFS Root
arabic-2005	22 744 080	639 999 458	1.462	20 703 058
cnr-2000	325 557	3 216 152	1.838	130 582
eu-2005	862 664	19 235 140	2.699	232 241
in-2004	1 382 908	16 917 053	1.446	1 288 532
indochina-2004	7 414 866	194 109 311	0.936	1 710 995
it-2004	41 291 594	1 150 725 436	1.536	30 132 872
sk-2005	50 636 154	1 949 412 601	1.969	10 274 367
uk-2002	18 520 486	298 113 762	1.693	11 353 878
uk-2005	39 459 925	936 364 282	1.420	4 392 160
uk-2006-05	77 741 046	2 965 197 340	1.410	39 633 072
uk-2006-06	80 644 902	2 481 281 617	1.671	45 295 894
uk-2006-07	96 395 298	3 030 665 444	1.506	70 680 309
uk-2006-08	100 751 978	3 250 153 746	1.617	14 827 948
uk-2006-09	106 288 541	3 871 625 613	1.472	31 005 541
uk-2006-10	93 463 772	3 130 910 405	1.528	59 983 443
uk-2006-11	106 783 458	3 479 400 938	1.562	102 080 805
uk-2006-12	103 098 631	3 768 836 665	1.369	39 137 185
uk-2007-01	108 563 230	3 929 837 236	1.268	104 537 817
uk-2007-02	110 123 614	3 944 932 566	1.435	23 821 898
uk-2007-03	107 565 084	3 642 701 825	1.551	79 095 635
uk-2007-04	106 867 191	3 790 305 474	1.459	35 266 768
uk-2007-05	105 896 555	3 738 733 648	1.440	103 033 803
uk-union-2006-06-2007-05	133 633 040	5 507 679 822	1.830	40 996 664
webbase-2001	118 142 155	1 019 903 190	2.870	51 779 370

Bioinformatic

DMB (http://dmb.iasi.cnr.it) is a powerful data mining tool. It allows to classify many biological stuff (species, viruses, etc.) giving a set of logical formulas that are able to distinguish between the different classes given as input. Please refer to the DMB web-site for more information and the software.

Topics

SOFTWARE

Graph Compression

Bioinformatic