SOFTWARE



Graph Compression

Find the source code on !

This software allows to compress graphs. In particular it is suitable for the Web Graph since it is able to exploit all the redundacies tipical of this graph. The compression format is based on the scheme we have proposed in Graph Compression by BFS (2009) by Alberto Apostolico and me. This implementation (in Java) presents some minor differences from the one (in C) used for tests shown in the paper cited above, however it almost achieves the same compression results and performance.

Changes (07/2012):
    • new command 'x': mixed parsing and compression phase (no huge temporary file)
    • new option '-s' suggested by Susana Ladra González: less memory used in the compression phase, no exploitation of the BFS it uses orginial indexes
    • new option '-sim': does not write the compressed file, only gives bits/link
    • bug fix

Please use the following BibTeX entry if you would like to cite this software.

@Article{ad-gcbfs-09,
    AUTHOR  = {Apostolico, Alberto and Drovandi, Guido},
    TITLE   = {Graph Compression by BFS},
    JOURNAL = {Algorithms},
    VOLUME  = {2},
    YEAR    = {2009},
    NUMBER  = {3},
    PAGES   = {1031--1044},
    URL     = {http://www.mdpi.com/1999-4893/2/3/1031},
    ISSN    = {1999-4893},
    DOI     = {10.3390/a2031031}
}

Here you can find some compression results on datasets from http://law.dsi.unimi.it/. The compression level is set to 1000 for all the tests. The BFS Root (randomly chosen) is the index with respect to the original graph downloaded from the previous URL.

Graph Nodes Links Bits/Link BFS Root
arabic-2005 22 744 080 639 999 458 1.462 20 703 058
cnr-2000 325 557 3 216 152 1.838 130 582
eu-2005 862 664 19 235 140 2.699 232 241
in-2004 1 382 908 16 917 053 1.446 1 288 532
indochina-2004 7 414 866 194 109 311 0.936 1 710 995
it-2004 41 291 594 1 150 725 436 1.536 30 132 872
sk-2005 50 636 154 1 949 412 601 1.969 10 274 367
uk-2002 18 520 486 298 113 762 1.693 11 353 878
uk-2005 39 459 925 936 364 282 1.420 4 392 160
uk-2006-05 77 741 046 2 965 197 340 1.410 39 633 072
uk-2006-06 80 644 902 2 481 281 617 1.671 45 295 894
uk-2006-07 96 395 298 3 030 665 444 1.506 70 680 309
uk-2006-08 100 751 978 3 250 153 746 1.617 14 827 948
uk-2006-09 106 288 541 3 871 625 613 1.472 31 005 541
uk-2006-10 93 463 772 3 130 910 405 1.528 59 983 443
uk-2006-11 106 783 458 3 479 400 938 1.562 102 080 805
uk-2006-12 103 098 631 3 768 836 665 1.369 39 137 185
uk-2007-01 108 563 230 3 929 837 236 1.268 104 537 817
uk-2007-02 110 123 614 3 944 932 566 1.435 23 821 898
uk-2007-03 107 565 084 3 642 701 825 1.551 79 095 635
uk-2007-04 106 867 191 3 790 305 474 1.459 35 266 768
uk-2007-05 105 896 555 3 738 733 648 1.440 103 033 803
uk-union-2006-06-2007-05 133 633 040 5 507 679 822 1.830 40 996 664
webbase-2001 118 142 155 1 019 903 190 2.870 51 779 370


Bioinformatic


DMB

DMB (http://dmb.iasi.cnr.it) is a powerful data mining tool. It allows to classify many biological stuff (species, viruses, etc.) giving a set of logical formulas that are able to distinguish between the different classes given as input. Please refer to the DMB web-site for more information and the software.