We were on Slashdot
yesterday (again!) for our Wikipedia web pack. The responses range from
surprised - "How can they squeeze it in 6GB?" - to quizzical - "Why
do they need 6GB?". Here is our attempt to shed some light on this issue.
The Wikipedia
database dump was downloaded in the first week of March. This dump contains
only the text of the articles. As one of the comments on Slashdot pointed out, the
size of Wikipedia's image
database was 75.5 GB (!) as of November 2005. A download of that size is clearly
unacceptable to most users. (No, we're not talking about those incredibly lucky
folks who use a small data center as a desktop and have unhindered access to a T3
connection.)
The Wikipedia web pack occupies 6 GB of disk space when
uncompressed. A copy of Wikipedia in zipped format is retained on the disk for
peer-to-peer sharing and to optimize download times for the community. This
zipped copy takes a shade under 4.5 GB.
Our Wikipedia web pack has more than 645,000 articles. We
offer a refreshingly new way to access any of these 645,000 articles anywhere,
quickly – search offline. If you are
hesitant to commit 6 GB of disk space (and the time to download it), you can try out
one of the 100-plus topical
slices of Wikipedia.