Wednesday, August 25, 2010

ClueWeb 09

ClueWeb is a wonderful Web dataset available to the research community. I

  • 1,040,809,705 web pages, in 10 languages
  • 5 TB, compressed. (25 TB, uncompressed.)
  • Unique URLs: 4,780,950,903 (325 GB uncompressed, 105 GB compressed)
  • Total Outlinks: 7,944,351,835 (71 GB uncompressed, 24 GB compressed)
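
To put these figures in perspective, here is a rough back-of-the-envelope sketch that derives compression ratios and per-item averages from the numbers listed above. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the list does not state, and it assumes the outlink total refers to links out of the listed pages. The helper names are my own and purely illustrative.

```python
# Figures copied from the list above.
PAGES = 1_040_809_705
UNIQUE_URLS = 4_780_950_903
TOTAL_OUTLINKS = 7_944_351_835

# Assumption: sizes are in decimal units; the post does not say.
TB = 10**12
GB = 10**9


def compression_ratio(uncompressed_bytes, compressed_bytes):
    """How many times smaller the compressed form is."""
    return uncompressed_bytes / compressed_bytes


def bytes_per_item(total_bytes, count):
    """Average size of one item, in bytes."""
    return total_bytes / count


stats = {
    "page_compression_ratio": compression_ratio(25 * TB, 5 * TB),
    "url_compression_ratio": compression_ratio(325 * GB, 105 * GB),
    "outlink_compression_ratio": compression_ratio(71 * GB, 24 * GB),
    "avg_uncompressed_page_bytes": bytes_per_item(25 * TB, PAGES),
    "avg_uncompressed_url_bytes": bytes_per_item(325 * GB, UNIQUE_URLS),
    # Assumption: every counted outlink originates from one of the listed pages.
    "avg_outlinks_per_page": TOTAL_OUTLINKS / PAGES,
}

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions the pages compress about five to one, the URL and outlink files about three to one, and an average page is on the order of 24 KB uncompressed.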
