Wednesday, March 9, 2011

Sort 10 TB of data

Assume you have 100 TB of data made up of fixed-length records. How would you sort it?

2 comments:

  1. I know Java, so I would use Hadoop, or a similar framework in other languages.
    There are many pages about this problem, for example: http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/

  2. External sorting (http://en.wikipedia.org/wiki/External_sorting) - if you have time but do not have machines :)

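The external sorting idea from the second comment can be sketched on a single machine roughly as follows. This is only an illustration, not a tuned implementation: `RECORD_SIZE`, `CHUNK_RECORDS`, `read_records` and `external_sort` are made-up names, the sizes are tiny demo values, and records are assumed to compare as raw bytes.

```python
import heapq
import os
import tempfile

RECORD_SIZE = 8     # assumed fixed record length in bytes
CHUNK_RECORDS = 4   # assumed number of records that fit in memory (tiny for the demo)


def read_records(f):
    """Yield fixed-length records from an open binary file."""
    while True:
        rec = f.read(RECORD_SIZE)
        if len(rec) < RECORD_SIZE:
            return
        yield rec


def external_sort(src_path, dst_path, tmp_dir):
    """Sort the fixed-length records of src_path into dst_path."""
    # Phase 1: read memory-sized chunks, sort each one, write it out as a run.
    run_paths = []
    with open(src_path, "rb") as src:
        while True:
            data = src.read(RECORD_SIZE * CHUNK_RECORDS)
            if not data:
                break
            # Split the chunk into whole records; a trailing partial record is dropped.
            records = [data[i:i + RECORD_SIZE]
                       for i in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE)]
            records.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir)
            with os.fdopen(fd, "wb") as run:
                run.write(b"".join(records))
            run_paths.append(path)

    # Phase 2: k-way merge of the sorted runs, holding one record per run in memory.
    runs = [open(p, "rb") for p in run_paths]
    try:
        with open(dst_path, "wb") as dst:
            for rec in heapq.merge(*(read_records(r) for r in runs)):
                dst.write(rec)
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```

With real data the chunk size would be chosen to fill available memory, and with very many runs the merge would likely need several passes to stay under the open-file limit. A cluster approach like the Hadoop one in the first comment spreads the same sort-then-merge work across many machines.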