Monday, January 19, 2009

Human genomes as email attachments

A new paper by Scott Christley, Yiming Lu, Chen Li and Xiaohui Xie describes additional ways to compress data at the genome level.

They identified four ways to reduce the size:
1. Only use the number of bits needed;
2. Store relative instead of absolute positions;
3. Generalize variations and store the variations; and
4. Store non-random repeats as such.
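
To get a feel for the first two ideas, here is a minimal sketch of my own. It is not the code from the paper, and the function names and the byte-oriented 7-bit encoding are assumptions I picked for illustration. It stores sorted positions as gaps from the previous position, then writes each gap using only as many bytes as it needs.

```python
def delta_encode(positions):
    """Turn sorted absolute positions into gaps from the previous position."""
    gaps = []
    prev = 0
    for pos in positions:
        gaps.append(pos - prev)
        prev = pos
    return gaps


def varint_encode(n):
    """Encode a non-negative integer in 7-bit groups, low group first.

    The high bit of each byte says whether another byte follows.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte of this integer
            return bytes(out)


def varint_decode_all(data):
    """Decode a byte string holding back-to-back varints."""
    values = []
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(value)
            value = 0
            shift = 0
    return values


def pack_positions(positions):
    """Relative positions, each stored in as few bytes as possible."""
    return b"".join(varint_encode(gap) for gap in delta_encode(positions))


def unpack_positions(data):
    """Rebuild absolute positions by summing the decoded gaps."""
    positions = []
    total = 0
    for gap in varint_decode_all(data):
        total += gap
        positions.append(total)
    return positions


if __name__ == "__main__":
    # Made-up example positions, not real variant data.
    example = [1000000, 1000150, 1000153, 1002000]
    packed = pack_positions(example)
    print(len(packed), "bytes packed vs", 4 * len(example), "bytes as 32-bit ints")
    print(unpack_positions(packed) == example)
```

Because nearby positions produce small gaps, most entries fit in one or two bytes instead of a fixed four.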

If I get some time, I would really love to try out the code they provide in the article.
