A new paper by Scott Christley, Yiming Lu, Chen Li and Xiaohui Xie is available, and it describes further ways to compress data at the level of a whole genome.
They identify four ways to reduce the size:
1. Only use the number of bits needed;
2. Store relative instead of absolute positions;
3. Generalize variations and store the variations; and
4. Store non-random repeats as such.
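
To make the first two ideas a bit more concrete, here is a minimal Python sketch of my own. It is not the code from the paper: the function names and the sample positions are made up for illustration, and the bit count is a simple lower bound that ignores whatever framing a real variable-length integer format would need.

```python
def delta_encode(positions):
    """Turn sorted absolute positions into gaps from the previous position."""
    deltas = []
    previous = 0
    for pos in positions:
        deltas.append(pos - previous)
        previous = pos
    return deltas


def delta_decode(deltas):
    """Rebuild the absolute positions by summing the gaps."""
    positions = []
    total = 0
    for gap in deltas:
        total += gap
        positions.append(total)
    return positions


def bits_needed(value):
    """Smallest number of bits that can hold a non-negative integer."""
    return max(1, value.bit_length())


# Hypothetical variant positions on one chromosome (illustrative values only).
positions = [1_000_123, 1_000_130, 1_000_542, 1_001_010]
deltas = delta_encode(positions)

absolute_bits = sum(bits_needed(p) for p in positions)
relative_bits = sum(bits_needed(d) for d in deltas)

print("deltas:", deltas)
print("bits for absolute positions:", absolute_bits)
print("bits for relative positions:", relative_bits)
```

Because variations tend to sit close together compared with the length of a chromosome, the gaps are small numbers and need far fewer bits than the absolute positions, while `delta_decode` still recovers the original values exactly.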
If I get some time, I would really love to try out the code they provide in the article.
Human genomes as email attachments