Are There Copious Sequence Errors in Databases?


DNA sequencing techniques have made great progress in recent years. Yet even with high-performance machines, there is always room for error. Fragmentation and chemical modification can occur in formalin-fixed DNA samples and in DNA extracted from ancient specimens. These modifications show up as mutations that were absent in the living organism. Such damage is not limited to these sample types; it can occur in any DNA sample. During sample preparation for sequencing, sonication is used to shear the DNA into fragments. The sound energy used to break up the DNA encourages oxidative damage, which is then read as mutations.

Identifying even the rarest and smallest mutations has become important in cancer biology, with its emphasis on detecting sub-clonal mutations. Reliable detection requires keeping false positives low, which is of utmost importance in cancer genome projects.

To assess the degree of such damage, a team of researchers at New England Biolabs (NEB) has developed an algorithm. To counter the damage, the team also suggests using DNA repair enzymes during sample preparation. The algorithm estimates the extent of damage by comparing the first and second sequencing reads: the degree of mismatching thymines between them indicates the amount of damage. This Global Imbalance Value (GIV) algorithm, available free on GitHub, can be used in quality control. The GIV score of a sample acts as a benchmark when identifying potential low-frequency variants.
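
To make the read-comparison idea concrete, here is a minimal Python sketch. It is not the NEB tool. It assumes an imbalance score can be approximated as the ratio of the G-to-T mismatch rate in first reads to the same rate in second reads; the published GIV algorithm may define and normalize the value differently. The function names and the input format, a list of (reference base, read base) pairs per read, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def mismatch_rate(pairs, ref_base, read_base):
    """Fraction of positions with reference `ref_base` where the read shows `read_base`.

    `pairs` is an iterable of (reference_base, read_base) tuples
    (a hypothetical input format, not the NEB tool's).
    """
    counts = Counter(pairs)
    total = sum(n for (ref, _), n in counts.items() if ref == ref_base)
    if total == 0:
        return 0.0
    return counts[(ref_base, read_base)] / total


def imbalance_score(read1_pairs, read2_pairs, ref_base="G", read_base="T"):
    """Ratio of the mismatch rate in first reads to that in second reads.

    Assumption: a value near 1 suggests balanced errors, while a value
    well above 1 suggests damage affecting one read more than the other.
    """
    rate1 = mismatch_rate(read1_pairs, ref_base, read_base)
    rate2 = mismatch_rate(read2_pairs, ref_base, read_base)
    if rate2 == 0:
        # No mismatches in second reads: balanced only if first reads have none either.
        return float("inf") if rate1 > 0 else 1.0
    return rate1 / rate2


# Toy data: 4 G-to-T mismatches per 100 G positions in read 1, 1 per 100 in read 2.
read1 = [("G", "T")] * 4 + [("G", "G")] * 96
read2 = [("G", "T")] * 1 + [("G", "G")] * 99
score = imbalance_score(read1, read2)
print(f"imbalance score: {score:.2f}")
```

In this toy case the first reads carry four times as many G-to-T mismatches as the second reads, so the score comes out well above 1. Under the assumption stated above, that would flag the sample as possibly damaged before any low-frequency variants are called.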