dc.contributor.author | Röhling, Sophie | |
dc.contributor.author | Linne, Alexander | |
dc.contributor.author | Schellhorn, Jendrik | |
dc.contributor.author | Hosseini, Morteza | |
dc.contributor.author | Dencker, Thomas | |
dc.contributor.author | Morgenstern, Burkhard | |
dc.date.accessioned | 2020-02-11T08:39:45Z | |
dc.date.available | 2020-02-11T08:39:45Z | |
dc.date.issued | 2020 | de |
dc.relation.ISSN | 1932-6203 | de |
dc.identifier.uri | http://resolver.sub.uni-goettingen.de/purl?gs-1/17161 | |
dc.description.abstract | We study the number N_k of length-k word matches between pairs of evolutionarily related DNA sequences, as a function of k. We show that the Jukes-Cantor distance between two genome sequences (i.e., the number of substitutions per site that occurred since they evolved from their last common ancestor) can be estimated from the slope of a function F that depends on N_k and that is affine-linear within a certain range of k. Integers k_min and k_max can be calculated depending on the length of the input sequences, such that the slope of F in the relevant range can be estimated from the values F(k_min) and F(k_max). This approach can be generalized to so-called Spaced-word Matches (SpaM), where mismatches are allowed at positions specified by a user-defined binary pattern. Based on these theoretical results, we implemented a prototype software program for alignment-free sequence comparison called Slope-SpaM. Test runs on real and simulated sequence data show that Slope-SpaM can accurately estimate phylogenetic distances of up to around 0.5 substitutions per position. The statistical stability of our results is improved if spaced words are used instead of contiguous words. Unlike previous alignment-free methods that are based on the number of (spaced) word matches, Slope-SpaM produces accurate results even if sequences share only local homologies. | de |
dc.description.sponsorship | Open-Access-Publikationsfonds 2020 | |
dc.language.iso | eng | de |
dc.rights | openAccess | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Sequence alignment; Phylogenetic analysis; Multiple alignment calculation; Phylogenetics; Plant genomics; Bacterial genomics; Molecular evolution; Nucleotide sequencing | de |
dc.subject.ddc | 570 | |
dc.title | The number of k-mer matches between two DNA sequences as a function of k and applications to estimate phylogenetic distances | de |
dc.type | journalArticle | de |
dc.identifier.doi | 10.1371/journal.pone.0228070 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.g001 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.g002 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.g003 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.g004 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.g005 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.t001 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.t002 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.s001 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.r001 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.r002 | |
dc.identifier.doi | 10.1371/journal.pone.0228070.r003 | |
dc.type.version | publishedVersion | de |
dc.relation.eISSN | 1932-6203 | |
dc.bibliographicCitation.volume | 15 | de |
dc.bibliographicCitation.issue | 2 | de |
dc.type.subtype | journalArticle | |
dc.bibliographicCitation.articlenumber | e0228070 | de |
dc.description.status | peerReviewed | de |
dc.bibliographicCitation.journal | PLOS ONE | de |