Fast and accurate phylogeny reconstruction using filtered spaced-word matches.

dc.contributor.author: Leimeister, Chris-André
dc.contributor.author: Sohrabi-Jahromi, Salma
dc.contributor.author: Morgenstern, Burkhard
dc.date.accessioned: 2017-07-27T06:54:12Z
dc.date.available: 2017-07-27T06:54:12Z
dc.date.issued: 2017-04-01
dc.relation.ISSN: 1367-4811
dc.identifier.uri: http://resolver.sub.uni-goettingen.de/purl?gs-1/14542
dc.description.abstract:
Motivation: Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods.
Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don't-care positions, FSWM rapidly identifies spaced-word matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don't-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don't-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure that discards all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. Running the program on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes.
Availability and Implementation: The FSWM source code and documentation, as well as the software that we used to generate artificial genome sequences, are freely available at http://fswm.gobics.de/.
Contact: chris.leimeister@stud.uni-goettingen.de.
Supplementary information: Supplementary data are available at Bioinformatics online.
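The pipeline described in the abstract (binary pattern, spaced-word matches, similarity filter, substitution estimate from the don't-care positions) can be illustrated with a minimal Python sketch. It is not the FSWM program: the function names are hypothetical, the +1/-1 scoring of don't-care positions and the default threshold of 0 are illustrative assumptions (the published method's scoring scheme may differ), and the Jukes-Cantor correction is assumed here as the substitution model.

```python
import math
from collections import defaultdict


def spaced_word(seq, start, pattern):
    """Characters of `seq` at the match ('1') positions of `pattern`, starting at `start`."""
    return "".join(seq[start + k] for k, c in enumerate(pattern) if c == "1")


def spaced_word_matches(s1, s2, pattern):
    """Yield (i, j) such that s1 and s2 have the same spaced word at i and j."""
    length = len(pattern)
    index = defaultdict(list)
    for j in range(len(s2) - length + 1):
        index[spaced_word(s2, j, pattern)].append(j)
    for i in range(len(s1) - length + 1):
        for j in index.get(spaced_word(s1, i, pattern), []):
            yield i, j


def match_score(s1, i, s2, j, pattern, match=1, mismatch=-1):
    """Hypothetical similarity score: +match / +mismatch summed over don't-care ('0') positions.

    Match positions are identical by construction, so only don't-care positions can differ.
    """
    score = 0
    for k, c in enumerate(pattern):
        if c == "0":
            score += match if s1[i + k] == s2[j + k] else mismatch
    return score


def fswm_distance(s1, s2, pattern, threshold=0):
    """Estimated substitutions per site, or None if no match survives the filter."""
    mismatches = total = 0
    for i, j in spaced_word_matches(s1, s2, pattern):
        # Filter: discard matches whose similarity is below the threshold (likely random).
        if match_score(s1, i, s2, j, pattern) < threshold:
            continue
        for k, c in enumerate(pattern):
            if c == "0":
                total += 1
                mismatches += s1[i + k] != s2[j + k]
    if total == 0:
        return None
    p = mismatches / total  # mismatch frequency at don't-care positions
    if p >= 0.75:
        return math.inf  # Jukes-Cantor distance is undefined (saturated)
    return -0.75 * math.log(1 - 4 * p / 3)  # assumed Jukes-Cantor correction
```

For example, `fswm_distance("ACCCCT", "ACCGCT", "100001")` finds one spaced-word match with one mismatch among four don't-care positions (p = 0.25), while `fswm_distance("ACCCCT", "AGGGCT", "100001")` returns `None` because the only match scores below the threshold and is filtered out. Indexing all occurrences of each spaced word keeps the sketch short, but it can produce many pairs on repetitive sequences; a tool intended for genomes of hundreds of Mb would need a more economical strategy.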
dc.language: eng
dc.language.iso: eng
dc.rights: openAccess
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: phylogeny reconstruction; spaced-word matches
dc.subject.ddc: 570
dc.subject.mesh: Algorithms
dc.subject.mesh: Base Sequence
dc.subject.mesh: Computer Simulation
dc.subject.mesh: Genome, Bacterial
dc.subject.mesh: Genome, Plant
dc.subject.mesh: Genomics
dc.subject.mesh: Phylogeny
dc.subject.mesh: Sequence Alignment
dc.subject.mesh: Sequence Analysis, DNA
dc.subject.mesh: Sequence Homology, Nucleic Acid
dc.subject.mesh: Software
dc.subject.mesh: Time Factors
dc.title: Fast and accurate phylogeny reconstruction using filtered spaced-word matches.
dc.type: journalArticle
dc.identifier.doi: 10.1093/bioinformatics/btw776
dc.type.version: publishedVersion
dc.bibliographicCitation.volume: 33
dc.bibliographicCitation.issue: 7
dc.bibliographicCitation.firstPage: 971
dc.bibliographicCitation.lastPage: 979
dc.type.subtype: journalArticle
dc.identifier.pmid: 28073754
dc.description.status: peerReviewed
dc.bibliographicCitation.journal: Bioinformatics (Oxford, England)


License for these documents:
openAccess