Welcome to kmerExtractor’s documentation!

A genome sequence is composed of four basic types of nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). A k-mer is a nucleotide substring of length \(k\). Over this four-letter alphabet there are \(4^k\) possible k-mers. For example, for \(k=2\) there are \(4^2=16\) different 2-mers: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT.
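The enumeration above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `all_kmers` is hypothetical and is not part of kmerExtractor.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_kmers(k):
    """Return every possible k-mer over A, C, G, T in lexicographic order.

    Hypothetical helper for illustration; not a kmerExtractor function.
    """
    return ["".join(letters) for letters in product(NUCLEOTIDES, repeat=k)]


two_mers = all_kmers(2)
print(len(two_mers))  # 16, i.e. 4**2
print(", ".join(two_mers))
```

For any \(k\), the list has \(4^k\) entries, so it grows quickly: 64 for \(k=3\), 256 for \(k=4\), and so on.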

The kmerExtractor software calculates k-mer frequencies for windows within the chromosomes of an organism’s sequence. It takes as input a FASTA file containing the organism’s nucleotide sequence, separated by chromosome. The software splits each chromosome into windows of a size specified by the user and, for a given \(k\), computes the frequency of each k-mer in every window. It returns a CSV file giving the chromosome, the window index, the start and end positions of the window, and the frequencies of the corresponding k-mers.
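The windowing and counting steps described above might look roughly like the following Python sketch. It is not kmerExtractor's implementation, and several details are assumptions that this page does not specify: windows do not overlap, positions are 0-based with an exclusive end, frequencies are relative counts within each window, k-mers containing characters other than A, C, G, T are ignored, and the function name and CSV column names are hypothetical. The input is an in-memory dictionary rather than a parsed FASTA file, to keep the example short.

```python
import csv
import io
from collections import Counter
from itertools import product


def window_kmer_frequencies(chromosomes, k, window_size):
    """Yield one row per window: chromosome, index, start, end, frequencies.

    Illustrative sketch only. Assumed, not taken from kmerExtractor:
    non-overlapping windows, 0-based start, exclusive end, relative
    frequencies, and k-mers with non-ACGT characters skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    for name, seq in chromosomes.items():
        seq = seq.upper()
        for index, start in enumerate(range(0, len(seq), window_size)):
            end = min(start + window_size, len(seq))
            window = seq[start:end]
            # Count every overlapping substring of length k in the window.
            counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
            # Only valid k-mers contribute to the denominator.
            total = sum(counts[kmer] for kmer in kmers)
            freqs = [counts[kmer] / total if total else 0.0 for kmer in kmers]
            yield [name, index, start, end] + freqs


# Toy input standing in for a parsed FASTA file.
chromosomes = {"chr1": "ACGTACGT"}
k = 2

out = io.StringIO()
writer = csv.writer(out)
# Hypothetical column names.
header = ["chromosome", "window", "start", "end"]
header += ["".join(p) for p in product("ACGT", repeat=k)]
writer.writerow(header)
writer.writerows(window_kmer_frequencies(chromosomes, k, window_size=4))
print(out.getvalue())
```

With this toy input, each of the two windows contains the 2-mers AC, CG, and GT once, so each of those three columns holds 1/3 and the remaining columns hold 0.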
