RFAM version 7.0 and EMBL version r88 were used to construct the benchmark.
All sequence datasets are in FASTA format. Each file name in the benchmark
begins with an RNA family name, followed by a unique number of the alignment
and a file suffix indicating the contents.

------------------
Benchmark without query RNA flanks:

The general format of files in this distribution is

Filename format

..flankunaligned
    The query RNA and the genomic sequence containing the target, both
    unaligned without gaps.

..flankaligned
    The query RNA aligned to the target RNA in the genomic sequence,
    surrounded by flanking end gaps.

..seq
    The genomic sequence of the pair above, without gaps.

..startend
    The start and end indices of the target RNA embedded in the genomic
    sequence. Indices start at 0 and end at length-1.

-------------------
Benchmark with query RNA flanks:

The general format of files in this distribution is

Filename format

..gengen5
    The query RNA, with 5' and 3' flanks of 50 nucleotides, and the genomic
    sequence containing the target, both unaligned without gaps.

..query_startend5
    The start and end indices of the query RNA embedded in the flanks.
    Indices start at 0 and end at length-1.

..gengen4
    The query RNA, with 5' and 3' flanks of 100 nucleotides, and the genomic
    sequence containing the target, both unaligned without gaps.

..query_startend4
    The start and end indices of the query RNA embedded in the flanks.
    Indices start at 0 and end at length-1.

..gengen6
    The query RNA, with 5' and 3' flanks of 150 nucleotides, and the genomic
    sequence containing the target, both unaligned without gaps.

..query_startend6
    The start and end indices of the query RNA embedded in the flanks.
    Indices start at 0 and end at length-1.

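The start and end indices described above are 0-based, and the end index
refers to the last nucleotide itself (it can be as large as length-1). A
half-open slice therefore needs end+1. The sketch below illustrates this. It
assumes that a startend file holds two whitespace-separated integers, which
this README does not state; the function names parse_startend and
extract_region are hypothetical, and the sequence is toy data.

```python
def parse_startend(text):
    """Parse start and end indices from the text of a startend file.

    Assumption: the file holds two whitespace-separated integers.
    The actual on-disk layout is not specified in this README.
    """
    fields = text.split()
    start, end = int(fields[0]), int(fields[1])
    return start, end


def extract_region(sequence, start, end):
    """Return the subsequence from start to end, with end inclusive."""
    if not (0 <= start <= end < len(sequence)):
        raise ValueError(
            "indices out of range for sequence of length %d" % len(sequence)
        )
    # Indices are 0-based and the end index is inclusive, so add 1
    # to convert to Python's half-open slice convention.
    return sequence[start:end + 1]


# Toy data, not taken from the benchmark.
genomic = "AAAAGGGCCCUUUU"
start, end = parse_startend("4 9\n")
print(extract_region(genomic, start, end))  # GGGCCC
```

The same slicing would apply to the query_startend files, with the flanked
query sequence in place of the genomic sequence.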
---------------------
False positive set:

The general format of files in this distribution is

Filename format

..flankunaligned
    The query RNA and a genomic sequence containing a target RNA from a
    different family, both unaligned without gaps.

..gengen5
    The query RNA, with 5' and 3' genomic flanks of 50 nucleotides, and a
    genomic sequence containing a target RNA from a different family, both
    unaligned without gaps.

..gengen4
    As above but query flanks of 100 nucleotides.

..gengen6
    As above but query flanks of 150 nucleotides.

-------------------
The RFAM_subsets.tgz file contains the RFAM family alignments from which the
benchmark was created. Each alignment contains at most 50 sequences, randomly
selected from the original seed alignments, which can contain as many as 1000
sequences.

Each file in RFAM_fixed is of the format

.unaligned
    The RNA sequences in unaligned format.

.aligned
    The RNA sequences in aligned format.
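
As an illustration of the subsetting described above, the sketch below reads
FASTA records and draws at most 50 of them at random without replacement. It
is not the script that was used to build RFAM_subsets.tgz; the function names
read_fasta and subsample and the seed parameter are hypothetical, and the
records are toy data.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample(records, max_size=50, seed=None):
    """Return at most max_size records, chosen at random without replacement."""
    records = list(records)
    if len(records) <= max_size:
        return records
    rng = random.Random(seed)
    return rng.sample(records, max_size)


# Toy data, not taken from RFAM.
toy = [">seq1", "ACGU", "ACGU", ">seq2", "GGCC"]
records = list(read_fasta(toy))
print(records)                  # [('seq1', 'ACGUACGU'), ('seq2', 'GGCC')]
print(len(subsample(records)))  # 2
```

Alignments with 50 or fewer sequences pass through unchanged, which matches
the "at most 50" wording above.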