splitter
splitter splits one or more input sequences into smaller, optionally overlapping, subsequences. The subsequence size and overlap (if any) may be specified. Optionally, feature information will be used.
Example 1
Split a sequence into sub-sequences of 10,000 bases (the default size) with no overlap between the sub-sequences:
% splitter tembl:BA000025 ba000025.split
Split sequence(s) into smaller sequences
Example 2
Split a sequence into sub-sequences of 50,000 bases with an overlap of 3,000 bases on each sub-sequence:
% splitter tembl:BA000025 ba000025.split -size=50000 -over=3000
Split sequence(s) into smaller sequences
Split sequence(s) into smaller sequences
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers:
[-sequence] seqall Sequence(s) filename and optional format, or reference (input USA)
[-outseq] seqoutall [
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-sequence] (Parameter 1) | seqall | Sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | seqoutall | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | ||||
-size | integer | Size to split at | Integer 1 or more | 10000 |
-overlap | integer | Overlap between split sequences | Integer 0 or more | 0 |
Advanced (Unprompted) qualifiers | ||||
-feature | boolean | Use feature information | Boolean value Yes/No | No |
-addoverlap | boolean | Include overlap in output sequence size | Boolean value Yes/No | No |
Associated qualifiers | ||||
"-sequence" associated seqall qualifiers | ||||
-sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 |
-send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 |
-sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
-sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
-snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
-sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N |
-slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N |
-supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N |
-scircular1 -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N |
-squick1 -squick_sequence | boolean | Read id and sequence only | Boolean value Yes/No | N |
-sformat1 -sformat_sequence | string | Input sequence format | Any string | |
-iquery1 -iquery_sequence | string | Input query fields or ID list | Any string | |
-ioffset1 -ioffset_sequence | integer | Input start position offset | Any integer value | 0 |
-sdbname1 -sdbname_sequence | string | Database name | Any string | |
-sid1 -sid_sequence | string | Entryname | Any string | |
-ufo1 -ufo_sequence | string | UFO features | Any string | |
-fformat1 -fformat_sequence | string | Features format | Any string | |
-fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
"-outseq" associated seqoutall qualifiers | ||||
-osformat2 -osformat_outseq | string | Output seq format | Any string | |
-osextension2 -osextension_outseq | string | File name extension | Any string | |
-osname2 -osname_outseq | string | Base file name | Any string | |
-osdirectory2 -osdirectory_outseq | string | Output directory | Any string | |
-osdbname2 -osdbname_outseq | string | Database name to add | Any string | |
-ossingle2 -ossingle_outseq | boolean | Separate file for each entry | Boolean value Yes/No | N |
-oufo2 -oufo_outseq | string | UFO features | Any string | |
-offormat2 -offormat_outseq | string | Features format | Any string | |
-ofname2 -ofname_outseq | string | Features file name | Any string | |
-ofdirectory2 -ofdirectory_outseq | string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
The input is a standard EMBOSS sequence query (also known as a 'USA').
Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.
Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
ID   BA000025; SV 2; linear; genomic DNA; STD; HUM; 2229817 BP.
XX
AC   BA000025; AP000502-AP000521;
XX
DT   09-DEC-2004 (Rel. 82, Created)
DT   17-JUN-2008 (Rel. 96, Last updated, Version 5)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-2229817
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-AUG-2001) to the INSDC.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   EPD; EP11158; HS_TNF.
DR   EPD; EP11159; HS_LTA.
DR   EPD; EP73522; HS_HLA-B.
DR   EPD; EP73908; HS_GTF2H4.
DR   EPD; EP73940; HS_NEU1.
DR   EPD; EP74013; HS_VARS2.
DR   EPD; EP74203; HS_MRPS18B.
DR   EPD; EP74346; HS_HLA-E.
DR   EPD; EP74389; HS_BAT1.
DR   EPD; EP74485; HS_IER3.
DR   Ensembl-Gn; ENSG00000096155; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000096171; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000111971; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137310; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137312; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137313; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137331; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137332; Homo_sapiens.
DR   Ensembl-Gn; ENSG00000137337; Homo_sapiens.
[Part of this file has been deleted for brevity] ttggccccac cccagcatgt ctccaggttc ctctcagccc tggttccttt tggccctgca 2226900 gtcacaatgg gcaacactgt gacgcaccct gtcctgtgtc acagtgtcat acactcaggc 2226960 tcacattgcc cctaggccac ttgccagcca agggacatgg ccacattttg tgtcttctgc 2227020 acctcagcct tgctttcaag tgcaggtgat gatggcaccc acgcagaaca aatgttattt 2227080 gctatcttcg tcgagtttag tcatccaatt ttccaaccct cactgggcaa ggaagagtgt 2227140 ggtttccacc aagaaggcag gatgtcagca gtcacagggg caaccaacag ggaaagccgc 2227200 cggaaaatag accccacagg aagcacaggt gtccagtgga gatgggaacc ctgcagattt 2227260 gaccgtcttt aagcagatta gagagattac cgttactaac aacttagcca taaaagttta 2227320 ttagctattt tcaaaaagca taaaattatg taatataatt ttttttaaat ttccatcaat 2227380 acaaaactaa tctgggcact gcaacttccg gtgggcaact gggataggcg gcatcatcag 2227440 gaaggcgagc cctgccgtgc cccatgtgcc agtgccccag atggcggcag cctccccaga 2227500 agcaccttgt atctcccctg cacagggcca gggtcccagc ttcccataca ccttctcctg 2227560 ctttttcttt tctgtccttt cctttttcaa taaaccacct gcaaaaaggg aaaaccattc 2227620 tgaggacaag aaacatgtca atgggaaata cacagttgcc agagggtaaa aggccctgtt 2227680 cattctcatt gaaaagctca ggtatttctg ttaaagtctc tccttttact ttaggatgct 2227740 gactcctgcg tccatctcaa cctgggcatc gtgccaccac cttcaagaag agaaaaacta 2227800 agtagtgctt tgcaaagggg cagcagcatt tctcatttct gaccatgtca ggcacatggc 2227860 catgcagatg agcaggtggg ggacacaggt gagtctccag acctgctctc ctcccacagt 2227920 acattcttga gtctttttaa acagttgtga aaatgccaca gatgcaagca cctgtgggcc 2227980 actcccatgg ggaccgttgc acaaggcagt gccactcatt ctcagaacct cctaccatgg 2228040 gctatgctta gtgacccgag gccaagccaa ggaagacgcc agccacaggg tgccatcctc 2228100 aggggcatgc tgccagcagg ggcaaagtta tccctagcaa caagatacag aaagaaagaa 2228160 aaaaggaagg aaatgtagcc aatgggccgg ttcaggttct tgactttgcc acacaaaaga 2228220 atttgagagc aagtccaaag taaaagtcag caagagaatt tattgcaaag tgaaagtaca 2228280 ctctgacagc tgatcagagc agctgctcaa aagagagaca gtaccctccc ctcacgggag 2228340 tcttacatga ttattcatga ataggtggga aggggtattg ttttaagcat gttctgtggt 2228400 ctcttgaacg tgcatgcact 
gtggttgtac atatcagcac acacatctta cgtctcatta 2228460 gcatcttaac ttccctctca gagttgtgtt tgctactatt gtaatgagca taggtcagcc 2228520 caaggacact attcatgggt ttctgggctt cctcagatgt ggggatgcct cccttggctc 2228580 ttctacctct ttgctgcagg atgttctaac cacaagccca ggatatggtt tgcgcactgt 2228640 cgaacagctt gttctctcca tcaacctgac aagtctcttg tttcctttca agggaggctg 2228700 tgaacaccct atctcactga cctcagaagg acagtacagc agtagccacc atgaccaaaa 2228760 agatgattcc agaagtgcag gacaactccc tacccagagg ctgtggctgt gcagtaacac 2228820 accaagaggg gagtccagct ggctctcagg gtgctcacta ccctcatctg ggggcctgga 2228880 ggacgtcaat tcctgagaac gccacgttct agtgagtaga atgaactgag agatacacag 2228940 caaagctcca catacttttc cttttctttg tgcccgcagt gttcttcatc agtgtgctct 2229000 cgcttttcag ctactactgt tggctggctg gaaaaaatag aacaatagta aaaattagag 2229060 accagtcttt ggtgatgaag agaaatattg gctacttcca gtattttcta gctttggtta 2229120 tggttgcagt tttccagctc accttgtggg gatgaattca gaaaaaagtt acaaattgaa 2229180 atgaacatgc cagaagtatt ggctcaaatc aacgttgtcc tattaagcca cttagtgaat 2229240 caaaagaccg cttgttggac tgttaatctc ggtggccaga gaaaggagct gaagaaggtg 2229300 ttgccagatc aggaacaaat aattacagcg gcaatagaaa atggaagacc acttgttcat 2229360 aaccatttga ataagggcaa ggtgtatgga aacacattat gaactgatat tttcagtttt 2229420 gtttgcaaga aaatgattaa taaggtgaaa taattgaagt atcacggaag atacattaaa 2229480 aaaaaaaaaa gcctttgtac agtttgctgg agccacagat gtcctactcc agagcagaac 2229540 aatgcctgaa tcttcagggt ccatttctgc cgcattcact agcaaccaca aatgtgactt 2229600 aattttactt tggaaataat gcttacccat tgtgagatgc tgtaatatga accatcatta 2229660 catgttaaca tggcacatgg aattttgagt gtctaagtta catttttaga gttgtttctt 2229720 agtagccatg tgagtttcca ctccaaaaac acaagctaaa aacttgtttt gagtgaagga 2229780 catctagggc aaatggtggc tgaaagtgaa tgagatc 2229817 // |
The output is a standard EMBOSS sequence file.
The results can be output in one of several styles by using the command-line qualifier -osformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
>BA000025_1-10000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region. gatctccagagcactcttccctgcagggcaccctcccatcccagactccaggcacctggc atgggtggacatctttactttctgggccagcttcagcagagctatgtcatcaccatagaa ctccaggattccctggttctttttggcaaagacatcaaaccctggggagatcaccgcctt ctcaataaggaattctttgccccactgggatttggggtctcctggaaatgatacactaga ttaggctagaccagggctcctgcaggggccagaggctgggtgaggtggtaggatctgtgg cttcaggatcaggaggctggtgcatcccctgccttacccacattgaccctccacagggag tggtcgttgccatcgcggaagcaatgagctgctgtcaggacccattggtcggagatgagg gccccccggcaggtctcttggctcttgggctgcaggggaacaggtgattttcagagattg cagtatgtctggcccatggccgcttttacctctggaatccaagccctgcccctccttcct ggtaccttaatagtgacatgccagggtgtcctctcctggtcagaggcgtttgctgacatg ttccccaccccgcagatggtgtctgtgagcttggagacatctgtgggtgtgaggatcaga tggggaaggaggcaagtgaggggcactgtgtccaggttcccaacacgggcctctggcggg ctcctcaccatcctccccacaccaaggagggcaaagctcactcacccagcatatgttcaa agacctggtgcagagcctttgtgtcctgcagaatgaaggcatgcctctcaccatccttct tggaccctagctcattcagttctctccagtccacatccagcttgcccaccccgatggcat agatgtctggtggggaagagggaaatcaccagactcctgtggctttggggctaccccatg agacaggaggctgtcatctgaaactcactgtgtccaatcaagacctacatgagctggacc cctgcgtcctccccactgctacctgtctgccttcatttcctgccactccctgcccttcac tctcctgcagcacacagcctctttgaagttcctcaaatccataggcatggtcacacctca ggccctttgcccagctgtgcctctgcctagttcactcctcccccccagacttccacatgg ctcactttcgtacctttttaagtcttggctcaaatgtcaccttctcagtgaggccttccc tggtcttcctgtctaaaactgcaatgccccagacaaactttcatccccactttgggaggc aaggtgggaggatcccttgaagccagaagtttgagaccagcctgggcaacatggcaacac cccttagcttgtgtcacctaccacctgctgggttctatggttttcttatcctgtttattc cctgtaatggtggaattgtgtcccccagaaagatgtgttcgagtcctaatccccagtatc tgtgactttatttggaaaaagggtctttgcagatgtaatcaagttaagattaagtcatac tagattagggtgagctctaatccaatgactgaggtccttataagaagaggtaagccagag ccaggcgtggtggctcacacctgtaatcaccaggaggcggtggttgtggtgagccaagat cgcgccattgcactccagcctgggcaacaagagcaaaaccccgtctcaaaaaaaaaaaaa gaagaggtgagccgggcacggtggctcacacctgtaatcccagcactctgggaggctgag gcgggcagatcacgaggtcaggaattcaagaccagcctgaccaacatggtgaaaccctgt 
ctctactaaaaatacaaaaattagccagacatgctggcacacacctgtaatcccagctac tcaggaggctgaggcaggagaatcgcttgaaccgggaggcggatgttgcagtgagccgag attgcaccactgcactccagcctgggcaacagagcaagactccatctcaaaaaaaaaaaa aaaaaaaaaaagtgaactggctgggcatggtggtgactcatgcctgtaatcccggcagtt tttttgaggcgaaggcaggcagatcgccttgaggccaggagtttaagaccagcctagcca acatggcgagaccatgtctctactaaaaatacaaaaatttgccgggcatggtggcacatg cctgtaatcccagcttcttgggagactgaggcacgagaatcacctgaacccaggaggcag aggttacagtgagccgggatcccgccactgcactgcagcctgggcttctgggtgacagag cgagactctgtctcaaacaaatgaacagaaaaagaagaaaggaatttggacacaaagaca caggtagtgggtctcctatctatataagagaacagcatgtaatgacacagaggcacacac agaaaagaaggcgagttgaagacagaggcagagaatgggtttatgctgccgcaagccaag gttggagctgccggcagccggaaaaggcaggaaagaattcttcccaagagccttctgagg aagcacggccctgccaacaccttgatttcagacttctaacctccagaactgtaagaaaaa gaaattctgtgttctaagccacccaggtttgtggtagtttggtaagtacttttaaatgac tgaatgaatagaaagaactcagaacacaacatggaaactaaacctcagatctggtcttcc tctgtaaaaggtagcatctgggagaagggcctaaagccacgttttcccactggaggccct ggacccacacaacaggccgcgcctgtcctccgactgtggtgccagtcagaactgccctca gacagaccacagagtctactcctctcccagcctttgcaccccttgtggcccatttttgtt [Part of this file has been deleted for brevity] cctcggtctgtctccaccaggccctgtgagggtgggtggaggctctctccaagccctcgt ttggccccaccccagcatgtctccaggttcctctcagccctggttccttttggccctgca gtcacaatgggcaacactgtgacgcaccctgtcctgtgtcacagtgtcatacactcaggc tcacattgcccctaggccacttgccagccaagggacatggccacattttgtgtcttctgc acctcagccttgctttcaagtgcaggtgatgatggcacccacgcagaacaaatgttattt gctatcttcgtcgagtttagtcatccaattttccaaccctcactgggcaaggaagagtgt ggtttccaccaagaaggcaggatgtcagcagtcacaggggcaaccaacagggaaagccgc cggaaaatagaccccacaggaagcacaggtgtccagtggagatgggaaccctgcagattt gaccgtctttaagcagattagagagattaccgttactaacaacttagccataaaagttta ttagctattttcaaaaagcataaaattatgtaatataattttttttaaatttccatcaat acaaaactaatctgggcactgcaacttccggtgggcaactgggataggcggcatcatcag gaaggcgagccctgccgtgccccatgtgccagtgccccagatggcggcagcctccccaga agcaccttgtatctcccctgcacagggccagggtcccagcttcccatacaccttctcctg 
ctttttcttttctgtcctttcctttttcaataaaccacctgcaaaaagggaaaaccattc tgaggacaagaaacatgtcaatgggaaatacacagttgccagagggtaaaaggccctgtt cattctcattgaaaagctcaggtatttctgttaaagtctctccttttactttaggatgct gactcctgcgtccatctcaacctgggcatcgtgccaccaccttcaagaagagaaaaacta agtagtgctttgcaaaggggcagcagcatttctcatttctgaccatgtcaggcacatggc catgcagatgagcaggtgggggacacaggtgagtctccagacctgctctcctcccacagt acattcttgagtctttttaaacagttgtgaaaatgccacagatgcaagcacctgtgggcc actcccatggggaccgttgcacaaggcagtgccactcattctcagaacctcctaccatgg gctatgcttagtgacccgaggccaagccaaggaagacgccagccacagggtgccatcctc aggggcatgctgccagcaggggcaaagttatccctagcaacaagatacagaaagaaagaa aaaaggaaggaaatgtagccaatgggccggttcaggttcttgactttgccacacaaaaga atttgagagcaagtccaaagtaaaagtcagcaagagaatttattgcaaagtgaaagtaca ctctgacagctgatcagagcagctgctcaaaagagagacagtaccctcccctcacgggag tcttacatgattattcatgaataggtgggaaggggtattgttttaagcatgttctgtggt ctcttgaacgtgcatgcactgtggttgtacatatcagcacacacatcttacgtctcatta gcatcttaacttccctctcagagttgtgtttgctactattgtaatgagcataggtcagcc caaggacactattcatgggtttctgggcttcctcagatgtggggatgcctcccttggctc ttctacctctttgctgcaggatgttctaaccacaagcccaggatatggtttgcgcactgt cgaacagcttgttctctccatcaacctgacaagtctcttgtttcctttcaagggaggctg tgaacaccctatctcactgacctcagaaggacagtacagcagtagccaccatgaccaaaa agatgattccagaagtgcaggacaactccctacccagaggctgtggctgtgcagtaacac accaagaggggagtccagctggctctcagggtgctcactaccctcatctgggggcctgga ggacgtcaattcctgagaacgccacgttctagtgagtagaatgaactgagagatacacag caaagctccacatacttttccttttctttgtgcccgcagtgttcttcatcagtgtgctct cgcttttcagctactactgttggctggctggaaaaaatagaacaatagtaaaaattagag accagtctttggtgatgaagagaaatattggctacttccagtattttctagctttggtta tggttgcagttttccagctcaccttgtggggatgaattcagaaaaaagttacaaattgaa atgaacatgccagaagtattggctcaaatcaacgttgtcctattaagccacttagtgaat caaaagaccgcttgttggactgttaatctcggtggccagagaaaggagctgaagaaggtg ttgccagatcaggaacaaataattacagcggcaatagaaaatggaagaccacttgttcat aaccatttgaataagggcaaggtgtatggaaacacattatgaactgatattttcagtttt gtttgcaagaaaatgattaataaggtgaaataattgaagtatcacggaagatacattaaa 
aaaaaaaaaagcctttgtacagtttgctggagccacagatgtcctactccagagcagaac aatgcctgaatcttcagggtccatttctgccgcattcactagcaaccacaaatgtgactt aattttactttggaaataatgcttacccattgtgagatgctgtaatatgaaccatcatta catgttaacatggcacatggaattttgagtgtctaagttacatttttagagttgtttctt agtagccatgtgagtttccactccaaaaacacaagctaaaaacttgttttgagtgaagga catctagggcaaatggtggctgaaagtgaatgagatc |
>BA000025_1-50000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region. gatctccagagcactcttccctgcagggcaccctcccatcccagactccaggcacctggc atgggtggacatctttactttctgggccagcttcagcagagctatgtcatcaccatagaa ctccaggattccctggttctttttggcaaagacatcaaaccctggggagatcaccgcctt ctcaataaggaattctttgccccactgggatttggggtctcctggaaatgatacactaga ttaggctagaccagggctcctgcaggggccagaggctgggtgaggtggtaggatctgtgg cttcaggatcaggaggctggtgcatcccctgccttacccacattgaccctccacagggag tggtcgttgccatcgcggaagcaatgagctgctgtcaggacccattggtcggagatgagg gccccccggcaggtctcttggctcttgggctgcaggggaacaggtgattttcagagattg cagtatgtctggcccatggccgcttttacctctggaatccaagccctgcccctccttcct ggtaccttaatagtgacatgccagggtgtcctctcctggtcagaggcgtttgctgacatg ttccccaccccgcagatggtgtctgtgagcttggagacatctgtgggtgtgaggatcaga tggggaaggaggcaagtgaggggcactgtgtccaggttcccaacacgggcctctggcggg ctcctcaccatcctccccacaccaaggagggcaaagctcactcacccagcatatgttcaa agacctggtgcagagcctttgtgtcctgcagaatgaaggcatgcctctcaccatccttct tggaccctagctcattcagttctctccagtccacatccagcttgcccaccccgatggcat agatgtctggtggggaagagggaaatcaccagactcctgtggctttggggctaccccatg agacaggaggctgtcatctgaaactcactgtgtccaatcaagacctacatgagctggacc cctgcgtcctccccactgctacctgtctgccttcatttcctgccactccctgcccttcac tctcctgcagcacacagcctctttgaagttcctcaaatccataggcatggtcacacctca ggccctttgcccagctgtgcctctgcctagttcactcctcccccccagacttccacatgg ctcactttcgtacctttttaagtcttggctcaaatgtcaccttctcagtgaggccttccc tggtcttcctgtctaaaactgcaatgccccagacaaactttcatccccactttgggaggc aaggtgggaggatcccttgaagccagaagtttgagaccagcctgggcaacatggcaacac cccttagcttgtgtcacctaccacctgctgggttctatggttttcttatcctgtttattc cctgtaatggtggaattgtgtcccccagaaagatgtgttcgagtcctaatccccagtatc tgtgactttatttggaaaaagggtctttgcagatgtaatcaagttaagattaagtcatac tagattagggtgagctctaatccaatgactgaggtccttataagaagaggtaagccagag ccaggcgtggtggctcacacctgtaatcaccaggaggcggtggttgtggtgagccaagat cgcgccattgcactccagcctgggcaacaagagcaaaaccccgtctcaaaaaaaaaaaaa gaagaggtgagccgggcacggtggctcacacctgtaatcccagcactctgggaggctgag gcgggcagatcacgaggtcaggaattcaagaccagcctgaccaacatggtgaaaccctgt 
ctctactaaaaatacaaaaattagccagacatgctggcacacacctgtaatcccagctac tcaggaggctgaggcaggagaatcgcttgaaccgggaggcggatgttgcagtgagccgag attgcaccactgcactccagcctgggcaacagagcaagactccatctcaaaaaaaaaaaa aaaaaaaaaaagtgaactggctgggcatggtggtgactcatgcctgtaatcccggcagtt tttttgaggcgaaggcaggcagatcgccttgaggccaggagtttaagaccagcctagcca acatggcgagaccatgtctctactaaaaatacaaaaatttgccgggcatggtggcacatg cctgtaatcccagcttcttgggagactgaggcacgagaatcacctgaacccaggaggcag aggttacagtgagccgggatcccgccactgcactgcagcctgggcttctgggtgacagag cgagactctgtctcaaacaaatgaacagaaaaagaagaaaggaatttggacacaaagaca caggtagtgggtctcctatctatataagagaacagcatgtaatgacacagaggcacacac agaaaagaaggcgagttgaagacagaggcagagaatgggtttatgctgccgcaagccaag gttggagctgccggcagccggaaaaggcaggaaagaattcttcccaagagccttctgagg aagcacggccctgccaacaccttgatttcagacttctaacctccagaactgtaagaaaaa gaaattctgtgttctaagccacccaggtttgtggtagtttggtaagtacttttaaatgac tgaatgaatagaaagaactcagaacacaacatggaaactaaacctcagatctggtcttcc tctgtaaaaggtagcatctgggagaagggcctaaagccacgttttcccactggaggccct ggacccacacaacaggccgcgcctgtcctccgactgtggtgccagtcagaactgccctca gacagaccacagagtctactcctctcccagcctttgcaccccttgtggcccatttttgtt [Part of this file has been deleted for brevity] ggagaggggcaggtgcccctcctcggtctgtctccaccaggccctgtgagggtgggtgga ggctctctccaagccctcgtttggccccaccccagcatgtctccaggttcctctcagccc tggttccttttggccctgcagtcacaatgggcaacactgtgacgcaccctgtcctgtgtc acagtgtcatacactcaggctcacattgcccctaggccacttgccagccaagggacatgg ccacattttgtgtcttctgcacctcagccttgctttcaagtgcaggtgatgatggcaccc acgcagaacaaatgttatttgctatcttcgtcgagtttagtcatccaattttccaaccct cactgggcaaggaagagtgtggtttccaccaagaaggcaggatgtcagcagtcacagggg caaccaacagggaaagccgccggaaaatagaccccacaggaagcacaggtgtccagtgga gatgggaaccctgcagatttgaccgtctttaagcagattagagagattaccgttactaac aacttagccataaaagtttattagctattttcaaaaagcataaaattatgtaatataatt ttttttaaatttccatcaatacaaaactaatctgggcactgcaacttccggtgggcaact gggataggcggcatcatcaggaaggcgagccctgccgtgccccatgtgccagtgccccag atggcggcagcctccccagaagcaccttgtatctcccctgcacagggccagggtcccagc 
ttcccatacaccttctcctgctttttcttttctgtcctttcctttttcaataaaccacct gcaaaaagggaaaaccattctgaggacaagaaacatgtcaatgggaaatacacagttgcc agagggtaaaaggccctgttcattctcattgaaaagctcaggtatttctgttaaagtctc tccttttactttaggatgctgactcctgcgtccatctcaacctgggcatcgtgccaccac cttcaagaagagaaaaactaagtagtgctttgcaaaggggcagcagcatttctcatttct gaccatgtcaggcacatggccatgcagatgagcaggtgggggacacaggtgagtctccag acctgctctcctcccacagtacattcttgagtctttttaaacagttgtgaaaatgccaca gatgcaagcacctgtgggccactcccatggggaccgttgcacaaggcagtgccactcatt ctcagaacctcctaccatgggctatgcttagtgacccgaggccaagccaaggaagacgcc agccacagggtgccatcctcaggggcatgctgccagcaggggcaaagttatccctagcaa caagatacagaaagaaagaaaaaaggaaggaaatgtagccaatgggccggttcaggttct tgactttgccacacaaaagaatttgagagcaagtccaaagtaaaagtcagcaagagaatt tattgcaaagtgaaagtacactctgacagctgatcagagcagctgctcaaaagagagaca gtaccctcccctcacgggagtcttacatgattattcatgaataggtgggaaggggtattg ttttaagcatgttctgtggtctcttgaacgtgcatgcactgtggttgtacatatcagcac acacatcttacgtctcattagcatcttaacttccctctcagagttgtgtttgctactatt gtaatgagcataggtcagcccaaggacactattcatgggtttctgggcttcctcagatgt ggggatgcctcccttggctcttctacctctttgctgcaggatgttctaaccacaagccca ggatatggtttgcgcactgtcgaacagcttgttctctccatcaacctgacaagtctcttg tttcctttcaagggaggctgtgaacaccctatctcactgacctcagaaggacagtacagc agtagccaccatgaccaaaaagatgattccagaagtgcaggacaactccctacccagagg ctgtggctgtgcagtaacacaccaagaggggagtccagctggctctcagggtgctcacta ccctcatctgggggcctggaggacgtcaattcctgagaacgccacgttctagtgagtaga atgaactgagagatacacagcaaagctccacatacttttccttttctttgtgcccgcagt gttcttcatcagtgtgctctcgcttttcagctactactgttggctggctggaaaaaatag aacaatagtaaaaattagagaccagtctttggtgatgaagagaaatattggctacttcca gtattttctagctttggttatggttgcagttttccagctcaccttgtggggatgaattca gaaaaaagttacaaattgaaatgaacatgccagaagtattggctcaaatcaacgttgtcc tattaagccacttagtgaatcaaaagaccgcttgttggactgttaatctcggtggccaga gaaaggagctgaagaaggtgttgccagatcaggaacaaataattacagcggcaatagaaa atggaagaccacttgttcataaccatttgaataagggcaaggtgtatggaaacacattat gaactgatattttcagttttgtttgcaagaaaatgattaataaggtgaaataattgaagt 
atcacggaagatacattaaaaaaaaaaaaagcctttgtacagtttgctggagccacagat gtcctactccagagcagaacaatgcctgaatcttcagggtccatttctgccgcattcact agcaaccacaaatgtgacttaattttactttggaaataatgcttacccattgtgagatgc tgtaatatgaaccatcattacatgttaacatggcacatggaattttgagtgtctaagtta catttttagagttgtttcttagtagccatgtgagtttccactccaaaaacacaagctaaa aacttgttttgagtgaaggacatctagggcaaatggtggctgaaagtgaatgagatc |
Each sub-sequence is named after the original sequence with '_start-end' appended, where 'start' and 'end' are the start and end positions of the sub-sequence in the original sequence. For example, splitting the sequence U01317 at a size of 50000 with no overlap produces sub-sequences named U01317_1-50000 and U01317_50001-73308.
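The naming scheme can be sketched in Python. This is a minimal illustration, not splitter's source code. The function names `split_ranges` and `subsequence_names` are hypothetical. The treatment of overlap is an assumption: each window is taken to start `size - overlap` positions after the previous one, and the last window is cut off at the end of the sequence. Only the zero-overlap case is confirmed by the U01317 example.

```python
def split_ranges(length, size=10000, overlap=0):
    """Return 1-based, inclusive (start, end) windows covering a sequence.

    Assumption (not taken from the splitter source): consecutive windows
    start `size - overlap` positions apart, and the final window is
    truncated at the end of the sequence.
    """
    if size < 1:
        raise ValueError("size must be 1 or more")
    if overlap < 0 or overlap >= size:
        raise ValueError("overlap must be 0 or more and smaller than size")

    ranges = []
    if length < 1:
        return ranges

    step = size - overlap  # assumed distance between window starts
    start = 1
    while True:
        end = min(start + size - 1, length)
        ranges.append((start, end))
        if end == length:  # the window has reached the end of the sequence
            break
        start += step
    return ranges


def subsequence_names(name, length, size=10000, overlap=0):
    """Build names of the form 'name_start-end' for each window."""
    return [
        f"{name}_{start}-{end}"
        for start, end in split_ranges(length, size, overlap)
    ]


print(subsequence_names("U01317", 73308, size=50000))
# ['U01317_1-50000', 'U01317_50001-73308']
```

With the default size of 10000, the same sketch gives BA000025_1-10000 as the first name for the 2229817-base entry BA000025, which matches the first output header in Example 1.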
Splitting a large sequence into smaller sub-sequences for analysis can be useful when a particularly memory- or CPU-intensive application will not run quickly enough, or will not run at all, on the full sequence. This should seldom be necessary in EMBOSS.
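Because each sub-sequence name records its position in the original sequence, a coordinate reported by an analysis of a sub-sequence can be converted back to the full sequence. The sketch below assumes names of the form 'name_start-end'. The helpers `parse_subsequence_name` and `to_original_position` are hypothetical and are not part of EMBOSS.

```python
import re

# Matches 'name_start-end'; the greedy name part lets the original
# sequence name itself contain underscores.
_NAME_RE = re.compile(r"^(?P<name>.+)_(?P<start>\d+)-(?P<end>\d+)$")


def parse_subsequence_name(sub_name):
    """Split 'name_start-end' into (name, start, end)."""
    match = _NAME_RE.match(sub_name)
    if match is None:
        raise ValueError(f"not a sub-sequence name: {sub_name!r}")
    return match["name"], int(match["start"]), int(match["end"])


def to_original_position(sub_name, position):
    """Map a 1-based position in a sub-sequence to the original sequence."""
    name, start, end = parse_subsequence_name(sub_name)
    if not 1 <= position <= end - start + 1:
        raise ValueError("position lies outside the sub-sequence")
    return name, start + position - 1


print(to_original_position("U01317_50001-73308", 100))
# ('U01317', 50100)
```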
By default, splitter will write all the sub-sequences to a single file. In some cases, particularly where non-EMBOSS programs are used, it is necessary to have a single sequence per file. To write the sub-sequences into separate files use the command-line switch -ossingle.
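When splitter is driven from a script, the command line can be assembled programmatically. The sketch below only builds the argument list and does not run anything. It uses qualifiers documented on this page (-sequence, -outseq, -size, -overlap, -ossingle, -auto). The helper name `build_splitter_command` is hypothetical, and executing the result assumes an EMBOSS installation with `splitter` on the PATH.

```python
def build_splitter_command(sequence, outseq, size=10000, overlap=0,
                           single_files=False):
    """Return an argument list for running splitter non-interactively."""
    cmd = [
        "splitter",
        "-sequence", sequence,    # input USA
        "-outseq", outseq,        # output USA
        "-size", str(size),
        "-overlap", str(overlap),
        "-auto",                  # turn off prompts
    ]
    if single_files:
        cmd.append("-ossingle")   # write each sub-sequence to its own file
    return cmd


cmd = build_splitter_command(
    "tembl:BA000025", "ba000025.split",
    size=50000, overlap=3000, single_files=True,
)
print(" ".join(cmd))

# To execute (requires EMBOSS to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if the input or output USA contains special characters.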
Program name | Description |
---|---|
aligncopy | Read and write alignments |
aligncopypair | Read and write pairs from alignments |
biosed | Replace or delete sequence sections |
codcopy | Copy and reformat a codon usage table |
cutseq | Remove a section from a sequence |
degapseq | Remove non-alphabetic (e.g. gap) characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Retrieve sequence entries from flatfile databases and files |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from sequence(s) |
extractseq | Extract regions from a sequence |
featcopy | Read and write a feature table |
featmerge | Merge two overlapping feature tables |
featreport | Read and write a feature table |
feattext | Return a feature table original text |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Create random nucleotide sequences |
makeprotseq | Create random protein sequences |
maskambignuc | Mask all ambiguity characters in nucleotide sequences with N |
maskambigprot | Mask all ambiguity characters in protein sequences with X |
maskfeat | Write a sequence with masked features |
maskseq | Write a sequence with masked regions |
newseq | Create a sequence file from a typed-in sequence |
nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file |
noreturn | Remove carriage return from ASCII files |
nospace | Remove whitespace from an ASCII text file |
notab | Replace tabs with spaces in an ASCII text file |
notseq | Write to file a subset of an input stream of sequences |
nthseq | Write to file a single sequence from an input stream of sequences |
nthseqset | Read and write (return) one set of sequences from many |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a nucleotide sequence |
seqcount | Read and count sequences |
seqret | Read and write (return) sequences |
seqretsetall | Read and write (return) many sets of sequences |
seqretsplit | Read sequences and write them to individual files |
sizeseq | Sort sequences by size |
skipredundant | Remove redundant sequences from an input set |
skipseq | Read and write (return) sequences, skipping first few |
splitsource | Split sequence(s) into original source sequences |
trimest | Remove poly-A tails from nucleotide sequences |
trimseq | Remove unwanted characters from start and end of sequence(s) |
trimspace | Remove extra whitespace from an ASCII text file |
union | Concatenate multiple sequences into a single sequence |
vectorstrip | Remove vectors from the ends of nucleotide sequence(s) |
yank | Add a sequence reference (a full USA) to a list file |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.