dbxfasta
The b+tree indexes created by dbxfasta allow access to flat files larger than 2 GB.
    % dbxfasta
    Index a fasta file database using b+tree indices
    Basename for index files: emrod
    Resource name: emblresource
        simple : >ID
         idacc : >ID ACC or >ID (ACC)
          idsv : >ID SV or >ID (SV)
         gcgid : >db:ID
      gcgidacc : >db:ID ACC
          dbid : >db ID
          ncbi : | formats
    ID line format [idacc]: idacc
    Database directory [.]: data
    Wildcard database filename [*.fasta]: emrod
            id : ID
           acc : Accession number
            sv : Sequence Version and GI
           des : Description
    Index fields [id,acc]:
    Compressed index files [Y]:
    General log output file [outfile.dbxfasta]:
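The answers given in the interactive session above can also be supplied as qualifiers on a single command line. The following is a minimal Python sketch of how such a command might be assembled, assuming `dbxfasta` is installed and on the PATH. The helper name `build_dbxfasta_command` is hypothetical; it uses only qualifiers listed in this documentation and does not run the program itself.

```python
import shlex


def build_dbxfasta_command(dbname, dbresource, directory=".",
                           filenames="*.fasta", idformat="idacc",
                           fields=("id", "acc")):
    """Assemble a dbxfasta argument list (hypothetical helper).

    Only qualifiers documented for dbxfasta are used; -auto turns off
    the interactive prompts.
    """
    return [
        "dbxfasta",
        "-dbname", dbname,
        "-dbresource", dbresource,
        "-idformat", idformat,
        "-directory", directory,
        "-filenames", filenames,
        "-fields", ",".join(fields),
        "-auto",
    ]


# Same answers as the interactive session above.
command = build_dbxfasta_command("emrod", "emblresource",
                                 directory="data", filenames="emrod")
print(shlex.join(command))
```

To actually build the index, the list could be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell quoting problems with wildcard filenames such as `*.fasta`.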
    Index a fasta file database using b+tree indices
    Version: EMBOSS:6.6.0.0

       Standard (Mandatory) qualifiers:
      [-dbname]            string     Basename for index files (Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/)
      [-dbresource]        string     Resource name (Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/)
       -idformat           menu       [idacc] ID line format (Values: simple (>ID); idacc (>ID ACC or >ID (ACC)); idsv (>ID SV or >ID (SV)); gcgid (>db:ID); gcgidacc (>db:ID ACC); dbid (>db ID); ncbi (| formats))
       -directory          directory  [.] Database directory
       -filenames          string     [*.fasta] Wildcard database filename (Any string)
       -fields             menu       [id,acc] Index fields (Values: id (ID); acc (Accession number); sv (Sequence Version and GI); des (Description))
       -[no]compressed     boolean    [Y] Compressed index files
       -outfile            outfile    [*.dbxfasta] General log output file

       Additional (Optional) qualifiers: (none)

       Advanced (Unprompted) qualifiers:
       -release            string     [0.0] Release number (Any string up to 9 characters)
       -date               string     [00/00/00] Index date (Date string dd/mm/yy)
       -exclude            string     Wildcard filename(s) to exclude (Any string)
       -statistics         boolean    [N] Report I/O statistics for each input file
       -indexoutdir        outdir     [.] Index file output directory

       Associated qualifiers:

       "-directory" associated qualifiers
       -extension          string     Default file extension

       "-indexoutdir" associated qualifiers
       -extension          string     Default file extension

       "-outfile" associated qualifiers
       -odirectory         string     Output directory

       General qualifiers:
       -auto               boolean    Turn off prompts
       -stdout             boolean    Write first file to standard output
       -filter             boolean    Read first file from standard input, write first file to standard output
       -options            boolean    Prompt for standard and additional values
       -debug              boolean    Write debug output to program.dbg
       -verbose            boolean    Report some/full command line options
       -help               boolean    Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose
       -warning            boolean    Report warnings
       -error              boolean    Report errors
       -fatal              boolean    Report fatal errors
       -die                boolean    Report dying program messages
       -version            boolean    Report version number and exit
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | | | | |
[-dbname] (Parameter 1) | string | Basename for index files | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | Required |
[-dbresource] (Parameter 2) | string | Resource name | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | Required |
-idformat | list | ID line format | simple (>ID); idacc (>ID ACC or >ID (ACC)); idsv (>ID SV or >ID (SV)); gcgid (>db:ID); gcgidacc (>db:ID ACC); dbid (>db ID); ncbi (\| formats) | idacc |
-directory | directory | Database directory | Directory | . |
-filenames | string | Wildcard database filename | Any string | *.fasta |
-fields | list | Index fields | id (ID); acc (Accession number); sv (Sequence Version and GI); des (Description) | id,acc |
-[no]compressed | boolean | Compressed index files | Boolean value Yes/No | Yes |
-outfile | outfile | General log output file | Output file | <*>.dbxfasta |
Additional (Optional) qualifiers | | | | |
(none) | | | | |
Advanced (Unprompted) qualifiers | | | | |
-release | string | Release number | Any string up to 9 characters | 0.0 |
-date | string | Index date | Date string dd/mm/yy | 00/00/00 |
-exclude | string | Wildcard filename(s) to exclude | Any string | |
-statistics | boolean | Report I/O statistics for each input file | Boolean value Yes/No | No |
-indexoutdir | outdir | Index file output directory | Output directory | . |
Associated qualifiers | | | | |
"-directory" associated directory qualifiers | | | | |
-extension | string | Default file extension | Any string | |
"-indexoutdir" associated outdir qualifiers | | | | |
-extension | string | Default file extension | Any string | |
"-outfile" associated outfile qualifiers | | | | |
-odirectory | string | Output directory | Any string | |
General qualifiers | | | | |
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
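The constraint on `-dbname` and `-dbresource` (2 to 19 characters, matching `/[A-z][A-z0-9_]+/`) can be checked before running the program. The sketch below is one way to do that in Python; the function name `is_valid_name` is hypothetical, and it assumes the pattern must match the whole string, which this documentation does not state explicitly.

```python
import re

# Pattern quoted from the qualifier table. Note that the range A-z also
# covers the ASCII characters between 'Z' and 'a' (such as '_' and '^').
NAME_PATTERN = re.compile(r"[A-z][A-z0-9_]+")


def is_valid_name(name):
    """Return True if name is 2 to 19 characters and matches the pattern.

    Assumption: the whole string must match, so fullmatch() is used.
    """
    return 2 <= len(name) <= 19 and NAME_PATTERN.fullmatch(name) is not None


for candidate in ("emrod", "emblresource", "e", "1embl", "x" * 20):
    print(candidate, is_valid_name(candidate))
```

The names used in the example session, `emrod` and `emblresource`, both pass this check; a single character, a leading digit, or a 20-character name does not.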
    Processing directory: /homes/user/test/data/
    Processing file: emrod entries: 6 (6) time: 0.0/0.0s (-0.0/0.0)
    Total time: 0:00.0
    Entry idlen 15 OK. Maximum ID length was 6 for 'L48662'.
    Field acc acclen 15 OK. Maximum acc term length was 6 for 'L48662'.

    # Number of files: 1
    # Release: 0.0
    # Date: 00/00/00
    Single filename database
    emrod

    Type          Identifier
    Compress      Yes
    Pages         3
    Secpages      0
    Order         71
    Fill          56
    Level         0
    Pagesize      2048
    Cachesize     20000
    Order2        22
    Fill2         41
    Secpagesize   512
    Seccachesize  20000
    Count         6
    Fullcount     6
    Kwlimit       15
    Reffiles      0

    Type          Identifier
    Compress      Yes
    Pages         3
    Secpages      0
    Order         71
    Fill          56
    Level         0
    Pagesize      2048
    Cachesize     20000
    Order2        22
    Fill2         41
    Secpagesize   512
    Seccachesize  20000
    Count         6
    Fullcount     6
    Kwlimit       15
    Reffiles      0
Two of the output files contain non-printing characters and so cannot be displayed here.
Having created the EMBOSS indexes for this file, a database can then be defined in the file emboss.defaults as something like:
    DB emrod [
       type: N
       dbalias: emrod   (see below)
       format: fasta
       method: emboss
       directory: /data/embl/fasta
       file: emrod.fasta
       indexdirectory: /data/embl/fasta/indexes
    ]

The index file 'basename' given to dbxfasta must match the DB name in the definition. If not, then a 'dbalias' line must be given which specifies the basename of the indexes.
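The dbalias rule can be illustrated with a small generator for such a definition. This is a sketch only: the helper `format_db_definition` and the DB name `rodents` are hypothetical, and the attribute layout simply mirrors the example definition above.

```python
def format_db_definition(dbname, index_basename, directory, filename,
                         indexdirectory, seqtype="N"):
    """Render an emboss.defaults DB entry (hypothetical helper).

    A dbalias line is emitted only when the index basename differs
    from the DB name, following the rule described above.
    """
    lines = [f"DB {dbname} [", f"   type: {seqtype}"]
    if index_basename != dbname:
        lines.append(f"   dbalias: {index_basename}")
    lines += [
        "   format: fasta",
        "   method: emboss",
        f"   directory: {directory}",
        f"   file: {filename}",
        f"   indexdirectory: {indexdirectory}",
        "]",
    ]
    return "\n".join(lines)


# 'rodents' is a made-up DB name that differs from the index basename
# 'emrod', so a dbalias line is required.
print(format_db_definition("rodents", "emrod", "/data/embl/fasta",
                           "emrod.fasta", "/data/embl/fasta/indexes"))
```

When the DB name and the index basename are the same, as in the `emrod` example, the dbalias line is left out.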
    SET PAGESIZE 2048
    SET CACHESIZE 200

The above values are recommended for most systems. The PAGESIZE is a multiple of the size of disc pages the operating system buffers. The CACHESIZE is the number of disc pages dbxfasta is allowed to cache.
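As a rough guide to what these two settings imply, the cache size in bytes can be estimated as the product of the two values. The sketch below assumes that each cached page occupies exactly PAGESIZE bytes and ignores any bookkeeping overhead; the function name `cache_bytes` is hypothetical.

```python
def cache_bytes(pagesize, cachesize):
    """Approximate bytes held by a cache of `cachesize` pages.

    Assumption: every cached page occupies exactly `pagesize` bytes.
    """
    return pagesize * cachesize


total = cache_bytes(2048, 200)
print(f"{total} bytes ({total / 1024:.0f} KiB)")
```

With the recommended values this comes to 409600 bytes, or 400 KiB, under the stated assumption.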
    RES embl [
       type: Index
       idlen: 15
       acclen: 15
       svlen: 20
       keylen: 25
       deslen: 25
       orglen: 25
    ]

The length definitions are the maximum lengths of 'words' in the field being indexed. Longer words will be truncated to the value set.
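To see which terms would be affected by a given length setting, the truncation described above can be simulated before indexing. This is a minimal sketch: the helper `truncate_terms` is hypothetical, and the long identifier is invented for illustration, while `L48662` is the ID reported in the example log.

```python
def truncate_terms(terms, limit):
    """Return (stored, truncated) for a field length limit.

    stored    -- each term cut to at most `limit` characters
    truncated -- the original terms that were longer than `limit`
    """
    stored = [term[:limit] for term in terms]
    truncated = [term for term in terms if len(term) > limit]
    return stored, truncated


# idlen: 15, as in the resource definition above.
stored, truncated = truncate_terms(["L48662", "A_HYPOTHETICAL_LONG_ID"], 15)
print(stored)
print(truncated)
```

If `truncated` is not empty, two different terms could end up with the same stored prefix, so raising the relevant length in the resource definition may be worth considering.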
Program name | Description |
---|---|
dbiblast | Index a BLAST database |
dbifasta | Index a fasta file database |
dbiflat | Index a flat file database |
dbigcg | Index a GCG formatted database |
dbxcompress | Compress an uncompressed dbx index |
dbxedam | Index the EDAM ontology using b+tree indices |
dbxflat | Index a flat file database using b+tree indices |
dbxgcg | Index a GCG formatted database using b+tree indices |
dbxobo | Index an obo ontology using b+tree indices |
dbxreport | Validate index and report internals for dbx databases |
dbxresource | Index a data resource catalogue using b+tree indices |
dbxstat | Dump statistics for dbx databases |
dbxtax | Index NCBI taxonomy using b+tree indices |
dbxuncompress | Uncompress a compressed dbx index |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.