Next: 4. Library Interface Up: Aspell .29.1 alpha A Previous: 2. Getting Started   Contents

3. The Aspell utility

The Aspell utility is a multipurpose utility that can function as an ``ispell -a'' replacement, as an independent spell checker, and as a utility for managing dictionaries. Here is a brief summary of Aspell's command line options.

aspell [options] «command»

«command» is one of:

check «file»
to check a file
pipe
'ispell -a' compatible mode.
config
dump the current configuration to stdout
soundslike
returns the soundslike equivalent for each word entered
filter
passes standard input through the same set of filters that would be used to spell check a document.
help
display online help
version
prints a version line
dump|create|merge master|personal|repl [word list]
dumps, creates, or merges a master, personal, or replacement word list.
[options] is any or all of the following standard aspell library options:

--conf=«file»
main configuration file
--conf-dir=«dir»
location of main configuration file
--data-dir=«dir»
location of language data files
--dict-dir=«dir»
location of the main word list
--add|rem-filter=«str»
adds or removes a filter
--home-dir=«dir»
location for personal files
-W,--ignore=«integer»
ignore words <= n chars
--[dont-]ignore-repl
ignore commands to store replacement pairs
--lang=«str»
default language to use
-d,--master=«name»
main word list base name
--mode=«str»
sets the filter mode. Mode is one of none, url, email, sgml, or tex.
-e,--mode=email
enter Email mode.
-H,--mode=sgml
enter Html/Sgml mode.
-t,--mode=tex
enter TEX mode.
--per-conf=«file»
personal configuration file
-p,--personal=«file»
personal word list file name
--repl=«file»
replacements list file name
--sug-mode=«mode»
suggestion mode = fast | normal | bad-spellers
plus options to modify the behavior of the various filters:

--add|rem-email-quote=«char»
email quote characters
--email-margin=«integer»
num chars that can appear before the quote char
--add|rem-sgml-check=«str»
sgml tags to always check.
--add|rem-sgml-extension=«str»
sgml file extensions.
--add|rem-tex-command=«str»
TEX commands
--[dont-]tex-check-comments
check TEX comments
in addition to some commands specific to the aspell utility:

-b,--backup
create a backup file by appending ``.bak'' to the file name. (Only applies when the command is check)
-x,--dont-backup
don't create a backup file.
--[dont-]time
time load time and suggest time in pipe mode.
--[dont-]reverse
reverse the order of the suggestions list.
In addition, Aspell will try to make sense of Ispell's command line options so that it can function as a drop-in replacement for Ispell when used in ``-a'' mode.

If Aspell is run without any command line options it will display a brief help screen and quit.

Aspell can also make use of a global or user configuration file. Each line of the configuration file has the format:

«option» [args]
where option is any one of the standard library options above without the leading dashes. For example, the following line will set the default language to German:

lang german
Anything from a ``#'' to a newline is ignored. The global configuration file is usually named ``aspell.conf'' and is found in the etc directory, while the user configuration file is usually named ``.aspell.conf'' and is found in the user's home directory. Use ``aspell dump config'' to find out what they are for your installation.
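The line format described above is simple enough to read with a few lines of code. The following is a minimal sketch, not Aspell's own parser: the function name parse_aspell_conf and the sample text are made up for illustration, and only the rules stated above (one option per line, ``#'' starts a comment) are assumed.

```python
def parse_aspell_conf(text):
    """Parse configuration text into a list of (option, args) pairs."""
    settings = []
    for raw_line in text.splitlines():
        # Anything from '#' to the end of the line is ignored.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # The first word is the option name; the remainder is its argument.
        parts = line.split(None, 1)
        option = parts[0]
        args = parts[1] if len(parts) > 1 else ""
        settings.append((option, args))
    return settings


# Hypothetical configuration text, used only to exercise the function.
sample = """\
# site defaults
lang german
sug-mode fast   # favour speed
"""
print(parse_aspell_conf(sample))
```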

The environment variable ASPELL_CONF may also be used, and it overrides any options set in the configuration file. The format of the string is exactly the same as the configuration file except that semicolons ( ; ) are used instead of newlines.
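As an illustration of the semicolon-separated form, the sketch below builds an ASPELL_CONF value from option/argument pairs and places it in a copy of the environment that could be handed to a child process. The helper name to_aspell_conf is hypothetical; the options used (lang, sug-mode) are taken from the list of library options above.

```python
import os


def to_aspell_conf(settings):
    """Join (option, args) pairs with semicolons instead of newlines."""
    return ";".join(f"{option} {args}".strip() for option, args in settings)


# Work on a copy so the current process environment is left untouched.
env = dict(os.environ)
env["ASPELL_CONF"] = to_aspell_conf([("lang", "german"), ("sug-mode", "fast")])
print(env["ASPELL_CONF"])
```

The env mapping can then be passed as the env argument of subprocess.run when starting aspell.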

3.1 As a ``ispell -a'' replacement

To actually use Aspell as an Ispell replacement simply follow the directions in section 2.6.

When given the pipe or -a command, aspell goes into a pipe mode that is compatible with ``ispell -a''. Aspell also defines its own set of extensions to ispell pipe mode.


3.1.1 Format of the Data Stream

In this mode, Aspell prints a one-line version identification message, and then begins reading lines of input. For each input line, a single line is written to the standard output for each word checked for spelling on the line. If the word was found in the main dictionary, or your personal dictionary, then the line contains only a '*'.

If the word is not in the dictionary, but there are suggestions, then the line contains an '&', a space, the misspelled word, a space, the number of near misses, the number of characters between the beginning of the line and the beginning of the misspelled word, a colon, another space, and a list of the suggestions separated by commas and spaces.

Finally, if the word does not appear in the dictionary, and there are no suggestions, then the line contains a '#', a space, the misspelled word, a space, and the character offset from the beginning of the line. Each sentence of text input is terminated with an additional blank line, indicating that ispell has completed processing the input line.

These output lines can be summarized as follows:

OK:
*
Suggestions:
& «original» «count» «offset»: «miss», «miss», ...
None:
# «original» «offset»
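A program driving Aspell in pipe mode has to tell these three line shapes apart. The following is a rough sketch of such a parser based only on the summary above; the function name parse_result_line, the dictionary keys, and the sample lines are illustrative assumptions, not output captured from Aspell.

```python
def parse_result_line(line):
    """Classify one result line from pipe mode."""
    line = line.rstrip("\n")
    if line == "*":
        return {"status": "ok"}
    if line.startswith("& "):
        # & original count offset: miss, miss, ...
        head, _, tail = line.partition(": ")
        _, original, count, offset = head.split(" ")
        return {
            "status": "suggestions",
            "original": original,
            "count": int(count),
            "offset": int(offset),
            "suggestions": tail.split(", ") if tail else [],
        }
    if line.startswith("# "):
        # # original offset
        _, original, offset = line.split(" ")
        return {"status": "none", "original": original, "offset": int(offset)}
    # Version banner, blank separator line, or extension output.
    return {"status": "other", "raw": line}


# Hypothetical sample lines in the documented format.
for sample in ["*", "& teh 3 0: the, tea, ten", "# xyzzy 10"]:
    print(parse_result_line(sample))
```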
When in the -a mode, Aspell will also accept lines of single words prefixed with any of '*', '&', '@', '+', '-', '~', '#', '!', '%', or '^'. A line starting with '*' tells Aspell to insert the word into the user's dictionary. A line starting with '&' tells Aspell to insert an all-lowercase version of the word into the user's dictionary. A line starting with '@' causes Aspell to accept this word in the future. A line starting with '+', followed immediately by a valid mode, will cause Aspell to parse future input according to the syntax of that formatter. A line consisting solely of a '+' will place Aspell in TEX/LATEX mode (similar to the -t option) and '-' returns Aspell to its default mode (but these commands are obsolete). A line starting with '~' is ignored for Ispell compatibility. A line prefixed with '#' will cause the personal dictionaries to be saved. A line prefixed with '!' will turn on terse mode (see below), and a line prefixed with '%' will return Aspell to normal (non-terse) mode. Any input following the prefix characters '+', '-', '#', '!', '~', or '%' is ignored. To allow spell-checking of lines beginning with these characters, a line starting with '^' has that character removed before it is passed to the spell-checking code. It is recommended that programmatic interfaces prefix every data line with an uparrow ('^') to protect themselves against future changes in Aspell.

To summarize these:

*«word»
Add a word to the personal dictionary
&«word»
Insert the all-lowercase version of the word in the personal dictionary
@«word»
Accept the word, but leave it out of the dictionary
#
Save the current personal dictionary
~
Ignored for ispell compatibility.
+
Enter TEX mode.
+«mode»
Enter the mode specified by «mode».
-
Enter the default mode.
!
Enter terse mode
%
Exit terse mode
^
Spell-check the rest of the line
In terse mode, Aspell will not print lines beginning with '*', which indicate correct words. This significantly improves running speed when the driving program is going to ignore correct words anyway.
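The recommendation to prefix every data line with '^', combined with terse mode, can be sketched as follows. This only builds the text that would be written to Aspell's standard input; the function name build_pipe_input is hypothetical, and each element of lines is assumed to be a single line without embedded newlines.

```python
def build_pipe_input(lines, terse=True):
    """Build pipe-mode input that protects every data line with '^'."""
    out = []
    if terse:
        # '!' turns on terse mode so correct words produce no '*' lines.
        out.append("!")
    for line in lines:
        # '^' is removed by Aspell before the rest of the line is checked,
        # so a line that starts with '*', '#', etc. is not taken as a command.
        out.append("^" + line)
    return "\n".join(out) + "\n"


print(build_pipe_input(["*bold* text", "plain text"]), end="")
```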

In addition to the above commands, which are designed for Ispell compatibility, Aspell also supports its own extensions. All Aspell extensions have the following format.

$$«command» [data]
Where data may or may not be required depending on the particular command. Aspell currently supports the following commands.

m which
Print out the current mode
m none
Do not use any special mode.
m url
Enter email, URL, and host name skipping mode.
s «word1»,«word2»
Returns the score of the two words based roughly on how aspell would score them.
Sw «word»
Returns the soundslike equivalent of the word.
Sl «word»
Returns a list of words that have the same soundslike equivalent.
Pw «word»
Returns the phoneme equivalent of the word.
pp
Returns a list of all words in the current personal wordlist.
ps
Returns a list of all words in the current session dictionary.
l
Returns the current language name.
ra «mis»,«cor»
Add the word pair to the replacement dictionary for later use. Returns nothing.
ric
Returns the status of the ignore_replacements flag which will either be a 1 or a 0.
ri0
Sets the ignore_replacements flag to false (the default). Returns nothing.
ri1
Sets the ignore_replacements flag to true. Returns nothing.
Anything returned is returned on its own line. All lists returned have the following format:

«num of items»: «item1», «item2», «etc»
(Part of the preceding section was directly copied out of the Ispell manual)
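The list format returned by the extension commands can be split with a short helper. This is a sketch under the assumption that items never contain commas; the function name parse_list_reply and the sample reply are illustrative only.

```python
def parse_list_reply(line):
    """Split a 'count: item1, item2, ...' reply into (count, items)."""
    count_text, _, rest = line.strip().partition(":")
    count = int(count_text)
    # Items are separated by commas; surrounding spaces are dropped.
    items = [item.strip() for item in rest.split(",") if item.strip()]
    return count, items


# Hypothetical reply in the documented list format.
count, items = parse_list_reply("3: apple, apply, ample")
print(count, items)
```

A caller may want to compare count against len(items) as a sanity check on the reply.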


3.1.1.1 Notes on Storing Replacement Pairs

As of version .27 of Aspell, storing replacement pairs has a memory. This means that if you first store the replacement pair:

sicolagest -> psycolagest
then store the replacement pair

psycolagest -> psychologist
The replacement pair

sicolagest -> psychologist
will also get stored so that you don't have to worry about it.
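The behavior described above can be modelled in a few lines. The class below is only a toy model of the idea and makes no claim about how Aspell stores replacement pairs internally; the name ReplacementStore and its layout (a mapping from misspelling to a set of corrections) are assumptions made for the example.

```python
from collections import defaultdict


class ReplacementStore:
    """Toy model of replacement pairs that remember earlier corrections."""

    def __init__(self):
        # misspelled word -> set of known corrections
        self.pairs = defaultdict(set)

    def add(self, misspelled, correction):
        # If an earlier pair pointed at a word that is now known to be
        # misspelled itself, extend that earlier pair to the new correction.
        for corrections in self.pairs.values():
            if misspelled in corrections:
                corrections.add(correction)
        self.pairs[misspelled].add(correction)


store = ReplacementStore()
store.add("sicolagest", "psycolagest")
store.add("psycolagest", "psychologist")
print("psychologist" in store.pairs["sicolagest"])
```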

3.2 As an independent spell checker

To use Aspell as an independent spell checker type

aspell check «filename»
Where «filename» is the file you want to check. Aspell will overwrite the original file with the corrected version. The original version is saved as «filename».bak unless this is turned off with the dont-backup option.

If the extension is .tex it will check the file in tex mode unless overridden by the mode option. If the extension is one of the extensions in the sgml-extension option (see section 3.4.4) it will check the file in sgml mode unless overridden by the mode option.
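The selection rule above can be written out as a small function. This is a sketch of the rule as documented, not Aspell's code: the name guess_mode is hypothetical, the extension set is the documented default of the sgml-extension option, and the fallback to url follows the default mode given in section 3.4. Whether the .tex extension is matched case-insensitively is not stated, so it is compared exactly here.

```python
import os

# Default value of the sgml-extension option (see section 3.4.4).
SGML_EXTENSIONS = {"html", "htm", "php", "sgml"}


def guess_mode(filename, mode_option=None, sgml_extensions=SGML_EXTENSIONS):
    """Pick a filter mode for a file, letting the mode option override."""
    if mode_option is not None:
        return mode_option
    ext = os.path.splitext(filename)[1][1:]
    if ext == "tex":
        return "tex"
    # SGML extensions are documented as not case sensitive.
    if ext.lower() in sgml_extensions:
        return "sgml"
    return "url"  # the default mode


for name in ["paper.tex", "INDEX.HTM", "notes.txt"]:
    print(name, guess_mode(name))
```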

The exit command saves the file with the corrections made so far. If you want to quit without saving use control-C.


3.3 As a utility to manage word lists

To create the main word list from a list of words use the command

aspell --lang=«lang» create master ./«base» < «wordlist»
where «base» is the name of the word list and «wordlist» is the list of words separated by white space. The ``./'' is important because without it aspell will create the word list in the normal word list directory.

This will create two files in the current directory. To use the new word list copy the files to the normal word list directory (use ``aspell config'' to find out what it is) and use the option --master=«base».

A personal and replacement word list can be created in a similar fashion.

Because Aspell does not support any sort of affix compression like Ispell does, Ispell word lists will not work as is. In order to use Ispell's word lists, simply pipe the word list through ``ispell -e'' to expand the munched word lists.

The replacement word list has each replacement pair on its own line in the following format

«misspelled word»: «correction»
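A replacement list in this format can be read back into pairs with a few lines of code, for example to review it before running aspell create. The sketch below assumes only the one-pair-per-line format shown above; the function name parse_replacement_list and the sample text are made up for illustration.

```python
def parse_replacement_list(text):
    """Read 'misspelled: correction' lines into a list of pairs."""
    pairs = []
    for number, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        misspelled, sep, correction = line.partition(":")
        if not sep:
            raise ValueError(f"line {number}: missing ':' separator")
        pairs.append((misspelled.strip(), correction.strip()))
    return pairs


# Hypothetical replacement list text.
sample_list = "sicolagest: psychologist\nrecieve: receive\n"
print(parse_replacement_list(sample_list))
```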
The dump command will simply dump the contents of a word list to stdout in a format that can be read back in with aspell create.

If no word list is specified the command will act on the default one. For example the command

aspell dump master
will simply dump the contents of the current master word list to stdout.

3.4 Notes on various filters and filter mode

Aspell now has rudimentary filter support. You can either select individual filters or choose a filter mode. To select a filter mode use the mode option. You may choose from none, url, email, sgml, and tex. The default mode is url. Individual filters can be added with the add-filter option and removed with the rem-filter option. The currently available filters are url, email, sgml, and tex, as well as a number of filters which translate the text from one format to another.

3.4.1 None Mode

This mode is exactly what it says. It turns off all filters.

3.4.2 Url Filter/Mode

The url filter/mode skips over URLs, host names, and email addresses. Because this filter is almost always useful and rarely does any harm, it is enabled in all modes except none. To turn it off either select the none mode or use the rem-filter option after the desired mode is selected.

3.4.3 Email Filter/Mode

The email filter/mode skips over quoted text. It currently does not support skipping over headers; however, a future version should. In the meantime I suggest you use Aspell with Newsbody, which can be found at http://home.worldonline.dk/~byrial/newsbody/. The option email-margin controls the number of characters that can appear before the email quote char; the default is 10. The option add|rem-email-quote controls the characters that are considered quote characters; the default is '>' and '|'.


3.4.4 SGML Filter/Mode

The sgml filter/mode will skip over sgml commands. It currently does not handle nested < > unless they are in quotes. It also does not handle the null end tag (net) minimization feature of sgml such as

<emphasis/important/
The option add|rem-sgml-check controls which sgml tags should always be checked. The default is ``alt''.

The option add|rem-sgml-extension controls which file extensions are recognized as sgml/html files. The default is html, htm, php, and sgml. The extensions are not case sensitive, so extensions like .HTM will also be recognized.

The sgml mode also enables a filter which will recognize sgml character commands such as &amp; and convert them into the proper iso8859-1 character. Currently only the iso8859-1 character set is used; however, in future versions it will convert to the encoding that is specified in the language data file. You can specifically turn on this filter by enabling the SGML&«charset»/«charset» filter.

3.4.5 TEX Filter/Mode

The tex (all lowercase) filter/mode skips over TEX commands and parameters and/or options to certain commands. It also skips over TEX comments by default. The option [dont-]tex-check-comments controls whether or not Aspell will skip over TEX comments. The option add|rem-tex-command controls which TEX commands should have certain parameters and/or options also skipped over. Commands that are not specified will have all their parameters and/or options checked. The format for each item is

«command»  «a list of p,P,o and Os»
The first item is simply the command name. The second item controls which parameters to skip over. A 'p' skips over a parameter while a 'P' won't. Similarly, an 'o' will skip over an optional parameter while an 'O' won't. The first letter on the list will apply to the first parameter, the second letter will apply to the second parameter, etc. If there are more parameters than letters Aspell will simply check them as normal. For example the option

add-tex-command rule pp
will skip over the first two parameters of the ``rule'' command while the option

add-tex-command foo Pop
will check the first parameter of the ``foo'' command, skip over the next optional parameter, if it is present, and will skip over the second parameter -- even if the optional parameter is not present -- and will check any additional parameters.
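The way a letter list such as ``Pop'' is matched against the arguments of a command can be sketched as follows. This is an interpretation of the rules above, not the tex filter's actual code: the function name args_to_check is hypothetical, and arguments are modelled as (kind, text) pairs where kind is "param" for a {...} parameter and "opt" for a [...] optional parameter. How an optional argument is treated when no 'o' or 'O' is expected at that position is not stated above; the sketch simply checks it.

```python
def args_to_check(spec, args):
    """Return the argument texts that would be spell checked."""
    checked = []
    i = 0  # position in the letter list
    for kind, text in args:
        if kind == "param":
            # Letters for optional parameters that were not given are passed over.
            while i < len(spec) and spec[i] in "oO":
                i += 1
        matches = i < len(spec) and (
            (kind == "param" and spec[i] in "pP")
            or (kind == "opt" and spec[i] in "oO")
        )
        if matches:
            # Lowercase letters skip the argument, uppercase letters check it.
            skip = spec[i].islower()
            i += 1
        else:
            # No letter left for this argument: check it as normal.
            skip = False
        if not skip:
            checked.append(text)
    return checked


# "foo Pop" with and without the optional parameter present.
print(args_to_check("Pop", [("param", "a"), ("opt", "b"), ("param", "c"), ("param", "d")]))
print(args_to_check("Pop", [("param", "a"), ("param", "c"), ("param", "d")]))
```

In both calls the first and last arguments are the ones left for checking, matching the description of the ``foo'' example.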

A '*' at the end of the command is simply ignored. For example the option

enlargethispage p
will ignore the first parameter in both enlargethispage and enlargethispage*.

To remove a command simply use the rem-tex-command option. For example

rem-tex-command foo
will remove the command foo, if present, from the list of TEX commands.


3.5 Notes on the different suggestion modes

In order to understand what these suggestion modes do, a basic understanding of how aspell works is required. See section 6 for that. The suggestion modes are as follows.

fast
This method looks for soundslikes within an edit distance of one. It returns about the same results as aspell .28.3 and earlier but is at least 5 times faster. In this mode Aspell gets 88% of the words from my small test kernel of misspelled words. (Go to http://aspell.sourceforge.net/test for more info on the test kernel as well as comparisons of this version of Aspell with previous versions and other spell checkers.)
normal
This method looks for soundslikes within an edit distance of two. It is slower than aspell .28.3 and earlier but returns better suggestions. This mode gets 94% of the words.
bad-spellers
This method also looks for soundslikes within an edit distance of two but is more tailored to the bad speller, whereas fast and normal are tailored to strike a good balance between typos and true misspellings. This method also returns a huge number of words for the really bad spellers who can't seem to get the spelling anything close to what it should be. If the misspelled word looks anything like the correct spelling it is bound to be found somewhere on the list of 100 or more suggestions. This mode gets 98% of the words.
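To give a feel for what ``within an edit distance of one'' versus ``two'' means, the sketch below filters a list of strings by plain Levenshtein distance. This is a stand-in chosen for illustration: the distance measure Aspell really uses, and how it computes soundslike strings, are described in section 6 and are not reproduced here. The function names and sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def within(target, candidates, limit):
    """Keep the candidates whose distance to target is at most limit."""
    return [c for c in candidates if edit_distance(target, c) <= limit]


codes = ["abd", "abde", "xyz"]
print(within("abc", codes, 1))  # limit used by the fast mode
print(within("abc", codes, 2))  # limit used by normal and bad-spellers
```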


Kevin Atkinson 2000-02-18