Exporting the Word List from a User Dictionary

From NeoWiki

Revision as of 22:34, 23 March 2009 by Sardisson (Talk | contribs)

You may wish to export the contents of your NeoOffice user dictionary at some point, for instance to sync the contents of this dictionary with your Mac OS X user dictionary.

“Exporting” the User Dictionary

When spell-checking a document, you have the option of adding unrecognized words to a user dictionary. The default user dictionary is standard.dic. If you want to export these words, you can do so as follows:

  1. Locate the dictionary file. It can be found at the following path (where ~ represents your home folder):
    • NeoOffice 2.2.x: ~/Library/Preferences/NeoOffice-2.2/user/wordbook/standard.dic
    • NeoOffice 3.x: ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic
  2. Copy this file to the Desktop or another location.
  3. Edit the name of this file (the copy) so that the extension reads .txt.
  4. Open this .txt file in NeoOffice (if asked which filter to use, choose UTF).
  5. You will see a list of words separated by # characters. You can use a global search and replace to format the file as you need to.
  6. You may also need to remove some hard page returns.
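
Steps 1 to 3 can also be done from the Terminal. The following is a minimal sketch, not an official NeoOffice tool: copy_user_dict is a hypothetical helper name, and the default path assumes NeoOffice 3.x (pass the NeoOffice-2.2 path as the first argument for 2.2.x).

```shell
# copy_user_dict [source] [destination folder]
# Hypothetical helper: copies the user dictionary and gives the copy a
# .txt extension. The original standard.dic is left untouched.
copy_user_dict() {
    # Assumed defaults: the NeoOffice 3.x path from step 1 and the Desktop.
    src="${1:-$HOME/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic}"
    dest_dir="${2:-$HOME/Desktop}"

    if [ ! -f "$src" ]; then
        echo "Dictionary not found: $src" >&2
        return 1
    fi

    # basename strips the folder and the .dic suffix, e.g. "standard".
    cp "$src" "$dest_dir/$(basename "$src" .dic).txt"
}
```

Running copy_user_dict with no arguments should leave a standard.txt file on the Desktop, ready to be opened in NeoOffice as described in step 4.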

Importing Words into the Mac OS X User Dictionary

Almost-Fully-Automated Method

NeoOffice user Markk has expanded on a script by Cameron Hayne (reproduced in the last section of this page) and created an almost fully automated method of importing your NeoOffice standard.dic file into your Mac OS X user dictionary.

To use it, paste the script below into a new plain text file, save it as split_to_dict, and run it from the command line, e.g. perl split_to_dict inputfile1 > targetfile (where inputfile1 is a standard.dic file or a file with one word per line). Then either paste the contents of the resulting file into your Mac OS X user dictionary file, or append it with an additional UNIX command: cat targetfile >> ~/Library/Spelling/targetDictionary (where targetDictionary is the file for your language, such as en or en_GB).

After a restart of NeoOffice, all those new terms will be recognised by the spell-checker alongside the ones you've already added.

Example

The following example creates a file called neo-words on the Desktop that contains the contents of your standard.dic dictionary and then imports the words from neo-words into your Mac OS X English user dictionary.

cd ~/Desktop
perl split_to_dict ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic > neo-words
cat neo-words >> ~/Library/Spelling/en
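
Because the cat command appends directly to your Mac OS X user dictionary, it can be worth previewing the word file and keeping a backup first. The sketch below assumes only what the scripts on this page state, namely that the word file and the dictionary are null-separated; list_words and append_with_backup are hypothetical helper names.

```shell
# list_words file
# Print the words of a null-separated file, one per line, for inspection.
list_words() {
    tr '\0' '\n' < "$1"
}

# append_with_backup wordfile dictionary
# Save a copy of the dictionary as dictionary.bak (if it exists),
# then append the word file to it.
append_with_backup() {
    words="$1"
    dict="$2"

    if [ -f "$dict" ]; then
        cp "$dict" "$dict.bak" || return 1
    fi

    cat "$words" >> "$dict"
}
```

For the example above, list_words neo-words shows what is about to be added, and append_with_backup neo-words ~/Library/Spelling/en replaces the final cat command.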

split_to_dict script

#!/usr/bin/perl -w

use strict;

# This script 'split_to_dict'
# 1. Reads standard input or a list of files specified on the command line
#    line by line in text mode, so it will automatically account for
#    unicode double byte where it (and perl) can.
# 2. It splits the lines into strings based on whitespace or
#    null (zero) characters
# 3. It removes all control characters from the strings and
# 4. Outputs the strings to STDOUT separated by zero (null) characters.
#
# Usage: perl split_to_dict inputfile1 inputfile2 > targetfile
#
# The inputfiles could be standard.dic OSX or Open Office dict or a list
# of words one per line or whitespace separated.
#
# The targetfile is suitable for pasting into ~/Library/Spelling/ dictionaries:
# cat targetfile >> ~/Library/Spelling/targetDictionary
#
# should do it, where targetDictionary is "en", "en_GB", or another language code.
# based on ideas from Cameron Hayne (macdev@hayne.net) June 2005
# version 1 Mark Kaehny March 2009
#
# released under the same license as the standard perl distribution:
# GPL version 2 or later (see the Free Software Foundation website) or
# Artistic license version 2.
#

my $line;
my $word;

while ($line = <>) {
    # split on whitespace or NULL (0 valued) character
    foreach $word (split(/[\s\x00]/, $line)) {
        next if $word =~ /WBSWG6/; # skip standard.dic header
                                   # add manually if needed.
        $word =~ s/[\cA-\cZ]//g; # junk all control chars (i.e. 1-26 ascii)
        print $word, "\x00" if length $word; # add null & skip blank words
                                             # (length keeps a literal "0")
    }
}

Completely Manual Method

  1. Follow the steps in “Exporting” the User Dictionary above to create a list of words from the NeoOffice user dictionary.
  2. Save the list of words (e.g. as a text file).
  3. Open the file in TextEdit.
  4. Spell-check the file in TextEdit and learn the words.

Additional Methods of Adding Large Word Lists to the Mac OS X User Dictionary

Instead of manually spell-checking the list of words exported from NeoOffice, tools like Dictionary Editor, Dictionary Cleaner, or custom scripts may also be useful for transferring words into the Mac OS X dictionary.

A Perl script by Cameron Hayne will take a text file with one word per line as input and turn it into a format that you can paste straight into your ~/Library/Spelling/<language code> file.

To use it, paste the script into a new plain text file, save it as dictify, make it executable (chmod +x dictify), and run it from the command line, e.g. ./dictify input.file > output.file (where input.file is the one-word-per-line file).

After a restart, all those new terms will be recognised by the spell-checker alongside the ones you've already added.

#!/usr/bin/perl -w

# This script reads a list of strings (one per line) from STDIN
# or from the files supplied as command-line arguments
# and outputs those strings to STDOUT separated by zeros.
# Cameron Hayne (macdev@hayne.net) June 2005

# cl format is ./dictify input.file > output.file where input.file has one word per line
# paste contents of output.file into ~/Library/Spelling/en_GB - TextWrangler etc. show the invisibles

my $zerobyte = pack("B8", 0);
while (<>)
{
    chomp();
    print "$_$zerobyte";
}
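
The same one-word-per-line conversion can be sketched with standard UNIX tools. This is not the dictify script itself: words_to_dict is a hypothetical name, and unlike dictify it also drops blank lines, which would otherwise become empty entries.

```shell
# words_to_dict file
# Hypothetical alternative to dictify: writes the words of a
# one-word-per-line file to standard output, each followed by a null byte.
words_to_dict() {
    # grep -v removes empty or whitespace-only lines;
    # tr then turns every remaining newline into a null byte.
    grep -v '^[[:space:]]*$' "$1" | tr '\n' '\0'
}
```

It is used the same way as dictify, e.g. words_to_dict input.file > output.file.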