Exporting the Word List from a User Dictionary
From NeoWiki
Current revision (22:34, 23 March 2009), edited by Sardisson
You may wish to export the contents of your NeoOffice user dictionary at some point, for instance to sync the contents of this dictionary with your Mac OS X user dictionary.
“Exporting” the User Dictionary
When spell-checking a document, you have the option of adding unrecognized words to a user dictionary. The default user dictionary is standard.dic. If you want to export these words, you can do so as follows:
- Locate the dictionary file. Its path depends on your NeoOffice version (~ represents your home folder):
  - NeoOffice 2.2.x: ~/Library/Preferences/NeoOffice-2.2/user/wordbook/standard.dic
  - NeoOffice 3.x: ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic
- Copy this file to the Desktop or another location
- Edit the name of this file (the copy) so that the extension reads .txt
- Open this .txt file in NeoOffice (if asked which filter to use, choose UTF).
- You will see a list of words separated by # characters. You can use a global search and replace to format the file as you need to.
- You may also need to remove some hard page returns.
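The search-and-replace step above can also be sketched on the command line. This is only a rough sketch under stated assumptions: it assumes the separators are literal # characters, whitespace, or control characters (including NUL), and that the header line contains the string WBSWG6, as the split_to_dict script on this page does. The function name words_per_line is hypothetical.

```shell
# Hypothetical helper: print the words of a dictionary copy, one per line.
# Assumption: words are separated by '#', whitespace, or control characters.
words_per_line() {
    # Turn every separator byte into a newline and squeeze runs of newlines,
    # then drop empty lines and the header line (assumed to contain WBSWG6).
    LC_ALL=C tr -s '#\000-\040' '[\n*]' < "$1" | grep -v -e '^$' -e 'WBSWG6'
}
```

For instance, words_per_line ~/Desktop/standard.txt > ~/Desktop/words.txt would write the list to a new file, leaving the copy itself untouched.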
Importing Words into the Mac OS X User Dictionary
Almost-Fully-Automated Method
NeoOffice user Markk has expanded on a script originally brought to our attention by yoxi and created an almost fully-automated method of importing your NeoOffice standard.dic file into your Mac OS X user dictionary.
To use, paste the script below into a new plain text file, save it as split_to_dict, and run it from the command line, e.g. perl split_to_dict inputfile1 > targetfile (where inputfile1 is a standard.dic file or a file with one word per line). Then either paste the contents of the resulting file into your Mac OS X user dictionary file, or use an additional UNIX command: cat targetfile >> ~/Library/Spelling/targetDictionary (where targetDictionary is en or GB_en or whatever).
After a restart of NeoOffice, all those new terms will be recognised by the spell-checker alongside the ones you've already added.
Example
The following example assumes that you saved split_to_dict on the Desktop. It creates a file called neo-words on the Desktop that contains the words from your standard.dic dictionary, and then appends the words from neo-words to your Mac OS X English user dictionary.
cd ~/Desktop
perl split_to_dict ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic > neo-words
cat neo-words >> ~/Library/Spelling/en
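To check what ended up in the user dictionary after an import, the file can be printed one word per line. This is a minimal sketch, assuming the dictionary file is a plain list of words separated by NUL (zero) characters, which is the format the split_to_dict script produces. The function name list_dict_words is hypothetical.

```shell
# Hypothetical helper: show a NUL-separated word list one word per line.
list_dict_words() {
    # Replace each NUL byte with a newline; all other bytes pass through.
    LC_ALL=C tr '\000' '\n' < "$1"
}
```

For instance, list_dict_words ~/Library/Spelling/en | grep -x 'someword' would show whether someword is present.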
split_to_dict script
#!/usr/bin/perl -w
use strict;

# This script 'split_to_dict'
# 1. Reads standard input or a list of files specified on the command line
#    line by line in text mode, so it will automatically account for
#    unicode double byte where it (and perl) can.
# 2. It splits the lines into strings based on whitespace or
#    null (zero) characters
# 3. It removes all control characters from the strings and
# 4. Outputs the strings to STDOUT separated by zero (null) characters.
#
# Usage: perl split_to_dict inputfile1 inputfile2 > targetfile
#
# The inputfiles could be standard.dic OSX or Open Office dict or a list
# of words one per line or whitespace separated.
#
# The targetfile is suitable for pasting into ~/Library/Spelling/ dictionaries:
#     cat targetfile >> ~/Library/Spelling/targetDictionary
#
# should do it where targetDictionary is "en" or "GB_en" or whatever.

# based on ideas from Cameron Hayne (macdev@hayne.net) June 2005
# version 1 Mark Kaehny March 2009
#
# released under the same license as the standard perl distribution:
# GPL version 2 or later (See the Free Software Foundation Website) or
# Artistic license version 2.
#

my $line;
my $word;

while ($line = <>) {
    # split on whitespace or NULL (0 valued) character
    foreach $word (split(/[\s\x00]/, $line)) {
        next if $word =~ /WBSWG6/;       # skip standard.dic header
                                         # add manually if needed.
        $word =~ s/[\cA-\cZ]//g;         # junk all control chars (i.e. 1-26 ascii)
        print $word, "\x00" if ($word);  # add null & skip blank words
    }
}
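Because cat >> simply appends, running the example a second time would add the same words to the dictionary file again. One way to avoid that is to append only the words that are not already there. The following is a sketch under the assumption that both files are plain NUL-separated word lists. The function name new_words_only is hypothetical.

```shell
# Hypothetical helper: print (NUL-separated) the words in list $1
# that do not already occur in dictionary $2.
new_words_only() {
    nwo_new=$(mktemp)
    nwo_old=$(mktemp)
    # Convert both lists to sorted, de-duplicated, one-word-per-line form.
    LC_ALL=C tr '\000' '\n' < "$1" | sed '/^$/d' | LC_ALL=C sort -u > "$nwo_new"
    LC_ALL=C tr '\000' '\n' < "$2" | sed '/^$/d' | LC_ALL=C sort -u > "$nwo_old"
    # comm -23 keeps lines found only in the first file; convert back to NULs.
    LC_ALL=C comm -23 "$nwo_new" "$nwo_old" | LC_ALL=C tr '\n' '\000'
    rm -f "$nwo_new" "$nwo_old"
}
```

With this in place, new_words_only neo-words ~/Library/Spelling/en >> ~/Library/Spelling/en would replace the plain cat step. Keeping a backup copy of the dictionary file first is a sensible precaution.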
Completely Manual Method
- Follow the steps in “Exporting” the User Dictionary above to create a list of words from the NeoOffice user dictionary.
- Save the list of words (e.g. as a text file)
- Open the file in TextEdit
- Spell-check the file in TextEdit and learn the words.
Additional Methods of Adding Large Word Lists to the Mac OS X User Dictionary
Instead of manually spell-checking the list of words exported from NeoOffice, tools like Dictionary Editor, Dictionary Cleaner, or custom scripts may also be useful for transferring words into the Mac OS X dictionary.
yoxi has pointed out a Perl script that takes a text file with one word per line as input and turns it into a format that you can paste straight into your ~/Library/Spelling/<language code> file.
After a restart, all those new terms will be recognised by the spell-checker alongside the ones you've already added.
To use, paste the script into a new plain text file, save it as dictify, make it executable (chmod +x dictify), and run it from the command line, e.g. ./dictify input.file > output.file (where input.file is the one-word-per-line file).
#!/usr/bin/perl -w

# This script reads a list of strings (one per line) from STDIN
# or from the files supplied as command-line arguments
# and outputs those strings to STDOUT separated by zeros.
# Cameron Hayne (macdev@hayne.net) June 2005

# cl format is ./dictify input.file > output.file where input.file has one word per line
# paste contents of output.file into ~/Library/Spelling/en_GB - TextWrangler etc. show the invisibles

my $zerobyte = pack("B8", 0);
while (<>)
{
    chomp();
    print "$_$zerobyte";
}
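The same one-word-per-line to zero-separated conversion can be sketched with standard UNIX tools instead of Perl. This is not the script's author's method, only an approximation: it assumes the input file ends with a newline, and unlike dictify it also strips carriage returns and skips empty lines. The function name lines_to_nul is hypothetical.

```shell
# Hypothetical helper: convert a one-word-per-line file to a
# NUL-separated list on standard output.
lines_to_nul() {
    # Remove carriage returns (Windows line endings), drop empty lines,
    # then turn each remaining newline into a NUL byte.
    LC_ALL=C tr -d '\r' < "$1" | sed '/^$/d' | LC_ALL=C tr '\n' '\000'
}
```

Usage follows the same pattern as dictify, e.g. lines_to_nul input.file > output.file.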