Exporting the Word List from a User Dictionary
From NeoWiki
Revision as of 17:58, 18 March 2009 by Sardisson (→Importing Words into the Mac OS X User Dictionary - add script from yoxi)
Current revision (22:34, 23 March 2009) by Sardisson (→Almost-Fully-Automated Method - link to Markk's script)
(One intermediate revision not shown.)
Current revision
You may wish to export the contents of your NeoOffice user dictionary at some point, for instance to sync the contents of this dictionary with your Mac OS X user dictionary.
“Exporting” the User Dictionary
When spell-checking a document, you have the option of adding unrecognized words to a user dictionary. The default user dictionary is standard.dic. If you want to export these words, you can do so as follows:
- Locate the dictionary file. It can be found at the following path (where ~ represents your home folder):
  - NeoOffice 2.2.x: ~/Library/Preferences/NeoOffice-2.2/user/wordbook/standard.dic
  - NeoOffice 3.x: ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic
- Copy this file to the Desktop or another location
- Edit the name of this file (the copy) so that the extension reads .txt
- Open this .txt file in NeoOffice (if asked which filter to use, choose UTF-8).
- You will see a list of words separated by # characters. You can use a global search and replace to format the file as you need to.
- You may also need to remove some hard page returns.
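The search-and-replace step can also be done from the command line. The following is a minimal sketch, assuming the renamed copy is plain text in which the words are separated by literal # characters, as described above. The function name hash_to_lines and the sample words are made up for illustration and are not part of NeoOffice; the demonstration runs on a temporary file rather than on a real dictionary.

```shell
#!/bin/sh
# Hypothetical helper: turn a '#'-separated word list into one word per line.
# Assumes the input is plain text with literal '#' separators.
hash_to_lines() {
    # Replace every '#' with a newline, strip carriage returns,
    # then drop the empty lines left behind by doubled separators.
    tr '#' '\n' < "$1" | tr -d '\r' | sed '/^$/d'
}

# Demonstration on a throwaway file so no real dictionary is touched.
demo=$(mktemp)
printf 'NeoOffice#wordbook##Sardisson#\n' > "$demo"
hash_to_lines "$demo"
rm -f "$demo"
```

The result is a one-word-per-line list, which is the input format the import methods below expect.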
Importing Words into the Mac OS X User Dictionary
Almost-Fully-Automated Method
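NeoOffice user Markk has expanded on a script originally brought to our attention by yoxi and created an almost fully automated method of importing your NeoOffice standard.dic file into your Mac OS X user dictionary. (Markk's forum post: https://trinity.neooffice.org/modules.php?name=Forums&file=viewtopic&p=51703#51703; yoxi's original thread: http://trinity.neooffice.org/modules.php?name=Forums&file=viewtopic&t=7127)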
To use it, paste the script below into a new plain text file, save it as split_to_dict, and run it from the command line, e.g. perl split_to_dict inputfile1 > targetfile (where inputfile1 is a standard.dic file or a file with one word per line). Then either paste the contents of the resulting file into your Mac OS X user dictionary file, or append it with an additional UNIX command: cat targetfile >> ~/Library/Spelling/targetDictionary (where targetDictionary is the dictionary file for your language, such as en or en_GB).
After a restart of NeoOffice, all those new terms will be recognised by the spell-checker alongside the ones you've already added.
Example
The following example creates a file called neo-words on the Desktop that contains the contents of your standard.dic dictionary and then imports the words from neo-words into your Mac OS X English user dictionary.
```shell
cd ~/Desktop
perl split_to_dict ~/Library/Preferences/NeoOffice-3.0/user/wordbook/standard.dic > neo-words
cat neo-words >> ~/Library/Spelling/en
```
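Because cat with >> changes the dictionary file in place, it may be worth previewing the word list and keeping a backup before appending. The following is only a sketch of that idea: the function names preview_words and append_with_backup and the .bak suffix are assumptions made for illustration, and the demonstration uses temporary files in place of ~/Library/Spelling/en.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Show a NUL-separated word list one word per line for inspection.
preview_words() {
    tr '\0' '\n' < "$1"
}

# Copy the dictionary to <dictionary>.bak, then append the new words.
# The append only happens if the backup copy succeeded.
append_with_backup() {
    src=$1
    dest=$2
    cp "$dest" "$dest.bak" && cat "$src" >> "$dest"
}

# Demonstration on temporary files instead of a real user dictionary.
dict=$(mktemp)
words=$(mktemp)
printf 'existing\0' > "$dict"
printf 'neo\0office\0' > "$words"
preview_words "$words"
append_with_backup "$words" "$dict"
preview_words "$dict"
rm -f "$dict" "$dict.bak" "$words"
```

With a real dictionary, the two arguments would be the generated word file (neo-words in the example above) and the target file under ~/Library/Spelling/.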
split_to_dict script
```perl
#!/usr/bin/perl -w

use strict;

# This script 'split_to_dict'
# 1. Reads standard input or a list of files specified on the command line
#    line by line in text mode, so it will automatically account for
#    unicode double byte where it (and perl) can.
# 2. It splits the lines into strings based on whitespace or
#    null (zero) characters
# 3. It removes all control characters from the strings and
# 4. Outputs the strings to STDOUT separated by zero (null) characters.
#
# Usage: perl split_to_dict inputfile1 inputfile2 > targetfile
#
# The inputfiles could be standard.dic OSX or Open Office dict or a list
# of words one per line or whitespace separated.
#
# The targetfile is suitable for pasting into ~/Library/Spelling/ dictionaries:
#    cat targetfile >> ~/Library/Spelling/targetDictionary
#
# should do it where targetDictionary is "en" or "en_GB" or whatever.
# based on ideas from Cameron Hayne (macdev@hayne.net) June 2005
# version 1 Mark Kaehny March 2009
#
# released under the same license as the standard perl distribution:
# GPL version 2 or later (See the Free Software Foundation Website) or
# Artistic license version 2.
#

my $line;
my $word;

while ($line = <>) {
    # split on whitespace or NULL (0 valued) character
    foreach $word (split(/[\s\x00]/, $line)) {
        next if $word =~ /WBSWG6/;    # skip standard.dic header
                                      # add manually if needed.
        $word =~ s/[\cA-\cZ]//g;      # junk all control chars (i.e. 1-26 ascii)
        # add null & skip blank words (length test so a literal "0" is kept)
        print $word, "\x00" if length $word;
    }
}
```
Completely Manual Method
- Follow the steps in “Exporting” the User Dictionary above to create a list of words from the NeoOffice user dictionary.
- Save the list of words (e.g. as a text file)
- Open the file in TextEdit
- Spell-check the file in TextEdit and learn the words.
Additional Methods of Adding Large Word Lists to the Mac OS X User Dictionary
Instead of manually spell-checking the list of words exported from NeoOffice, tools like Dictionary Editor (http://www.pariahware.com/dictionaryeditor.php), Dictionary Cleaner (http://www.twoamsoftware.com/?q=dc/about), or custom scripts may also be useful for transferring words into the Mac OS X dictionary.
yoxi has pointed out a Perl script that will take a text file of one-word-per-line as input and turn it into a format that you can paste straight into your ~/Library/Spelling/<language code> file.
After a restart, all those new terms will be recognised by the spell-checker alongside the ones you've already added.
To use it, paste the script into a new plain text file, save it as dictify, make it executable (chmod +x dictify), and run it from the command line, e.g. ./dictify input.file > output.file (where input.file is the one-word-per-line file).
```perl
#!/usr/bin/perl -w

# This script reads a list of strings (one per line) from STDIN
# or from the files supplied as command-line arguments
# and outputs those strings to STDOUT separated by zeros.
# Cameron Hayne (macdev@hayne.net) June 2005
# cl format is ./dictify input.file > output.file where input.file has one word per line
# paste contents of output.file into ~/Library/Spelling/en_GB - TextWrangler etc. show the invisibles

my $zerobyte = pack("B8", 0);

while (<>) {
    chomp();
    print "$_$zerobyte";
}
```
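For a plain one-word-per-line file, a similar conversion can be approximated without Perl. The following is a sketch using the standard tr utility, not part of the original script; lines_to_nul is a hypothetical name, and the sample words are illustrative. Unlike dictify, it does not add a zero byte after a final line that lacks a trailing newline.

```shell
#!/bin/sh
# Hypothetical shell counterpart to dictify: replace each newline with a
# zero byte so the words come out NUL-separated.
lines_to_nul() {
    tr '\n' '\0' < "$1"
}

# Demonstration: convert a temporary list and dump it with od -c
# so the zero bytes are visible.
list=$(mktemp)
printf 'colour\nflavour\n' > "$list"
lines_to_nul "$list" | od -c
rm -f "$list"
```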