NeoLight

From NeoWiki

Revision as of 05:21, 5 May 2005

NeoLight

Edward Peterlin

Authored: 4/29/05

Revised: 5/4/05

What is NeoLight?

NeoLight is an abbreviation I've come up with for "NeoOffice Spotlight Importer". This importer allows Spotlight to index metadata and content within the files created by NeoOffice. Through Spotlight integration, searching across multiple NeoOffice documents can be done in a nice, Mac-like fashion.

Today is the first development release of the NeoLight plugin. This release aims to be the start of an addition that will be included with future builds of NeoOffice/J. The purposes of this release are:

  1. Allow testers who have Tiger to begin assessing the utility of the plugin.
  2. Allow developers access to the source code to bugfix and improve the plugin.
  3. Research what other types of metadata users may wish to have extracted from documents.
  4. Get people excited enough to help add support for OpenDocument formatted documents.

Licensing

NeoLight is being released under the GNU Lesser General Public License. While I would have preferred to use a GPL-style license like the main NeoOffice project, LGPL is necessary in this case as the Spotlight importers are shared libraries loaded into closed-source applications. Using LGPL allows for the executable code of NeoLight to be used by Spotlight without "infecting" Mac OS X with GPL requirements.

Installation Requirements

  • Mac OS X 10.4 "Tiger"
  • NeoOffice/J (or OpenOffice.org X11...see below)
  • patience

Downloading

The NeoLight plugin can be downloaded as a single installer package:

http://trinity.neooffice.org/downloads/neolight_installer.pkg.tgz

  • 27K in size
  • MD5 sum : 3b7f47fc47dae869f1f9beacd072d762
  • Last updated 5/4/05 22:16 PST

A reference Info.plist is also provided:

http://trinity.neooffice.org/downloads/neolight_neoj_Info.plist.gz

  • < 1K in size
  • MD5 sum : 0fc545f35af027424d7fc8f6ea41eaa0
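Before installing, the downloads can be checked against the MD5 sums listed above. The following is only a sketch: file_md5 and verify_md5 are helper names made up for this example, and it assumes the files were saved under their original names in the current directory.

```shell
# Sketch: compare a file's MD5 sum against the value published on this page.
# file_md5 and verify_md5 are hypothetical helper names, not part of NeoLight.

file_md5() {
    # Mac OS X ships `md5`; many other systems ship `md5sum` instead.
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | cut -d ' ' -f 1
    fi
}

verify_md5() {
    # $1 = file to check, $2 = expected MD5 sum
    actual=$(file_md5 "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}

# Example (run from the download directory):
#   verify_md5 neolight_installer.pkg.tgz 3b7f47fc47dae869f1f9beacd072d762
#   verify_md5 neolight_neoj_Info.plist.gz 0fc545f35af027424d7fc8f6ea41eaa0
```

A mismatch usually means an incomplete download or an older copy of the file.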

Installing

To install the NeoLight plugin, you are required to edit the Info.plist of NeoOffice/J. These changes, required for Spotlight support, will be integrated into future patch and/or final releases of NeoOffice/J once these plist additions have been verified to be safe on systems older than 10.4.

You should edit the Info.plist before attempting to index documents with Spotlight.

Editing the Info.plist

Throughout its system, Spotlight identifies files by UTI types. Because Mac OS X does not have built-in types for OpenOffice.org formatted files, they must be defined by an installed application to avoid having Spotlight assign them dynamically-generated UTI types (which cannot reliably be mapped to a Spotlight importer).

NeoLight assumes the following UTI types correspond to the following file types:

  • org.neooffice.writer - Writer "sxw" files
  • org.neooffice.calc - Calc "sxc" files
  • org.neooffice.impress - Impress "sxi" files
  • org.neooffice.draw - Draw "sxd" files

To define these types, the NeoOffice/J Info.plist must have a "UTExportedTypeDeclarations" dictionary added to its root. The neolight_neoj_Info.plist.gz file link above includes a 1.1 Release Candidate Info.plist file with these changes already applied. To install it, copy that Info.plist over the appropriate file within the NeoOfficeJ application bundle.
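The copy step can be sketched as a small shell function that keeps a backup of the original file. This is an illustration rather than an official procedure: install_plist is a hypothetical helper, and the /Applications/NeoOfficeJ.app path in the example is an assumption about where the application is installed.

```shell
# Sketch: replace an application's Info.plist, keeping a backup of the old one.
# install_plist is a hypothetical helper; the bundle path in the example
# below is an assumption and may differ on your machine.

install_plist() {
    # $1 = new Info.plist, $2 = path to the .app bundle
    target="$2/Contents/Info.plist"
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    [ -d "$2/Contents" ] || { echo "not an application bundle: $2" >&2; return 1; }
    # Keep the very first original so the change can be undone later.
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp "$1" "$target"
}

# Example:
#   gunzip neolight_neoj_Info.plist.gz
#   install_plist neolight_neoj_Info.plist /Applications/NeoOfficeJ.app
```

To undo the change, copy Info.plist.orig back over Info.plist inside the bundle.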

After you either copy over the Info.plist or make the edits manually, rebuild your LaunchServices database so that the UTI types take effect by entering this command in a shell:

/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/Support/lsregister -kill -r -domain local -domain system -domain user


Note: You may have to repeat the above command for every existing user account that may ever create NeoOffice documents. It is not necessary to grant those users administrator privileges; just repeat the command while logged in as each user.


It is also possible to perform these edits manually using the Property List Editor (or your favorite text editor). The contents of the appropriate dictionary to add are:

 <key>UTExportedTypeDeclarations</key>
 	<array>
 		<dict>
 			<key>UTTypeConformsTo</key>
 			<array>
 				<string>public.content</string>
 				<string>public.data</string>
 			</array>
 			<key>UTTypeDescription</key>
 			<string>NeoOffice/J Draw</string>
 			<key>UTTypeIdentifier</key>
 			<string>org.neooffice.draw</string>
 			<key>UTTypeReferenceURL</key>
 			<string>http://xml.openoffice.org/xml_specification.pdf</string>
 			<key>UTTypeTagSpecification</key>
 			<dict>
 				<key>public.filename-extension</key>
 				<array>
 					<string>sxd</string>
 				</array>
 				<key>public.mime-type</key>
 				<string>application/vnd.sun.xml.draw</string>
 			</dict>
 		</dict>
 		<dict>
 			<key>UTTypeConformsTo</key>
 			<array>
 				<string>public.content</string>
 				<string>public.data</string>
 			</array>
 			<key>UTTypeDescription</key>
 			<string>NeoOffice/J Impress</string>
 			<key>UTTypeIdentifier</key>
 			<string>org.neooffice.impress</string>
 			<key>UTTypeReferenceURL</key>
 			<string>http://xml.openoffice.org/xml_specification.pdf</string>
 			<key>UTTypeTagSpecification</key>
 			<dict>
 				<key>public.filename-extension</key>
 				<array>
 					<string>sxi</string>
 				</array>
 				<key>public.mime-type</key>
 				<string>application/vnd.sun.xml.impress</string>
 			</dict>
 		</dict>
 		<dict>
 			<key>UTTypeConformsTo</key>
 			<array>
 				<string>public.content</string>
 				<string>public.data</string>
 			</array>
 			<key>UTTypeDescription</key>
 			<string>NeoOffice/J Calc</string>
 			<key>UTTypeIdentifier</key>
 			<string>org.neooffice.calc</string>
 			<key>UTTypeReferenceURL</key>
 			<string>http://xml.openoffice.org/xml_specification.pdf</string>
 			<key>UTTypeTagSpecification</key>
 			<dict>
 				<key>public.filename-extension</key>
 				<array>
 					<string>sxc</string>
 				</array>
 				<key>public.mime-type</key>
 				<string>application/vnd.sun.xml.calc</string>
 			</dict>
 		</dict>
 		<dict>
 			<key>UTTypeConformsTo</key>
 			<array>
 				<string>public.content</string>
 				<string>public.data</string>
 			</array>
 			<key>UTTypeDescription</key>
 			<string>NeoOffice/J Writer</string>
 			<key>UTTypeIdentifier</key>
 			<string>org.neooffice.writer</string>
 			<key>UTTypeReferenceURL</key>
 			<string>http://xml.openoffice.org/xml_specification.pdf</string>
 			<key>UTTypeTagSpecification</key>
 			<dict>
 				<key>public.filename-extension</key>
 				<array>
 					<string>sxw</string>
 				</array>
 				<key>public.mime-type</key>
 				<string>application/vnd.sun.xml.writer</string>
 			</dict>
 		</dict>
 	</array>
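
After a manual edit, a quick text search can confirm that nothing was left out. The sketch below assumes the Info.plist is stored as XML text; it only looks for the key and the four identifiers and does not validate the plist structure. check_uti_declarations is a hypothetical helper name.

```shell
# Sketch: confirm that an XML-format Info.plist mentions the
# UTExportedTypeDeclarations key and all four org.neooffice identifiers.
# This is a plain-text check only; it does not parse the plist.
# check_uti_declarations is a hypothetical helper name.

check_uti_declarations() {
    # $1 = path to Info.plist
    grep -q '<key>UTExportedTypeDeclarations</key>' "$1" || {
        echo "missing UTExportedTypeDeclarations" >&2; return 1; }
    for uti in org.neooffice.writer org.neooffice.calc \
               org.neooffice.impress org.neooffice.draw; do
        grep -qF "<string>$uti</string>" "$1" || {
            echo "missing $uti" >&2; return 1; }
    done
    echo "all four types declared"
}

# Example (the bundle path is an assumption):
#   check_uti_declarations /Applications/NeoOfficeJ.app/Contents/Info.plist
```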

Installing the NeoLight Plugin

To install the NeoLight Importer Plugin, simply extract and double-click the neolight_installer.pkg.tgz file linked to above. After you click through the requisite license agreements, the installer will install the NeoLight plugin named neolight.mdimporter within the /Library/Spotlight directory. This will make the NeoLight importer available for all users on the machine.

Testing Your Installation

If you have the Tiger Developer's Tools installed, you can verify proper loading of the plugin from a Terminal after installation using the following command:

/usr/bin/mdimport -L

You should see /Library/Spotlight/neolight.mdimporter in the list if all has been installed well.
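
Rather than reading the list by eye, the check can be scripted. This is a minimal sketch: importer_listed is a hypothetical helper that reads the listing on standard input, and the 2>&1 in the example is there only in case the listing is written to standard error.

```shell
# Sketch: succeed if the NeoLight importer appears in a listing read
# from standard input. importer_listed is a hypothetical helper name.

importer_listed() {
    grep -q '/Library/Spotlight/neolight\.mdimporter'
}

# Example (on Mac OS X 10.4):
#   /usr/bin/mdimport -L 2>&1 | importer_listed && echo "NeoLight is loaded"
```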

To verify that installation was successful:

  1. Launch NeoOffice/J
  2. Create a new empty Writer document.
  3. Under the File menu item, choose Properties.
  4. Add a Title for the document with a relatively unused nonsense word (e.g. "interzone")
  5. Notice that the title from the Properties dialog appears in the document's titlebar.
  6. Save the document to a known location with a filename that does not contain your nonsense word.
  7. Start a new Spotlight search.
  8. Type the nonsense word you used above.

If the NeoLight plugin is installed properly, the document you just saved should show up in the search results. If it doesn't, open a Terminal and type the following:

/usr/bin/mdimport -d3 /path/to/test/doc.sxw

Check to make sure the document is of type "org.neooffice.writer". If it is, look in the keys for the "Title" key and you should see your nonsense word. If you do, then everything is installed correctly.

If the command above instead reports the document's type as something like "dyn.a3f42morehexgarbageinherefoo", LaunchServices hasn't mapped the extension to the UTI type properly. Either the update to the Info.plist was misapplied or you may need to rebuild your LaunchServices database.
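
The two outcomes described above can be told apart by the prefix of the reported type. The sketch below takes the type string as its argument (copied from the mdimport output by hand, since the exact output layout is not documented here); classify_uti is a hypothetical helper name.

```shell
# Sketch: interpret the UTI type reported for a test document.
# classify_uti is a hypothetical helper; pass it the type string
# copied from the mdimport -d3 output.

classify_uti() {
    case "$1" in
        org.neooffice.*) echo "ok: mapped to a NeoLight type" ;;
        dyn.*)           echo "problem: dynamic type, check Info.plist and rebuild LaunchServices" ;;
        *)               echo "unexpected type: $1" ;;
    esac
}

# Example:
#   classify_uti org.neooffice.writer
```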

If you want to index several files or directories, you can use the "find" command. This example lets Spotlight index all *.sxw files in your $HOME directory (the pattern is quoted so the shell does not expand it before find sees it):

sudo find -s "$HOME" -name '*.sxw' -exec /usr/bin/mdimport -d3 {} \;

What does NeoLight Import?

NeoLight currently handles all four types of major OOo/NeoOffice documents in a single plugin. It will extract the following:

  • standard OOo metadata (generally accessible and editable through File > Properties)
    • title
    • author/last edited
    • keywords
    • description
    • comments
  • text content from Writer documents for indexing
  • textual display content of all cells of a Calc document for indexing
  • content of bullets, titles, and other text areas for indexing from Impress and Draw documents.

Once the NeoLight plugin is installed, it should be possible to search OOo formatted documents by these criteria within Spotlight enabled applications.
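
From a Terminal, such searches can also be run with the mdfind command. The sketch below builds a query string for a title search; it assumes the document title ends up in Spotlight's standard kMDItemTitle attribute, which this page does not confirm, and title_query is a hypothetical helper name.

```shell
# Sketch: build a Spotlight query that matches documents whose title
# contains a given word, ignoring case.
# Assumption: the title is stored under the standard kMDItemTitle attribute.
# title_query is a hypothetical helper; the word must not contain quotes.

title_query() {
    printf 'kMDItemTitle == "*%s*"c\n' "$1"
}

# Example (on Mac OS X 10.4):
#   mdfind "$(title_query interzone)"
```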

Using NeoLight to Index OpenOffice.org Documents

NeoLight is not specific to NeoOffice and can index any OpenOffice.org 1.x formatted document. To enable indexing of OpenOffice.org documents if you do not use NeoOffice/J, some application will need to have its Info.plist edited to provide the UTI types listed above. If you use OpenOffice.org Mac OS X (X11), you may want to consider assigning this to the Start OpenOffice.org application Info.plist file (or the Info.plist of whatever launcher you are using).

Known Issues

  • This is completely untested and unreviewed! Prior to the release of Mac OS X 10.4, it was not possible to release source code publicly due to Apple NDA requirements.
  • The UTI types are required to be defined before files are manually indexed using mdimport -d3. Files that get forcefully indexed prior to the UTI types being defined will get the wrong UTI type in the Spotlight metadata database. Although the type tree gets updated to reflect the proper mapping to the org.neooffice types, the raw type remains fixed to the dynamically generated dyn UTI type and the file cannot be indexed. The workaround is to copy the file to a new file and index the copy.
  • I have been unable to verify if the text content provided in the kMDItemTextContent key is in fact getting indexed. This key does not show up in the list for mdimport -d3 or -d4. I am unsure if this is a limitation of the build of Tiger on which NeoLight was developed and tested (WWDC 2004).
    • ed 5/4/05 I changed the text encoding to Unicode; the text key is now showing up properly and text should be indexed.
  • Content is being extracted in Unicode16 encoding, where possible. I am unsure whether OOo 1.x files can use other encodings that might cause the parsers to fail.
  • Special characters, like ä, ö, ü, ß, aren't indexed correctly.
    • ed 5/4/05 I changed the text encoding to Unicode...can you recheck if these characters now can be indexed? They seemed OK in my text content key but I didn't check document info and the like.
  • Spaces in the path to the files to index are problematic (e. g. /Volumes/Geheime Dokumente/test.sxw)
    • ed 5/4/05 Please retest in the 1.0.2 binary; I changed the "popen" command to use double instead of single quotes and was able to access files in directories with spaces.
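
As general background to the quoting issue, the sketch below shows one common way to make an arbitrary path safe to embed in a shell command line. It is not the code NeoLight uses (the popen command itself is not shown on this page); shell_quote is a hypothetical helper name.

```shell
# Sketch: quote a path so that spaces, double quotes, dollar signs and
# single quotes all survive when the result is pasted into a shell command.
# Not NeoLight's actual code; shell_quote is a hypothetical helper name.

shell_quote() {
    # Replace each ' with '\'' and wrap the whole string in single quotes.
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Example:
#   shell_quote "/Volumes/Geheime Dokumente/test.sxw"
#   prints '/Volumes/Geheime Dokumente/test.sxw'
```

Wrapping a path in plain double quotes handles spaces, but a path containing a double quote or a dollar sign would still be misread by the shell, which is why the sketch uses single quotes with escaping.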

Getting the Source Code

The source code is in the NeoOffice CVS repository. To check out the source code, do the following:

csh
setenv CVSROOT :pserver:anoncvs@anoncvs.neooffice.org:/cvs
cvs login

Use anoncvs as the cvs password, then continue with:

cvs co neolight

This will check out the neolight module that has the source code for the plugin. Simply open the neolight.xcode project in Xcode and away you go!

Source Code Structure

In general, the source code is split up into the following files:

  • common - Contains utility code used across all the file types and common metadata extraction (all the OpenOffice.org file types have the same meta.xml format)
  • writer - Contains functions for SXW file parsing.
  • calc - Contains functions for SXC file parsing.
  • impress - Contains functions for SXI file parsing (and SXD)
  • main - CFPlugin foundational code and dispatch of metadata extraction to appropriate handler based on UTI type
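
The dispatch described for main can be pictured with a small lookup. The sketch below is written in shell purely for illustration (the plugin itself is Objective-C++ built on CoreFoundation), and module_for_uti is a hypothetical name; the mapping follows the file list above and the UTI types given earlier on this page.

```shell
# Sketch: which source module handles which UTI type, following the
# file list above. Illustration only; the plugin does this in
# Objective-C++. module_for_uti is a hypothetical name.

module_for_uti() {
    case "$1" in
        org.neooffice.writer)  echo writer ;;
        org.neooffice.calc)    echo calc ;;
        # Draw (SXD) files are parsed by the impress code as well.
        org.neooffice.impress|org.neooffice.draw) echo impress ;;
        *) echo "unsupported type: $1" >&2; return 1 ;;
    esac
}

# Example:
#   module_for_uti org.neooffice.draw
```

In every case the shared meta.xml extraction in common applies in addition to the type-specific parsing.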

All of the functions are commented with Doc++/JavaDoc style comments, so you should be able to run a documentation generator on the code to browse through it.

While all of the files are technically Objective-C++, no Cocoa is used in the plugin. It is all CoreFoundation based and I'm essentially using the language as a "better C". Since plugins are supposed to be lightweight, it simply made more sense to re-use the CoreFoundation utilities than to build any grand scheme.

Issues and Feedback

Please note any issues here or in the NeoLight Development forum on http://trinity.neooffice.org

Binary Changelog

29.04.2005 change install location to /Library/Spotlight

04.05.2005 change from UTF-8 to Unicode; change popen command to use double quotes instead of single