Importing HTML Tables into a Calc Document

From NeoWiki

Jump to: navigation, search

The Calc module of NeoOffice allows you to save tables from HTML pages. Several methods are available, and one of them allows you to synchronize your tables with those of the internet site. A variation on these methods offers a solution to the problem of certain xls files opening in Writer instead of Calc.


[edit] Saving without synchronization

With almost all web browsers, you can select the table on the web page and then simply copy/paste it in a spreadsheet.
Sometimes all the table is pasted in only one column. If that is the case, then you can use one of the following methods:

[edit] First method

  • Copy the table Cmd-C.
  • Open a Calc document and in the Edit menu, choose the Paste Special… and one or the other option can give the desired result.

[edit] Second method, usable if the page is publicly accessible

  • Open a Writer Web document.
    • To do that, in the File menu, choose the New sub-menu and the HTML Document item.
  • Copy the URL of the web page which contains the desired table then paste it in the Load URL field located on the top and on the left side of the document toolbar.
  • Press Enter or Return.
  • Select the table you want to insert in a spreadsheet and copy it Cmd-C.
  • Paste it Cmd-V in your Calc document.

[edit] Saving with synchronization

The web page must be publicly accessible.
If you want your table to be synchronized with the web page, you can proceed in the following way:

  • Open the web page which contains the table you want to import.
  • Copy the URL.
  • Open a Calc document and then select a cell.
  • Choose the Link to External Data… sub-menu of the Insert menu.
  • Press Enter or Return.
  • In the dialog that appears, you'll see HTML_tables and HTML_all among several options in the Available tables/ranges section.
  • Select the table you want to insert and click OK. If you are not able to find the desired table, select HTML_all. Later you will be able to copy/paste the part of the web page of interest to you.
  • You may specify a synchronization rate by checking the Update every checkbox and setting the synchronizations rate in the provided box.

[edit] Opening auto-generated xls files in Calc

Some websites create auto-generated files that are actually HTML files, but which have an xls extension. Excel will open these files correctly, but if you try to open them with NeoOffice, they will open in Writer, because NeoOffice recognizes that they are HTML files. thanson offered the following solution in the trinity forums.

This procedure assumes that the troublesome spreadsheet has been saved to your local machine - that is, the file that has the embedded HTML forcing it to open in Writer/Web.

  • Open a new blank spreadsheet in Calc.
  • In the Insert menu, choose Link to External Data….
  • The External Data window appears, asking you for the "URL of external data source" - obviously, you don't need the URL as the file is already on your hard drive. So then, click the browse button (...) just to the right of the URL field. Browse to the location of the spreadsheet, select the file, and Calc will attempt to find the data tables in the spreadsheet. Be prepared to wait a minute or two as Calc digs though the file attempting to make sense of what it has found.
  • Once Calc has sifted through the file you'll see a few choices in the Available tables/ranges section - I had several options from which to choose, including HTML_tables, HTML_all, and HTML_1
  • Select the table you want to insert into the spreadsheet and click OK. I simply selected HTML_all and got what I needed, but I suppose that you could select only specific tables if you wanted to exclude some data.

Another option is to open a new, blank spreadsheet and choose Sheet From File… from the Insert menu. Select your file, choose which sheet to import from the file, and the position to insert the imported sheet in the blank spreadsheet, and then click OK.

N.B. NeoOffice's underlying HTML import code assumes these sorts of files are always ISO-8859-1 (or Latin 1)-encoded. If your file is in a different encoding (for instance, UTF-8), you will need to modify the file in a text editor to prevent corruption of non-ASCII characters. At the beginning of the file, add
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
(for a UTF-8-encoded file) in order to allow the HTML import code to know the correct character encoding.

This article in other languages: Français Italiano
Personal tools