23 September 2009
Web Asset Tracker (WATr) is a web application which was mainly used to maintain various asset information using a RDBMS database. There rose a specific requirement where the application needs to render the data in a downloadable spreadsheet. The following post documents the comparison of leading industry frameworks for creating MS-Excel Spreadsheets using Java. == Purpose The purpose of this document is to list out the usage and pros/cons of the following Excel Java APIs: 1. Apache POI 2. Java Excel API [JXL] 3. OpenXLS [Product of ExtenXLS]
For comparison of the JS frameworks, the following list of criteria were used
The API should have the ability
to read existing Spreadsheets in MS-Excel 97, 2000, XP, 2003 and 2007 formats
to read existing Spreadsheets in ODS [OpenOffice.org Calc Spreadsheet] formats
provide interoperability between various formats/versions
to edit existing Spreadsheets in the above mentioned formats
to create new spreadsheets in the above mentioned formats
to preserve and create formula and functions on various worksheets
to manipulate multiple worksheets within a workbook
to manipulate, create and edit chart information
to freeze and split panes
to format cells,cell patterns, fonts and borders
for row and column sizing, formatting, auto-sizing, insertion and deletion
for cell validations and named ranges
for row and column grouping and collapsing
to draw shapes using the microsoft office drawing tools
to add cell comments
to define printable formats and printing headers/footers
to add embeddable objects
The API should also possess 1. Matured Documentation, Tutorials and Examples 2. Many adaptations 3. Active development and community support
Apache POI, a project run by the Apache Software Foundation, and previously a sub-project of the Jakarta Project, provides pure Java libraries for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel. The name was originally an acronym for "Poor Obfuscation Implementation", referring humorously to the fact that the file formats seemed to be deliberately obfuscated, but poorly, since they were successfully reverse-engineered. This explanation - and those of the similar names for the various sub-projects - were removed from the official web-pages in order to better market the tools to businesses who would not consider such humour appropriate. The POI project is the master project for developing pure Java ports of file formats based on Microsoft’s OLE 2 Compound Document Format. OLE 2 Compound Document Format is used by Microsoft Office Documents, as well as by programs using MFC property sets to serialize their document objects.
The various components of the Apache POI API are:
POIFS is the set of APIs for reading and writing OLE 2 Compound Document Formats using (only) Java.
HSSF and XSSF are the set of APIs for reading and writing Microsoft Excel 97-2007 and OOXML spreadsheets using (only) Java.
HWPF is the set of APIs for reading and writing Microsoft Word 97(-XP) documents using (only) Java.
HSLF is the set of APIs for reading and writing Microsoft PowerPoint 97(-XP) documents using (only) Java.
HPSF is the set of APIs for reading property sets using (only) Java.
Java Excel API is a mature, open source java API enabling developers to read, write, and modifiy Excel spreadsheets dynamically. Now java developers can read Excel spreadsheets, modify them with a convenient and simple API, and write the changes to any output stream (e.g. disk, HTTP, database, or any socket).
Some of the available features are: 1. Reads data from Excel 95, 97, 2000, XP, and 2003 workbooks 2. Reads and writes formulas (Excel 97 and later only) 3. Generates spreadsheets in Excel 2000 format 4. Supports font, number and date formatting 5. Supports shading, bordering, and coloring of cells 6. Modifies existing worksheets 7. Supports copying of charts 8. Supports insertion and copying of images into spreadsheets 9. Supports logging with Jakarta Commons Logging, log4j, JDK 1.4 Logger, etc
JExcelApi does not generate or chart, graph or macro information. This information is however preserved when spreadsheets are copied
When adding images to a sheet, only PNG image formats are supported
jexcel fails fatally when encountering invalid formulas, so parsing client supplied spreadsheets might be a problem
Poor Documentation for any of the advanced features like validation lists, column and cell formatting
OpenXLS is the open-source version of ExtenXLS (commercial Java SDK)
Some of the available features are:
Compatible with Excel '97-2003 file formats
Control over charts, formulas, and formatting from Java
Based on robust ExtenXLS 6 Codebase
Drop-in upgradability to ExtenXLS supported versions
Good documentation, user guide, and sample code gets you up to speed fast
Insert, size, and position JPG, GIF, and PNG images in your Spreadsheet files
Control over spreadsheet formatting
Preserves Charts, PivotTables
Preservation of VB macros (NOTE: VB runtime execution not supported)
200+ Formula Functions Supported
Create and work with Named Ranges
Supports Merged Cells
Convert Spreadsheets to XML and vice-versa
Features are very less when compared with its commerical counter-part.
Support is not available either through an active community or the organization
Functions mostly like a spring board to use the commerical version
Does not support Excel 2007
Does not support Open Office Spreadsheet format
Where we are trying to fit in this framework
A Java Excel API should * should provide us with well documented and mature API * hould provide us with extension points for providing our own customizations
How it is aligned with our current requirements
A Java Excel API should * Provide us with ability to format Excel sheets * Validation rules for cells/columns * Formula validation and insertion
Is it going to be one-off (or) continued usage?
A Java Excel API should be such that * It can added as a plugin to the framework * It makes the export / import functionality implementation seamless * It has the capability to render images, charts etc.,
The inferences gained from performing this comparison:
JXL can be used for faster rendering, however will fail in terms of huge data sets or failed formulae
OpenXLS acts just as a springboard to its commerical counterpart and has limited functionality
Apache POI has a matured and active community support with rapid releases, good documentation and lots of features
Apache POI also has functionalities for OpenOffice documents which would help in transforming results in spreadsheet to other formats such as PDF, Word or PPT
With these criteria in mind and based on the scope provided, Apache POI was chosen to be integrated within the framework.
Note: This is based upon reading the available documentation, limited user experience and discussion forums