Google SoC 2007 - Project Proposal

Apache Forrest is an XML-based web publishing framework that generates one or more output formats from multiple input sources.

This project consists of developing a set of plugins for Apache Forrest in order to provide support for DOAP, FOAF, GEO, RDF Calendar, hResume and hReview.

Why Bother?

To achieve the primary goal of the Web[1].

Many people now talk about the Web as a platform where we can both read and write information, and they call it Web 2.0. In this so-called Web 2.0, every resource is identified by a URI[2] and provides a common interface[3] to other resources on the Web, so that resources can communicate with one another. In other words, web sites can talk to any other application on the Web.

By supporting RDF vocabularies and Microformats[4], we'll be giving birth to a more powerful web, where machines will play a more important role in finding relationships between widely available resources.

Project Plan

This project plan describes how key RDF/XML vocabularies and microformats can be integrated into Apache Forrest.

Phase 1: Explore

It's all about setting priorities

There are a number of standards on the web that claim to be important, but are they really worth implementing? Such questions should be answered by Apache Forrest community members, with arguments based on concrete examples on the web.

Approaches

Our approach to integrating the required formats.

Our approach to integration is quite straightforward, given that some plugins have already been developed in the whiteboard (as of March 2007). The Explore phase is mainly an opportunity for me to take a deep look at Forrest.

In addition, best practices on how to implement potential features will be discussed within the community.

Planning

Table 1. Phase 1: Schedule

When      Who                What
30 April  Forrest Community  List of formats that will be supported by Forrest and nature of the support (input/output). [a]
31 April  SinDoc             A technical document, resulting from discussions among community members, that outlines different approaches to implementing features required by each plugin.

[a] The scope of each plugin will also be specified (e.g. which of the features suggested by the format to implement).

Phase 2: Implementation

Transformation, Integration and User Experience

The main focus of Forrest is perhaps transformation, which is handled by XSL stylesheets.
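To make the idea of transformation concrete, here is a minimal sketch of the kind of mapping a DOAP plugin would perform: an RDF/XML project description goes in, an HTML fragment comes out. In Forrest this work would be done by an XSL stylesheet; the Python below only illustrates the mapping. The sample document, the function name doap_to_html and the HTML layout are assumptions made for illustration, not part of Forrest.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespaces of RDF/XML and of the DOAP vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}

# Hypothetical input: a minimal DOAP description of a project.
SAMPLE_DOAP = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>Example Project</doap:name>
    <doap:shortdesc>A sample project description.</doap:shortdesc>
    <doap:homepage rdf:resource="http://example.org/"/>
  </doap:Project>
</rdf:RDF>
"""


def doap_to_html(doap_xml: str) -> str:
    """Render the first doap:Project in the input as an HTML fragment."""
    root = ET.fromstring(doap_xml)
    project = root.find("doap:Project", NS)
    if project is None:
        raise ValueError("no doap:Project element found")

    name = project.findtext("doap:name", default="", namespaces=NS)
    desc = project.findtext("doap:shortdesc", default="", namespaces=NS)

    # The homepage is given as an rdf:resource attribute, not as text.
    homepage = project.find("doap:homepage", NS)
    resource_attr = "{%s}resource" % NS["rdf"]
    url = homepage.get(resource_attr, "") if homepage is not None else ""

    lines = [
        '<div class="project">',
        "  <h1>%s</h1>" % escape(name),
        "  <p>%s</p>" % escape(desc),
    ]
    if url:
        lines.append(
            '  <p><a href="%s">%s</a></p>' % (escape(url, quote=True), escape(url))
        )
    lines.append("</div>")
    return "\n".join(lines)


print(doap_to_html(SAMPLE_DOAP))
```

An XSL stylesheet would express the same mapping declaratively, with one template per DOAP element, which is why the real plugins would be written in XSLT rather than in a general-purpose language.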

Visual and User Interface

We need to make sure that pages generated by Forrest are web standards compliant. Data should be presented with special care to meet usability requirements. CSS stylesheets are essential. JavaScript and Ajax can also help provide a rich, yet simple user interface.

More important than CSS and other technical hacks is the fact that each page Forrest generates should be understood by the reader as an independent document within a more complex system, which is the site itself.

This way, Forrest users won't have a hard time customizing existing skins to benefit from simple and common practices in web design.

Planning

Table 2. Phase 2: Schedule

When       Who     What
9 July     SinDoc  Develop Baetle and SKOS plugins with rich examples.
14 July    SinDoc  Add support for microformats like hCard, hResume, rel-tag, rel-license.
31 July    SinDoc  Add other plugins like GEO, RDF Calendar, hResume, hReview. Enhance DOAP, Baetle, SKOS and FOAF plugins.
10 August  SinDoc  CSS style and user interface.
12 August  SinDoc  Write a quick guide for developers on how to use the plugins created.
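
As an illustration of the microformat support listed in the schedule, the sketch below shows how a program can read an hCard embedded in ordinary HTML. The class names vcard, fn, org and url come from the hCard microformat; the sample markup, the HCardParser class and its simplified rules are assumptions made for this example, and it is not a complete hCard parser.

```python
from html.parser import HTMLParser

# hCard properties whose value is the element's text content.
TEXT_PROPERTIES = {"fn", "org"}
# Elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._open = []   # per open element: (is_vcard, text property or None)

    def _inside_card(self):
        return any(is_vcard for is_vcard, _ in self._open)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        is_vcard = "vcard" in classes
        if is_vcard:
            self.cards.append({})
        prop = next((c for c in classes if c in TEXT_PROPERTIES), None)
        self._open.append((is_vcard, prop))
        # The url property takes its value from href, not from the text.
        if self._inside_card() and "url" in classes and attributes.get("href"):
            self.cards[-1]["url"] = attributes["href"]

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        if not self._inside_card():
            return
        prop = self._open[-1][1]
        text = data.strip()
        if prop and text:
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + text).strip()


# Hypothetical page fragment marked up with hCard.
SAMPLE_HTML = """
<div class="vcard">
  <a class="fn url" href="http://example.org/jane">Jane Doe</a>,
  <span class="org">Example Org</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE_HTML)
print(parser.cards)
```

The same markup stays readable for people while the class names make the name, organization and URL available to machines, which is the property the plugins aim to preserve in Forrest's output.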

Phase 3: Testing and Debugging

Forrest community members will be invited to test an early (alpha) version of the plugins created in the earlier phases and to provide reviews and suggestions. Bugs will be reported and fixed according to their priority. Testers can also report concerns about the visual design, though not browser-specific issues.

Documentation

The rather abstract documentation intended for Forrest developers should be extended to cover more topics and a broader audience. The result will be in the form of a structured user guide containing step-by-step howtos, FAQs, examples, etc.

User Interface: Platform-specific Hacks

In the first place, we'll do our best to avoid any platform- or browser-specific code (whether CSS or JavaScript). However, we will likely still have to make sure that the output produced by Forrest looks the same in every supported graphical web browser.

In addition, we need to make sure that output documents produced by Forrest meet accessibility requirements and that people using text-based web browsers can properly access all the information on the page. In that case, only the presentation differs, not the content.

Beta Release

Once all critical bugs are fixed and all requests are taken care of, we will prepare for a beta release of the plugins. It is important to freeze the code until the beta version is released.

Planning

Table 3. Phase 3: Schedule

When       Who                   What
17 August  Forrest Community[a]  Debugging
22 August  SinDoc                Documentation for users (e.g. User Guide, FAQ, HOWTOs).
25 August  SinDoc                Browser-specific UI optimization. Code freeze.
29 August  SinDoc                Releasing plugins

[a] SinDoc will be actively fixing bugs, of course.

Phase 4: Maintenance

After these phases are complete, we'll accept feature requests and fix bugs. I intend to maintain this project until 2010, even after the Summer of Code. There are still many plugins that we can add to Forrest.

Why Me?

I[9] extended DocBook Website and developed SilkPage[10] in 2003 to better understand web standards and markup languages like DocBook and how to implement them. SilkPage is very similar to Apache Forrest as they both:

  • use Ant for website generation and deployment;

  • rely on XSLT to transform XML documents into HTML or PDF;

  • provide multiple XSL and CSS themes;

  • use an Open Source license;

  • aim to promote open standards such as CSS.

SilkPage supports some RDF/XML formats, including RSS 1.0 (originally by Norman Walsh), FOAF (partial support), DOAP (full support) and URFM (full support). URFM is an RDF/XML vocabulary that I created for describing digital files, releases and packages.

Why Apache?

I've been using software and middleware developed by the Apache Software Foundation since the late 90s. It's good to know that the first project idea page I checked was that of Apache, simply because I like its culture and I wished to have a good reason to become an Apache committer.

When I first saw the 'forrest-rdf' project idea, I was so excited (hard to explain) because it was something I'd already done and, more importantly, something I'd chosen to do. Back then, I worked for Dixite[11] and nobody had asked me to use RDF/XML standards in SilkPage, but when I explained the advantages of using RDF to my project manager, he kindly accepted and I worked on it with passion.

References

An up-to-date version of this document is available at: http://sina.khakbaz.com/2007/soc/forrest

[1]  http://www.xml.com/pub/a/2000/12/xml2000/timbl.html
[2]  http://www.w3.org/TR/webarch/
[3]  http://en.wikipedia.org/wiki/Representational_State_Transfer
[4]  http://microformats.org/
[5]  http://forrest.apache.org/docs_0_80/changes.html
[6]  [SVN]/trunk/site-author/status.xml
[7]  [SVN]/trunk/whiteboard/plugins/
[9]  http://sina.khakbaz.com/
[10] http://silkpage.markupware.com/
[11] http://www.dixite.com/