HTML Parser for PHP-4

Note: This project has been inactive for some time, so we recommend checking out Simple HTML DOM Parser, a PHP 5 DOM parser based on this project.

Overview

This is an open source HTML parser written in PHP. As an example of its use, it also comes with a tool that converts HTML to plain text.

This parser is designed for speed and flexibility. It does not build an object model for you, although you can use its results to build one if you need to. It is not based on callbacks, as a SAX parser is. Instead, you ask it for the next element or node in the document whenever you are ready for it.

Requirements

This parser has been tested with PHP 4.0.4. It should work with PHP 4.0.3+.

Download

The latest version is available at SourceForge's download area for this project.

User Tips

To use the parser, you only need to copy src/htmlparser.inc to a location in your codebase from which you can include it. A PHP file that uses the parser might look like this:


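<?php
  include ("htmlparser.inc");

  $htmlText = "... HTML text here ...";
  $parser = new HtmlParser ($htmlText);

  // Each call to parse() advances to the next node; the loop ends
  // when parse() reports that no nodes remain.
  while ($parser->parse()) {

      // Data you can use here:
      //
      // $parser->iNodeType
      // $parser->iNodeName
      // $parser->iNodeValue
      // $parser->iNodeAttributes

      if ($parser->iNodeType == NODE_TYPE_ELEMENT) {
          // Handle a start tag here, e.g. by inspecting $parser->iNodeName.
      }
  }
?>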

The $parser->iNodeType field is particularly useful: it tells you what kind of node was just read. Its value may be NODE_TYPE_ELEMENT (a start tag), NODE_TYPE_ENDELEMENT (an end tag), NODE_TYPE_TEXT (text content), and so on. The comments in htmlparser.inc document the remaining node types and fields.
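As an illustration of dispatching on $parser->iNodeType, here is a minimal sketch that collects a document's text while skipping the contents of <script> and <style> elements. It is not the HTML-to-text tool that ships with the project. The function name extractText is hypothetical, and the sketch assumes that htmlparser.inc is on the include path, that $parser->iNodeValue holds the text of NODE_TYPE_TEXT nodes, and that $parser->iNodeName holds the tag name for both start and end tags. Check these assumptions against the comments in htmlparser.inc.

```php
<?php
  include ("htmlparser.inc");

  // Hypothetical helper: returns the concatenated text nodes of $htmlText,
  // ignoring anything inside <script> or <style>.
  // ASSUMPTION: iNodeValue carries the text of NODE_TYPE_TEXT nodes and
  // iNodeName carries the tag name of start and end tags.
  function extractText ($htmlText) {
      $parser = new HtmlParser ($htmlText);
      $skipDepth = 0;   // greater than 0 while inside <script> or <style>
      $text = "";

      while ($parser->parse()) {
          // Normalize case so "SCRIPT" and "script" are treated alike.
          $name = strtolower ($parser->iNodeName);

          if ($parser->iNodeType == NODE_TYPE_ELEMENT) {
              // Start tag: begin skipping if it opens a script or style block.
              if ($name == "script" || $name == "style") {
                  $skipDepth++;
              }
          } else if ($parser->iNodeType == NODE_TYPE_ENDELEMENT) {
              // End tag: stop skipping once the matching block closes.
              if (($name == "script" || $name == "style") && $skipDepth > 0) {
                  $skipDepth--;
              }
          } else if ($parser->iNodeType == NODE_TYPE_TEXT && $skipDepth == 0) {
              // Text outside script/style blocks is kept.
              $text .= $parser->iNodeValue;
          }
      }
      return $text;
  }

  echo extractText ("<p>Hello <b>world</b></p><script>var x;</script>");
```

Tracking a depth counter rather than a boolean keeps the skip logic consistent even if a malformed document nests these elements. The exact whitespace in the returned string depends on how the parser reports text nodes.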


