# BEGIN SourceDeps(oneline):
BuildRequires: perl(ExtUtils/MakeMaker.pm)
# END SourceDeps(oneline)
%define module_version 0.90
%define module_name HTML-CGIChecker
%define _unpackaged_files_terminate_build 1
BuildRequires: rpm-build-perl perl-devel perl-podlators

Name: perl-%module_name
Version: 0.90
Release: alt1
Summary: A Perl module to detect dangerous HTML code
Group: Development/Perl
License: perl
Url: %CPAN %module_name

Source0: http://cpan.org.ua/authors/id/T/TR/TRIPIE/%module_name-%module_version.tar.gz
BuildArch: noarch

%description
HTML::CGIChecker is a module for web developers to parse HTML and to detect
HTML code that could break a page in some way.
This module is not an HTML validator, but it allows one to check the HTML
code that users post to a web application, for example to a discussion
board, to prevent them from posting a piece of code that would render the
rest of the page it is displayed on unusable.

It can also deny JavaScript, images, tables or any other tags on an
individual basis, and it can check for correct quoting and correct URLs.

The module can autocorrect some common user mistakes, for example the
use of unescaped HTML brackets in a chat room.

It is easy to use and very useful in any CGI application in which
you want its users to be able to use HTML in their posts to some
customizable extent. It is object oriented and designed to be easily
extensible.

This is not a validator; for validation you need another solution.
This module does not care about the correctness of the parsed HTML code
at all. All it cares about is whether the HTML code could break a page.
HTML tags that are not paired correctly or that cannot be rendered at all
can pass this checker. Names of elements and attributes are not case
sensitive.

The checker object is created by calling the new() constructor of the
HTML::CGIChecker class.


	$checker = HTML::CGIChecker->new (
		mode => 'allow',
		....
	);


Then you can use the checkHTML() instance method to check a string
using the settings the object has been configured with.

	($checked_string, $Warnings) = 
		$checker->checkHTML ($string);


new() - the constructor

Creates and returns a new checker object that can be configured with
the parameters described below. The default configuration allows only
a few harmless inline tags to be used in the HTML code:

    B I A U STRONG BR
    EM CITE VAR ABBR Q DFN CODE SUB SUP SAMP KBD ACRONYM

Other tags, except the special PRE tag, are not allowed.
JavaScript is also not allowed by default.

The parameters are passed in as a list of parameter => value pairs.
The parameters and their default values are:

	mode => 'allow'
	allowclasses => []
	allowtags => [ qw (
	    B I A U STRONG BR EM CITE VAR ABBR Q DFN CODE
	    SUB SUP SAMP KBD ACRONYM
	) ]
	denyclasses => [ keys (%%tagclasses) ]
	denytags => [ qw ( FONT ) ]
	jscript => 0
	html => 0
	pre => 1
	img_to_link => 0
	check_http => 1
	debug => 0
	nonpairtags => [ qw (
	    IMG HR BR INPUT META AREA COL BASE LINK PARAM
	) ]
	check_attribs => {}
	err_tag => 'Tag {tag} is not allowed in {element}.'
	err_javascript => 'Javascript is not allowed in {element}.'
	err_quote => 'Missing quote in {element}.'
	err_notclosed => 'Pair tag {tag} was not closed.'
	err_notopened => 'Pair tag {tag} was not opened.'


mode

Two modes are available: allow (default) and deny.

allow: An error is raised if any tag that is not explicitly
allowed is found.

deny: An error is raised if an explicitly denied tag is found;
all other tags are allowed.


allowclasses, allowtags

These parameters apply only in the 'allow' mode.
Here you can specify the tags you allow the user to use.
allowtags must be a reference to an array of tag names.
allowclasses must be a reference to an array of class names.
A tag class (tag group) is a set of tags that can be allowed or denied all
at once by allowing or denying the class. These classes are available:

	base        FRAMESET FRAME HTML BODY HEAD TITLE BASE
	            STYLE SCRIPT META NOSCRIPT NOFRAMES
	externals   APPLET OBJECT LINK IFRAME PARAM
	forms       FORM TEXTAREA SELECT INPUT BUTTON LABEL
	            FIELDSET LEGEND OPTGROUP
	tables      TABLE TR TD TBODY THEAD TFOOT TH COLGROUP
	            COL CAPTION
	lists       UL OL LI DL DT DD
	images      IMG MAP AREA
	heading     H1 H2 H3 H4 H5 H6 H7 H8

By default only the harmless inline tags mentioned above are allowed,
and no classes are allowed.
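
As an untested sketch of the description above, a checker that also
permits list and table markup could be configured as follows. Only the
parameter and class names come from this text; the choice of classes is
purely illustrative, and it is assumed that the default allowtags list
stays in effect when allowtags is not passed.

	use HTML::CGIChecker;

	# 'allow' mode: anything not explicitly allowed is rejected.
	my $checker = HTML::CGIChecker->new (
		mode         => 'allow',
		# Class names as listed above (illustrative selection).
		allowclasses => [ qw ( lists tables ) ],
	);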

denyclasses, denytags

These parameters apply only in the 'deny' mode.
They work similarly to the allowclasses and allowtags
parameters. By default all the classes listed above plus the FONT tag are
denied. All other tags are allowed by default in this mode.
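
A minimal, untested sketch of a 'deny' mode configuration, assuming the
parameters behave as described above. The particular classes and tags
chosen here are examples only, not a recommended policy.

	use HTML::CGIChecker;

	# 'deny' mode: only the listed classes and tags are rejected.
	my $checker = HTML::CGIChecker->new (
		mode        => 'deny',
		denyclasses => [ qw ( base externals forms ) ],
		denytags    => [ qw ( FONT IMG ) ],
	);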

jscript

This option controls whether JavaScript is allowed inside HTML elements.
To block JavaScript completely, you must also ensure that the SCRIPT
tag is not allowed.

	0: javascript is not allowed
	1: javascript is allowed
	Default: 0


html

	0: messages will be neither in HTML format nor HTML escaped -
	   useful for the command line mode
	1: all warning messages will be in HTML format and also
	   HTML escaped
	Default: 0

pre

	0: users will not be allowed to use the special PRE tag
	1: users will be allowed to use the special PRE tag
	Default: 1

img_to_link

	0: do not alter images
	1: convert all images to appropriate links to these
	   images: <IMG SRC="url">  ---->  <A HREF="url">url</A>
	Default: 0

check_http

	0: do not alter URLs
	1: prepend "http://" to URLs that do not start
	   with "http://", "ftp://" or "mailto:"
	Default: 1
	
	Note: URLs are recognized only in
	HREF and SRC attributes.

debug

	0: debugging to STDERR is disabled
	1: debugging to STDERR is enabled
	Default: 0	

nonpairtags

The tags that are processed as non-pair can be specified here
via a reference to an anonymous array.
By default these tags are processed as non-pair:

    IMG HR BR INPUT META AREA COL BASE LINK PARAM

check_attribs

You can also use the check_attribs parameter to allow the user to use
only a limited set of attributes in an element. The parameter is a
hash reference consisting of key => value pairs, in which the key is
the name of an element and the value is a reference to an array of
attributes. For each element specified in this hash, the user will only
be allowed to use the specified attributes.

For example, if you define the following hash reference:

	check_attribs => {
		img => [ 'src', 'width', 'height', 'alt' ]
	}

then the user will be allowed to use ONLY the specified attributes in
the <IMG> element. Any other elements are not affected and the user
will be allowed to use any attributes in them. Names of the elements
and of the attributes are not case sensitive.

Warning messages can be redefined by setting these parameters:

	err_tag          => 'Tag {tag} is not allowed in {element}.'
	err_javascript   => 'Javascript is not allowed in {element}.'
	err_quote        => 'Missing quote in {element}.'
	err_notclosed    => 'Pair tag {tag} was not closed.'
	err_notopened    => 'Pair tag {tag} was not opened.'

The messages shown above are the defaults. The special tokens {tag} and
{element} are replaced by the appropriate values. You can redefine these
messages to localize them.
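
As an untested illustration, two of the messages could be localized like
this. The German wording is a made-up example; only the parameter names
and the {tag} and {element} tokens come from the text above.

	use HTML::CGIChecker;

	my $checker = HTML::CGIChecker->new (
		# Hypothetical translations; the tokens are kept verbatim.
		err_tag   => 'Das Tag {tag} ist in {element} nicht erlaubt.',
		err_quote => 'In {element} fehlt ein Anfuehrungszeichen.',
	);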


checkHTML() - the actual HTML check method


	($checked_string, $Warnings) = 
		$checker->checkHTML ($string);


This method accepts only one parameter - the actual string to check.

If the string contains anything dangerous or not allowed, this method
returns an undefined value and a reference to an array of warning messages
that describe the problems that were detected.

If the string is safe, a checked and escaped version of the
string is returned together with a reference to an empty array.

Please note that the warning messages are not returned as an array, but as
a reference to an array, which must be dereferenced when you use it as an
array. The usual way to print all the warnings is to use the join()
function:

	print join ("<BR>\n", @{$Warnings});
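
Putting the pieces together, a complete check might look like the
following untested sketch. It assumes only the behaviour described above
(an undefined first return value on failure); the input string and the
variable names are invented for the example.

	use strict;
	use warnings;
	use HTML::CGIChecker;

	# html => 1 asks for HTML-formatted, HTML-escaped warnings.
	my $checker = HTML::CGIChecker->new (mode => 'allow', html => 1);

	# Hypothetical user input.
	my $input = '<B>Hello</B> <SCRIPT>alert(1)</SCRIPT>';
	my ($checked, $warnings) = $checker->checkHTML ($input);

	if (defined $checked) {
		# Safe: print the checked and escaped string.
		print $checked, "\n";
	} else {
		# Rejected: dereference the array ref and print the warnings.
		print join ("<BR>\n", @{$warnings}), "\n";
	}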



%prep
%setup -n %module_name-%module_version

%build
%perl_vendor_build

%install
%perl_vendor_install

%files
%doc Changes README
%perl_vendor_privlib/H*

%changelog
