%define module_name Data-Validate-Perl
# BEGIN SourceDeps(oneline):
BuildRequires: perl(Data/Dumper.pm) perl(ExtUtils/MakeMaker.pm) perl(FindBin.pm) perl(Parse/Yapp.pm) perl(Test/More.pm)
# END SourceDeps(oneline)
%define _unpackaged_files_terminate_build 1
BuildRequires: rpm-build-perl perl-devel perl-podlators

Name: perl-%module_name
Version: 0.03
Release: alt1
Summary: Validates in-memory Perl data using a specification
Group: Development/Perl
License: perl
Url: %CPAN %module_name

Source0: http://mirror.yandex.ru/mirrors/cpan/authors/id/D/DO/DONGXU/%{module_name}-%{version}.tar.gz
BuildArch: noarch

%description
To understand the internals of this module, a working knowledge of
parsing, especially Yacc, is required. Stop and grab a book on the
topic if you are unsure what this is.

A common parsing mechanism applies a state machine to a string, as a
regular expression does. This part is easy to follow. This module
also uses a Yacc state machine, but the target is not plain text; it
is an in-memory data structure: a tree made up of Perl
scalar/array/hash items.

Validating a data structure like that is a tree traversal. The
biggest challenge is how to put these two things together.

The best way to find a solution is to imagine each step of a
depth-first iteration over the tree. Each move can be abstracted as a
'token'. This is the key idea behind the module.

To elaborate, think about how to validate a simple Perl hash like the
one below:

   my %%hash = (key1 => value1, key2 => value2, key3 => value3);

To iterate over the hash key/value pairs, use a cursor that moves
through the following states:

   1. initial state: place the cursor on the hash itself;
   2. 1st state: move cursor to key1;
   3. 2nd state: move cursor to value1;
   4. 3rd state: move cursor to key2;
   5. 4th state: move cursor to value2;
   6. 5th state: move cursor to key3;
   7. 6th state: move cursor to value3.

A draft Yacc grammar can be written as:

   root_of_hash: key1 value1 | key2 value2 | key3 value3

The state machine needs a token to decide which sub-rule to walk
into. For key1/2/3, the corresponding token can simply be the key's
own value. That is:

   root_of_hash: 'key1' value1 | 'key2' value2 | 'key3' value3

Note the quotes: they mark key1/2/3 as tokens. Next, move to the hash
value. When the cursor points to a value, I do not care about the
actual value; I just want to hint to the state machine that it is a
value. That requires another token for the state machine to accept.
How about a plain text token, 'TEXT'? The final grammar becomes:

   root_of_hash: 'key1' 'TEXT' | 'key2' 'TEXT' | 'key3' 'TEXT'

How is the generated state machine applied to hash validation, then?
Each time the parser cannot determine the next state, it asks the
lexer for a token. The simplest form of a lexer is just a function
that returns the corresponding token for each state. At this point,
you might be able to guess how it works:

   1. the state machine is initialized; it wants to move to the next state, so it asks the lexer;
   2. the lexer holds the hash itself; it calls the keys function, returns the first key as a token, and stores the returned key in its memory;
   3. once the state machine gets 'key1', it moves the cursor onto key1, then asks the lexer again;
   4. the lexer checks its memory and finds it has just returned 'key1', so it is time to return its value; as the state machine has no interest in the actual value, the lexer returns 'TEXT';
   5. the state machine has finished the key1/value1 pair and asks for another token;
   6. the lexer returns 'key2' and keeps it in its memory;
   7. the state machine steps into the sub-rule 'key2' 'TEXT';
   ...
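
The lexer steps above can be sketched as follows. This is only an
illustration, written in Python rather than the module's Perl;
make_lexer and validate are hypothetical names, and accepting a
repeated sequence of key/'TEXT' pairs is an assumption, since the
draft grammar shows only the alternatives.

```python
def make_lexer(data):
    """Return a lexer function that hands out one token per call.

    Hypothetical sketch: the real module implements its lexer in Perl.
    """
    keys = list(data)        # fixed key order for the whole walk
    state = {"index": 0, "last_key": None}

    def lexer():
        if state["last_key"] is not None:
            # A key was returned on the previous call: now signal its value.
            state["last_key"] = None
            return "TEXT"
        if state["index"] == len(keys):
            return None      # end of input
        key = keys[state["index"]]
        state["index"] += 1
        state["last_key"] = key   # remember what was just returned
        return key

    return lexer


def validate(data, allowed_keys):
    """Accept the token stream if it is a sequence of KEY 'TEXT' pairs."""
    lexer = make_lexer(data)
    while True:
        token = lexer()
        if token is None:
            return True
        if token not in allowed_keys:
            return False
        if lexer() != "TEXT":
            return False
```

Calling validate with a dict and the set of allowed key names returns
True only when every key is one of the allowed tokens.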

The state loop is fairly straightforward. Parsing isn't that
difficult, huh :-)

To iterate over a nested tree of scalars, arrays, and hashes, other
tokens are introduced:

   1. '[' and ']' indicate the start and end of an array traversal;
   2. '{' and '}' indicate the start and end of a hash traversal;
   3. to meet special needs, certain rule actions set state flags that determine whether the lexer returns a value as 'TEXT' or as the actual value string.

State in the lexer is maintained with a stack, which simulates a
depth-first traversal:

   1. when the lexer meets an array, it iterates over the items one by one; if an item is another array or hash, it pushes the current array onto the stack together with an index marking the current position, then iterates over that item recursively;
   2. a similar strategy is applied to hashes.
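
A minimal sketch of such a stack-driven traversal is shown below,
again in Python for illustration only. The function name tokens and
the exact token order (key first, then the value's tokens) are
assumptions, not the module's actual behaviour, and an iterator
stands in for the saved index.

```python
DONE = object()   # sentinel: the current container has no more items


def tokens(data):
    """Yield tokens for a nested structure using an explicit stack.

    Each stack frame holds a container's closing token and an iterator
    that plays the role of the index marking the current position.
    """
    stack = []

    def open_node(node):
        # Containers push a frame and report their start token;
        # anything else is reported as 'TEXT'.
        if isinstance(node, list):
            stack.append(("]", iter(node)))
            return "["
        if isinstance(node, dict):
            stack.append(("}", iter(node.items())))
            return "{"
        return "TEXT"

    yield open_node(data)
    while stack:
        closer, items = stack[-1]
        item = next(items, DONE)
        if item is DONE:
            stack.pop()           # container finished: resume its parent
            yield closer
        elif closer == "}":
            key, value = item
            yield key             # hash keys are their own tokens
            yield open_node(value)
        else:
            yield open_node(item)
```

Because the position inside every enclosing container stays on the
stack, the walk resumes in the parent as soon as a nested array or
hash is exhausted.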

The remaining piece is a DSL to describe the tree structure. By the
time you have read this far, I am fairly confident you can figure it
out yourself by exercising various pieces of this module. Below are a
few short notes:

   1. the gen_yp_rules function translates a data structure spec into the corresponding Yacc grammar;
   2. the bottom section of this module contains the Lexer function and other routines that Parse::Yapp requires to work (browse the module source to read them);
   3. the command-line utility `dvp_gen_parser' reads the spec file, calls gen_yp_rules to generate the grammar, writes it to a file, and calls `yapp' to create the parser module.

I hope you like this little article and enjoy playing with this module.

%package scripts
Summary: %module_name scripts
Group: Development/Perl
Requires: %name = %{?epoch:%epoch:}%version-%release

%description scripts
Scripts for %module_name.


%prep
%setup -q -n %{module_name}-%{version}

%build
%perl_vendor_build INSTALLMAN1DIR=%_man1dir

%install
%perl_vendor_install

%files
%doc README Changes
%perl_vendor_privlib/D*

%files scripts
%_man1dir/*
%_bindir/*

%changelog
