%define module_name Text-NGrammer
# BEGIN SourceDeps(oneline):
BuildRequires: perl(ExtUtils/MakeMaker.pm) perl(Lingua/Sentence.pm) perl(Module/Build.pm)
# END SourceDeps(oneline)
%define _unpackaged_files_terminate_build 1
BuildRequires: rpm-build-perl perl-devel perl-podlators

Name: perl-%module_name
Version: 0.06
Release: alt1
Summary: Pure Perl extraction of n-grams and skip-grams
Group: Development/Perl
License: perl
Url: %CPAN %module_name

Source0: http://mirror.yandex.ru/mirrors/cpan/authors/id/N/NI/NIDS/%{module_name}-%{version}.tar.gz
BuildArch: noarch

%description
The module provides a way to extract both n-grams and skip-grams from a text, a sentence or from an array of tokens.

An n-gram is defined as an ordered sequence of tokens in a piece of text.  Some frequent n-grams, such as 2-grams, are also called bigrams; they represent all the ordered pairs of adjacent words in a text.  For instance, the text "a rose is a flower" is composed of 4 bigrams: "a rose", "rose is", "is a", "a flower".

A skip-gram is defined as an ordered sequence of *n* tokens from a text with a predetermined interval *k* between them.  For instance, the skip-grams with n=2 and k=1 for a piece of text are all the sequences of 2 tokens with exactly 1 token skipped between them.  The text "a rose is a flower" is composed of 3 skip-grams with n=2 and k=1: "a is", "rose a", "is flower".  A skip-gram with k=0 is the same as an n-gram of the same size, e.g., a 2-skip-gram with k=0 is the same as a bigram.

A broader and more thorough discussion of n-grams and skip-grams can be found at https://en.wikipedia.org/wiki/N-gram.

Behind the scenes, the module uses the Lingua::Sentence module to split the text into sentences, so that the n-grams and skip-grams never cross sentence boundaries.  The module also provides ways to extract the n-grams and skip-grams from sentences, i.e., without invoking Lingua::Sentence, or from an array of tokens if the application wants to use a custom tokenization for the text.  The language to be used for the sentence splitter must be specified in the constructor; if not present, English is used by default.

All the methods return the n-grams and skip-grams as arrays or references to arrays of length *n*, where *n* is specified as a parameter of the method.  Sentences or, more generally, pieces of text are not divided into n-grams or skip-grams if they are not long enough to perform the operation.  For instance, asking for all the n-grams of length 4 for the sentence "I am Francesco" returns an empty array of 4-grams because there are only 3 t...

%prep
%setup -q -n %{module_name}-%{version}

%build
%perl_vendor_build

%install
%perl_vendor_install

%files
%doc Changes
%perl_vendor_privlib/T*

%changelog
