# BEGIN SourceDeps(oneline):
BuildRequires: perl(ExtUtils/MakeMaker.pm) perl(IO/Handle.pm) perl(Test/More.pm)
# END SourceDeps(oneline)
%define module_version 0.07
%define module_name Text-GuessEncoding
%define _unpackaged_files_terminate_build 1
BuildRequires: rpm-build-perl perl-devel perl-podlators

Name: perl-%module_name
Version: 0.07
Release: alt1
Summary: Convert Text from almost any encoding to ASCII or UTF8
Group: Development/Perl
License: perl
Url: %CPAN %module_name

Source0: http://cpan.org.ua/authors/id/J/JN/JNW/%module_name-%module_version.tar.gz
BuildArch: noarch

%description
CAUTION: unfinished code. No objects created.

Text::GuessEncoding gathers statistics about typical and invalid codes in
both Latin1 and UTF-8. The concept of 'typical' is currently based on a
European point of view.
Based on these statistics, methods to transform to Latin1 and UTF-8 are
provided. These methods handle 'broken' input strings with mixed encodings
well.

The input string may or may not have its utf8 flag set correctly; the flag
is ignored. The returned string has the utf8 flag always off, and contains
no characters above codepoint 127 (which means it is inside the ASCII
character set).  If called in a list context, `to_ascii()' returns the
mapping table as a second value.  This mapping table is a hash, using all
recognized encodings as keys. (Any well-formed string should only have one
encoding, but one can never be sure.) The value for each encoding is an array
ref listing all the codepoints in the following form:
`[ [ $codepoint, $replacement_bytecount, [ $offset, ... ] ], ... ]'
Offset positions refer to the output string, where byte counts are identical
to character counts.

Example:
  
  my $guess = Text::GuessEncoding->new();
  ($ascii, $map) = $guess->to_ascii("J\x{fc}rgen \x{c3}\x{bc}\n");
  # $ascii = 'Juergen ue';
  # $map = { 'utf8' => [252, 2, [8]], 'latin1' => [252, 2, [1]] };

The input string contains both a UTF-8 encoded u-umlaut and a plain Latin1
u-umlaut byte.
The output string is never flagged as utf8.

  ($utf8, $map) = $guess->to_utf8("J\x{fc}rgen \x{c3}\x{bc}\n");
  # $utf8 = 'J\N{U+fc}rgen \N{U+fc}';
  # $map = { 'utf8' => [7], 'latin1' => [1] };
  
`to_utf8()' returns a simpler mapping table, as the string preserves more
information. Note that the offsets differ from those of `to_ascii()', as no
multi-character rewriting takes place.
The output string is always flagged as utf8.

    use Text::GuessEncoding;

    my $asciitext = Text::GuessEncoding::to_ascii($enctext);
    my ($asciitext,$mapping) = Text::GuessEncoding::to_ascii($enctext);

%package scripts
Summary: %module_name scripts
Group: Development/Perl
Requires: %{?epoch:%epoch:}%name = %version-%release

%description scripts
Scripts for %module_name.


%prep
%setup -n %module_name-%module_version

%build
%perl_vendor_build INSTALLMAN1DIR=%_man1dir

%install
%perl_vendor_install

%files
%doc README Changes
%perl_vendor_privlib/T*

%files scripts
%_bindir/*

%changelog
