%def_without test
# BEGIN SourceDeps(oneline):
BuildRequires: perl(Class/Std.pm) perl(Class/Std/Utils.pm) perl(ExtUtils/MakeMaker.pm) perl(LWP/Simple.pm) perl(Test/More.pm) perl(Time/HiRes.pm) perl(YAML.pm) perl(version.pm)
# END SourceDeps(oneline)
%define module_version 0.56
%define module_name NCBIx-BigFetch
%define _unpackaged_files_terminate_build 1
BuildRequires: rpm-build-perl perl-devel perl-podlators

Name: perl-%module_name
Version: 0.56
Release: alt1
Summary: Robustly retrieve very large NCBI sequence result sets 
Group: Development/Perl
License: perl
Url: %CPAN %module_name

Source0: http://cpan.org.ua/authors/id/R/RO/ROGERHALL/%module_name-%module_version.tar.gz
BuildArch: noarch

%description
NCBIx::BigFetch is useful for downloading very large result sets of sequences 
from NCBI given a text query. Its first use had over 
11,000,000 sequences as the result of a single keyword search. It uses YAML
to create a configuration file that maintains project state in case network
or server issues interrupt execution, in which case it can easily be
restarted after the last batch.

Downloaded data is organized by "project id" and "base directory" 
and saved in text files. Each file includes the project id in 
its name. The project_id and base_dir keys are the only required 
keys, although you will get the same search for "apoptosis"
every time unless you also set the "query" key. In any case, once
a project is started, it only needs the two parameters to be 
reloaded.

Besides the data files, two other files are saved: 
1) the initial search result, which includes the WebEnv key, and 
2) a configuration file, which saves the parsed data and is used
to pick up the download and recover missing batches or sequences.

Results are retrieved in batches depending on the "return_max" key.
By default, the "index" starts at 1 and downloads continue until
the index exceeds "count".

Occasionally errors happen and entire batches are not downloaded. 
In this case, the "index" is added to the "missing" list. This 
list is saved in the configuration file. The missing batches should 
be downloaded every day, and not saved until the end of the complete 
run.

Working scripts are included in the script directory:

    fetch-all.pp
    fetch-missing.pp
    fetch-unavailable.pp

The recommended workflow is:

 1. Copy the scripts and edit them for a specific project. Use
    a new number as the project ID.

 2. Begin downloading by running fetch-all.pp, which will first
    submit a query and save the resulting WebEnv key in a
    project-specific configuration file (using YAML).

 3. The next morning, kill the fetch-all.pp process and run
    fetch-missing.pp until it completes.

 4. Restart fetch-all.pp.

If you wish to re-download "not available" sequences, you may run 
fetch-unavailable.pp. However, they will be downloaded at the end of 
fetch-all.pp if it completes normally.

If your query result set is so large that your WebEnv times out, simply
start a new project with the last index of the previous project, and
it will pick up the result set from there (with a new WebEnv). (A planned
upgrade will automatically start another search.)

Warning: You may lose a (very) few sequences if your download extends 
across multiple projects. However, our testing shows that the batches 
generated with the same query within a few days of each other are largely 
identical.

%prep
%setup -n %module_name

%build
%perl_vendor_build INSTALLMAN1DIR=%_man1dir

%install
%perl_vendor_install

%files
%doc README Changes
%perl_vendor_privlib/N*

%changelog
