Installing
As with the other BDC-ODC tools, datacube-stats is installed through Docker containers. The setup described here builds a Dask cluster from Docker containers (a scheduler and one or more workers), so that operations can be executed in a distributed way. Distributed execution is optional: all processing can also run in parallel on a single machine.
First, clone the source code used to build the Docker images:
git clone https://github.com/brazil-data-cube/bdc-odc.git
cd bdc-odc/docker/odc-stats/
With the code downloaded, and from inside the odc-stats directory, create the .datacube.conf file. Then change to the dask-cluster directory and create a .env file, which configures the mapping of the container's data directory.
Note
For more information about the possible values in the .env file, see the datacube-stats configuration file page.
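The .datacube.conf file uses the standard Open Data Cube configuration format. A minimal sketch, run from the odc-stats directory and assuming an existing PostgreSQL database that holds the Open Data Cube index, could look like this. The host name, database name and credentials are hypothetical placeholders to be replaced with your own values; the .env file is not covered here.

```shell
#!/bin/sh
# Write a minimal .datacube.conf in the current (odc-stats) directory.
# ASSUMPTION: odc-postgres, opendatacube, postgres and secret are
# hypothetical placeholders; use the values of your own ODC database.
cat > .datacube.conf <<'EOF'
[datacube]
db_hostname: odc-postgres
db_port: 5432
db_database: opendatacube
db_username: postgres
db_password: secret
EOF

# Print the file so it can be reviewed before building the images.
cat .datacube.conf
```

Keeping the values in a quoted heredoc ('EOF') prevents the shell from expanding any $ characters that may appear in a real password.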
After configuring these files, run the build.sh script, which builds the images and starts the containers:
./build.sh
When the script finishes, list the running containers with:
docker-compose ps
The output should look like the following:
          Name                        Command               State                          Ports
------------------------------------------------------------------------------------------------------------------
dask-cluster_odc-stats_1   python3                          Up
dask-cluster_scheduler_1   tini -g -- /usr/bin/prepar ...   Up      0.0.0.0:8786->8786/tcp, 0.0.0.0:8787->8787/tcp
dask-cluster_worker_1      tini -g -- /usr/bin/prepar ...   Up
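Instead of reading the State column by eye, it can be checked with a small helper. This is a sketch that assumes the classic docker-compose (v1) table layout shown above: a header line, a dashed separator, then one row per container. all_up is a hypothetical helper name, not part of the bdc-odc repository.

```shell
#!/bin/sh
# all_up: read `docker-compose ps` output on stdin and succeed only if there
# is at least one container row and every container row contains the state
# "Up". ASSUMPTION: docker-compose v1 table layout (header + separator first).
all_up() {
  awk '
    NR <= 2 { next }        # skip the header and the dashed separator
    NF == 0 { next }        # ignore blank lines
    {
      total++
      # Append a space so a trailing "Up" at the end of the row also matches.
      if (($0 " ") ~ / Up /) up++
    }
    END { exit !(total > 0 && up == total) }
  '
}
```

For example, docker-compose ps | all_up && echo "all containers are up" prints the message only when the odc-stats, scheduler and worker containers are all running.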
To confirm that everything is working, run the datacube-stats help command:
docker-compose exec odc-stats datacube-stats --help
The output should look like:
Usage: datacube-stats [OPTIONS] STATS_CONFIG_FILE

Options:
  --save-tasks FILE
  --load-tasks PATH
  --tile-index INTEGER...         Override input_region specified in
                                  configuration with a single tile_index
                                  specified as [X] [Y]
  --tile-index-file FILE          A file consisting of tile indexes specified
                                  as [X] [Y] per line
  --output-location TEXT          Override output location in configuration
                                  file
  --year INTEGER                  Override time period in configuration file
  --task-slice SLICE              The subset of tasks to perform, using
                                  Python's slice syntax.
  --batch INTEGER                 The number of batch jobs to launch using PBS
                                  and the serial executor.
  --list-statistics
  --version
  -v, --verbose                   Use multiple times for more verbosity
  --log-file TEXT                 Specify log file
  -E, --env TEXT
  -C, --config, --config_file TEXT
  --log-queries                   Print database queries.
  --qsub OPTS                     Launch via qsub, supply comma or new-line
                                  separated list of parameters. Try
                                  --qsub=help.
  --workers-per-node INTEGER      For code that parallelizes over cores
  --queue-size INTEGER            Overwrite defaults for queue size
  --celery HOST:PORT              Use celery backend for parallel computation.
                                  Supply redis server address, or "pbs-launch"
                                  to launch redis server and workers when
                                  running under pbs.
  --dask HOST:PORT                Use dask.distributed backend for parallel
                                  computation. Supply address of dask
                                  scheduler.
  --parallel INTEGER              Run locally in parallel
  --version
  --help                          Show this message and exit.
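The --dask and --parallel options listed in the help output correspond to the two execution modes described at the start of this page: distributed on the Dask cluster, or in parallel on a single machine. The helper below is a sketch of switching between them. run_stats, DASK_SCHEDULER, STATS_WORKERS and DRY_RUN are hypothetical names, the default of 4 local processes is arbitrary, and the address scheduler:8786 assumes that the Compose service name scheduler resolves inside the containers' network, using the port shown in the docker-compose ps output.

```shell
#!/bin/sh
# run_stats CONFIG: run datacube-stats inside the odc-stats service.
# - DASK_SCHEDULER set   -> distributed run via --dask HOST:PORT
# - DASK_SCHEDULER unset -> local run via --parallel N (STATS_WORKERS, default 4)
# - DRY_RUN set          -> only print the command instead of running it
# ASSUMPTION: all variable and function names here are hypothetical.
run_stats() {
  config="$1"   # path to the statistics configuration file inside the container
  if [ -n "${DASK_SCHEDULER:-}" ]; then
    set -- datacube-stats --dask "$DASK_SCHEDULER" "$config"
  else
    set -- datacube-stats --parallel "${STATS_WORKERS:-4}" "$config"
  fi
  # -T disables pseudo-TTY allocation, which suits scripted use.
  if [ -n "${DRY_RUN:-}" ]; then
    echo docker-compose exec -T odc-stats "$@"
  else
    docker-compose exec -T odc-stats "$@"
  fi
}
```

For example, DASK_SCHEDULER=scheduler:8786 run_stats my_stats_config.yaml sends the work to the Dask scheduler, while STATS_WORKERS=8 run_stats my_stats_config.yaml runs it locally with 8 processes; my_stats_config.yaml is a placeholder for your own configuration file. Setting DRY_RUN=1 shows the resulting command without executing it.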