Parallel download tools for Oracle databases at the Pacific Islands Fisheries Science Center (PIFSC). The package extracts fisheries logbook data across multiple cores, with keyring-based credential management, automatic year-based partitioning, and progress tracking.

Key Features

  • Secure credential storage using the system keyring
  • Parallel downloads with automatic year-based partitioning
  • Flexible workflows for single tables or batch processing
  • Progress tracking with performance metrics
  • Optimized for large datasets common in fisheries research

Prerequisites

  1. Oracle Instant Client 23.9+ with ODBC component installed
  2. PIFSC network or NOAA VPN connection for database access

See vignette("connections") for detailed setup instructions.

Installation

# install.packages("renv")
renv::install("N-DucharmeBarth-NOAA/pifsc-odbc")

Quick Start

One-time setup

Store your database credentials (only needed once per machine):

Download a single table

# Parallel download with automatic year detection
hdr <- parallel_download(
  table = "LLDS_HDR_20240315HAC",
  year_col = "LANDYR"
)

# Download specific years
hdr_recent <- parallel_download(
  table = "LLDS_HDR_20240315HAC",
  year_col = "LANDYR",
  years = 2015:2023,
  n_cores = 4
)

Download multiple tables

tables <- list(
  list(table = "LLDS_HDR_20240315HAC", year_col = "LANDYR"),
  list(table = "LLDS_DETAIL_20240315HAC", year_col = "HDR_LANDYR")
)

# Download and save to CSV
results <- download_tables(tables, output_dir = "logbook-data")
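
To make the batch workflow concrete, here is a minimal sketch of a loop over table specifications of the same shape as `tables` above. It is not the package's implementation: `fake_download()` is a hypothetical stand-in that returns placeholder rows instead of querying Oracle, the `<table>.csv` file naming is an assumption, and the output goes to a temporary directory so the sketch runs without a database connection.

```r
# Hypothetical stand-in for the real download; returns placeholder rows.
fake_download <- function(table, year_col) {
  out <- data.frame(year = 2022:2023, value = c(1, 2))
  names(out)[1] <- year_col  # name the year column as the spec requests
  out
}

tables <- list(
  list(table = "LLDS_HDR_20240315HAC", year_col = "LANDYR"),
  list(table = "LLDS_DETAIL_20240315HAC", year_col = "HDR_LANDYR")
)

# Write under tempdir() so the sketch leaves the working directory alone.
output_dir <- file.path(tempdir(), "logbook-data")
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

results <- lapply(tables, function(spec) {
  dat <- fake_download(spec$table, spec$year_col)
  # File naming (<table>.csv) is an assumption made for this sketch.
  write.csv(dat, file.path(output_dir, paste0(spec$table, ".csv")),
            row.names = FALSE)
  dat
})

# Name each result after its table so it can be looked up later.
names(results) <- vapply(tables, function(spec) spec$table, character(1))
```

Keeping each specification as a small list means per-table settings such as the year column travel with the table name, so the loop body stays the same for every table.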

Simple single-threaded download

For small tables or when parallel connections aren’t needed:

ref <- simple_download(table = "LLDS_SOME_REF_TABLE")

Documentation

How It Works

The parallel download strategy:

  1. Queries the database to find available years
  2. Splits years evenly across worker cores
  3. Each worker opens its own connection, downloads assigned years, and disconnects
  4. Results are combined into a single data.table

Because each worker holds a single connection, the number of concurrent connections never exceeds the number of cores. This keeps the load on the database bounded while still giving significant speedups for large tables.
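
The partitioning and combining steps can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `years_query()`, `chunk_query()`, `split_years()`, and `fake_worker()` are hypothetical names, the SQL shapes are guesses at what a year-partitioned download would need, and the workers run sequentially on placeholder data instead of opening Oracle connections. The result is combined with base `rbind` here, whereas the package returns a data.table.

```r
# Hypothetical query builders; the SQL shape is an assumption.
years_query <- function(table, year_col) {
  sprintf("SELECT DISTINCT %s FROM %s ORDER BY %s", year_col, table, year_col)
}

chunk_query <- function(table, year_col, years) {
  sprintf("SELECT * FROM %s WHERE %s IN (%s)",
          table, year_col, paste(years, collapse = ", "))
}

# Split years into at most n_cores contiguous, near-equal chunks.
split_years <- function(years, n_cores) {
  worker_id <- sort(rep_len(seq_len(n_cores), length(years)))
  unname(split(years, worker_id))
}

# Stand-in for a worker. A real worker would open its own connection,
# run the chunk query, and disconnect; this one returns one
# placeholder row per assigned year.
fake_worker <- function(years) {
  data.frame(LANDYR = years, n_rows = 1L)
}

years    <- 2015:2023
chunks   <- split_years(years, n_cores = 4)
parts    <- lapply(chunks, fake_worker)  # sequential here, parallel in practice
combined <- do.call(rbind, parts)

print(chunks)
print(years_query("LLDS_HDR_20240315HAC", "LANDYR"))
print(chunk_query("LLDS_HDR_20240315HAC", "LANDYR", chunks[[1]]))
```

With nine years and four cores, the first worker receives three years and the others two each, so no worker is assigned more than one year beyond any other.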

Disclaimer

“The United States Department of Commerce (DOC) GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. DOC has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any claims against the Department of Commerce stemming from the use of its GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.”


NOAA Fisheries

U.S. Department of Commerce | National Oceanic and Atmospheric Administration | NOAA Fisheries