Define a Stickleback model, used for automated detection of behavioral events in bio-logging data.

Stickleback(tsc, win_size, tol, nth = 1, n_folds = 4, seed = NULL)

Arguments

tsc

[py:sktime.base.BaseEstimator] A time series classifier created with either compose_tsc or create_tsc.

win_size

[integer(1)] Sliding window size in number of observations. E.g., for 10 Hz data and a 5 s sliding window, win_size should be 50.

tol

[numeric(1)] Prediction tolerance, in seconds. See sb_assess for details.

nth

[integer(1)] Sliding window step size. For example, when nth = 1, the time series classifier (tsc) will make predictions on every window. When nth = 2, tsc predictions are only generated for every other window. Higher nth values reduce the time to fit a Stickleback model and generate predictions, at the potential cost of reduced prediction accuracy.

n_folds

[integer(1)] Number of folds for internal cross-validation. n_folds must be at least 2. Larger n_folds values increase model fitting time, but may yield greater out-of-sample accuracy.

seed

[integer(1)] Random number seed for model reproducibility. CURRENTLY NOT WORKING (see issue #6).
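As a rough illustration of how win_size and nth relate to the sampling rate, the sketch below derives win_size from a window duration in seconds and counts how many windows would be classified for a given nth. The names hz, window_s and n_obs are hypothetical, and the window indexing (one window per starting observation) is an assumption made for illustration, not necessarily how Stickleback arranges its windows internally.

```r
# Hypothetical inputs: sampling rate (Hz), window duration (s), record length
hz <- 10
window_s <- 5
n_obs <- 1000L

# Window size in observations, stored as an integer
win_size <- as.integer(hz * window_s)

# Assumed indexing: one window per starting observation that leaves
# room for a full window
n_windows <- n_obs - win_size + 1L

# With nth = 10, only every 10th window is classified
nth <- 10L
classified <- seq(1L, n_windows, by = nth)

length(classified)
```

Under these assumptions, raising nth from 1 to 10 cuts the number of classified windows roughly tenfold, which is where the savings in fitting and prediction time come from.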

Details

There are two challenges facing automated behavioral event detection in bio-logging data. First, bio-logging data are time series, and most classification algorithms perform poorly on time series. Second, bio-logging data resolution greatly exceeds the frequency of many biological rates, creating an imbalanced class problem. For example, bio-logging data collected from baleen whales are often standardized at 10 Hz, but feeding rates are approximately 200-500 events per day. Therefore, the "behavioral event" class is thousands of times smaller than the "non-event" class.
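The size of the imbalance in the baleen whale example can be checked with a few lines of arithmetic. This sketch treats each feeding event as a single observation, which is a simplifying assumption; the variable names are illustrative only.

```r
hz <- 10                          # sampling rate from the whale example
obs_per_day <- hz * 60 * 60 * 24  # 864,000 observations per day
events_per_day <- c(200, 500)     # approximate feeding rates

# Ratio of all observations to event observations
imbalance <- obs_per_day / events_per_day
imbalance
# 4320 1728
```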

Stickleback addresses these challenges in a two-stage process. First, it uses classification algorithms specifically designed for time series data by interfacing with the sktime Python package. Second, it under-samples the majority class ("non-events") when training the classifier, then optimizes event prediction using internal cross-validation. See vignette("rstickleback") for more details.
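To make the under-sampling idea concrete, here is a minimal base R sketch. It is not the package's implementation: the labels vector is made up, and the 1:1 ratio of events to non-events is an assumption chosen for simplicity.

```r
set.seed(1)

# Hypothetical window labels: 1 = event, 0 = non-event (heavily imbalanced)
labels <- c(rep(1L, 20), rep(0L, 2000))

event_idx <- which(labels == 1L)
nonevent_idx <- which(labels == 0L)

# Under-sample the majority class down to the size of the minority class
keep_nonevents <- sample(nonevent_idx, size = length(event_idx))

# Indices of the balanced training set
train_idx <- sort(c(event_idx, keep_nonevents))
table(labels[train_idx])
```

A classifier trained on train_idx sees both classes equally often, so it cannot score well simply by always predicting "non-event".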

Slots

local_clf

[py:sktime.base.BaseEstimator] A time series classifier, inheriting from sktime's BaseEstimator.

win_size

[integer(1)] Sliding window size.

tol

[numeric(1)] Prediction tolerance, in seconds.

nth

[integer(1)] Sliding window step size.

n_folds

[integer(1)] Number of folds for the global cross-validation step.

seed

[integer(1)] Random number seed.

.stickleback

[py:Stickleback] Python Stickleback object.

Examples

# Load sample data
c(lunge_sensors, lunge_events) %<-% load_lunges()
# Define a time series classifier
tsc <- compose_tsc(module = "interval_based",
                   algorithm = "SupervisedTimeSeriesForest",
                   params = list(n_estimators = 2L, random_state = 4321L),
                   columns = columns(lunge_sensors))
# Define a Stickleback model
sb <- Stickleback(tsc,
                  win_size = 50,
                  tol = 5,
                  nth = 10,
                  n_folds = 4,
                  seed = 1234)