The accelerometers, magnetometers, and other sensors used in modern bio-loggers allow ecologists to remotely observe animal behavior at ever finer scales (Wilmers et al. 2015). However, new computational techniques are needed for processing, visualizing, and analyzing the large volumes of data these sensors generate (Nathan et al. 2022; Williams et al. 2017; Cade et al. 2021). For example, detecting behavioral events in bio-logging sensor data, such as feeding or social interactions, requires sifting through hours of high-resolution data, a laborious and potentially error-prone process. Existing methods for automating behavioral event detection typically rely on signal processing (Sweeney et al. 2019), machine/deep learning (Ngô et al. 2021; Bidder et al. 2020), or a combination of the two (Chakravarty et al. 2020). However, bio-logging data are time series, which are difficult to classify with traditional methods (Keogh and Kasetty 2003). Fortunately, the data mining research community has developed new algorithms designed specifically for time series (Bagnall et al. 2017; Ruiz et al. 2021), many of which are available through a standardized Python package, sktime (Löning et al. 2019).

stickleback, named for the classical animal behavior model organism, is a machine learning pipeline for automating behavioral event detection in bio-logging data. It interfaces with sktime to give bio-logging scientists access to the latest developments in time series learning. The user interface was designed to address many of the computational challenges facing bio-logging scientists. For example, interactive visualizations facilitate inspection of high-resolution, multivariate bio-logging data, and users can define a temporal tolerance for “close enough” predictions. This package, rstickleback, solves another critical problem for bio-logging scientists: ecology as a field preferentially uses R (Lai et al. 2019), but machine learning tools are most often developed in Python. rstickleback resolves this language-domain mismatch by providing an R interface to the Python-based stickleback.


stickleback is a supervised learning pipeline that operates in three steps. The local step trains a machine learning classifier on a subset of the data to differentiate events from non-events. The global step uses a sliding window and cross validation to identify a prediction confidence threshold for events. Finally, the boosting step uses prediction errors identified during the global step to augment the training data for the local step.

Data structure

stickleback requires two types of data: bio-logging sensor data, \(S\), and labeled behavioral events, \(E\). \(S\) can be raw data, such as tri-axial acceleration, or derived variables, such as pitch or overall dynamic body acceleration (Gleiss, Wilson, and Shepard 2010; Wilson et al. 2006). \(E\) must be points in time. This contrasts with behavioral segmentation, where behaviors are periods of time, which is usually accomplished with unsupervised methods such as hidden Markov models (Langrock et al. 2012). Both \(S\) and \(E\) must be associated with bio-logger deployments, \(d\).
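As a minimal sketch of these two data types, \(S\) and \(E\) might be laid out as deployment-keyed tables (this layout is illustrative only, not the package's actual API):

```python
import pandas as pd

# Illustrative layout (not the stickleback API): sensor data S is a
# time-indexed table per deployment d; events E are points in time.
S = pd.DataFrame(
    {
        "deployment": ["d1"] * 4,
        "time": pd.date_range("2021-06-01", periods=4, freq="100ms"),
        "depth": [10.0, 10.2, 10.5, 10.3],  # raw variable
        "odba": [0.02, 0.05, 0.30, 0.10],   # derived variable
    }
).set_index(["deployment", "time"])

E = pd.DataFrame(
    {
        "deployment": ["d1"],
        "event": pd.to_datetime(["2021-06-01 00:00:00.200"]),
    }
)
```

Note that each event is a single timestamp, not an interval, and both tables share the deployment identifier.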

Local step

The goal of the local step is to train a time series classifier to differentiate behavioral events from the background (non-events). From the user’s perspective, the critical inputs are (1) a time series classification model \(M\) and (2) a window size \(w\). \(w\) determines the length of the time series extracted for training \(M\).

stickleback extracts training data, \(D_L\), for \(M\), composed of \(2n\) windows from \(S\), where \(n\) is the number of events in \(E\). The training data includes (1) the windows in \(S\) centered on all \(n\) events in \(E\) (class events) and (2) a non-overlapping random sample of \(n\) windows from \(S\) (class non-events).
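The window-extraction logic can be sketched with numpy. The helper below (`extract_training_windows`, an illustrative stand-in, not the package's internals) assumes an even window size and a single 1-D sensor series:

```python
import numpy as np

def extract_training_windows(s, event_idx, w, rng):
    """Build D_L: for each of the n events, one window of length w (even)
    centered on the event, plus n random non-event windows that do not
    overlap any event window. Sketch only, not the stickleback code."""
    half = w // 2
    n = len(event_idx)
    events = np.stack([s[i - half:i + half] for i in event_idx])
    # candidate centers whose windows cannot overlap any event window
    centers = np.arange(half, len(s) - half)
    ok = np.all(np.abs(centers[:, None] - np.asarray(event_idx)) >= w, axis=1)
    nonevent_centers = rng.choice(centers[ok], size=n, replace=False)
    nonevents = np.stack([s[i - half:i + half] for i in nonevent_centers])
    X = np.concatenate([events, nonevents])
    y = np.array([1] * n + [0] * n)  # 1 = event, 0 = non-event
    return X, y
```

For \(n\) events this yields a balanced set of \(2n\) labeled windows, which is the key to the undersampling strategy described next.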

Using a subset of \(S\) for \(D_L\) addresses the class imbalance inherent to behavioral event detection. In high-resolution bio-logging data, behavioral events can be outnumbered by non-events by a factor on the order of 100-1000x or more. Therefore, a random sample of \(n\) non-events undersamples the majority class, improving performance on the minority class (Haibo He and Garcia 2009). Undersampling can increase false positive rates, however, which is addressed later by the boosting step.

\(M\) must be a time series classification model from the sktime package, which the local step fits to the local training data, \(D_L\).

Global step

The bio-logging sensor data, \(S\), is longitudinal, but the time series classification model, \(M\), is trained on windows of length \(w\), so the global step connects the two time scales. The critical inputs are the temporal tolerance, \(\epsilon\), and the number of folds for cross validation, \(f\).

  1. First, the global step makes predictions using \(M\) with a sliding window to produce a new time series: the local probability of an event, \(p_l\). \(p_l\) is a continuous time series, but recall that behavioral events are represented as points in time, so additional steps are required to extract predicted events from \(p_l\).
  2. Then, the global step extracts all the peaks in \(p_l\) and calculates their prominences \(r\). Prominence is defined as the height of a peak relative to the lowest point between it and a higher peak, which represents how much a peak stands out relative to the nearby topography of the time series.
  3. Finally, the global step assesses the prediction outcome of the \(p_l\) peaks at different prominence thresholds. A predicted event is considered a true positive if it is the closest prediction in time to an event in \(E\) and falls within the tolerance, \(\epsilon\). Therefore, the outcome (true or false positive) of a \(p_l\) peak depends on the prominence threshold, \(\hat{r}\). Consider two \(p_l\) peaks and an \(\epsilon\) of 10 s. The first peak has a prominence of 0.75 and is 8 s from the nearest event in \(E\); the second peak has a prominence of 0.5 and is 5 s from the nearest event in \(E\). Both peaks are within \(\epsilon\) of a known event, so if \(\hat{r}\) is less than 0.5, then the second peak (the closer of the two) is a true positive and the first peak is a false positive. However, if \(\hat{r}\) is between 0.5 and 0.75, then the second peak is no longer a predicted event and the first peak becomes a true positive. The global step chooses the \(\hat{r}\) that maximizes the \(F_1\) score of the predicted events.
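Steps 2-3 can be sketched with scipy's peak detection. The helper below (`choose_threshold`) and its simplified one-sided matching rule are assumptions for illustration, not the package's implementation:

```python
import numpy as np
from scipy.signal import find_peaks

def choose_threshold(p_l, times, true_events, eps):
    """Choose the prominence threshold r_hat maximizing F1.
    Sketch of the global step's steps 2-3 with a simplified
    event-matching rule (each event matched to its nearest prediction)."""
    peaks, props = find_peaks(p_l, prominence=0)  # all peaks + prominences
    prom = props["prominences"]
    best_r, best_f1 = None, -1.0
    for r in np.unique(prom):                     # candidate thresholds
        pred = times[peaks[prom >= r]]            # predicted event times
        tp = 0
        for ev in true_events:                    # nearest prediction within eps?
            if pred.size and np.min(np.abs(pred - ev)) <= eps:
                tp += 1
        fp = max(pred.size - tp, 0)
        fn = len(true_events) - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
        if f1 > best_f1:
            best_r, best_f1 = r, f1
    return best_r, best_f1
```

Raising the threshold prunes low-prominence peaks, trading false positives against false negatives, exactly the trade-off the \(F_1\) score balances.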

The global step as described uses the same data to train \(M\) and select \(\hat{r}\), which will probably bias \(\hat{r}\) too high. This is because the \(p_l\) output of \(M\) for out-of-sample data will likely be lower than for in-sample data. Therefore, the global step actually partitions \(S\) and \(E\) into \(f\) folds and uses cross validation to choose \(\hat{r}\). For each fold, a copy of \(M\), \(M'\), is trained on the other \(f-1\) folds of data. Step 1 uses \(M'\) to generate \(p_l\) for the held out fold. The \(p_l\) series for each fold are then merged, and steps 2 and 3 use the combined \(p_l\) for selecting \(\hat{r}\).
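The fold logic might look like the following sketch, where the callables `fit_fn` and `predict_fn` are hypothetical stand-ins for training the classifier copy \(M'\) and generating \(p_l\):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_pl(deployments, fit_fn, predict_fn, f):
    """Out-of-sample p_l for every deployment (sketch, not the package API).
    fit_fn(train_deployments) -> fitted classifier copy M'
    predict_fn(m_prime, deployment) -> p_l series for that deployment"""
    deployments = np.asarray(deployments, dtype=object)
    p_l = {}
    for train_idx, test_idx in KFold(n_splits=f).split(deployments):
        m_prime = fit_fn(deployments[train_idx])  # trained on f-1 folds
        for d in deployments[test_idx]:           # predict the held-out fold
            p_l[d] = predict_fn(m_prime, d)
    return p_l
```

Merging the held-out \(p_l\) series before selecting \(\hat{r}\) ensures the threshold reflects out-of-sample prediction confidence.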

Boosting step

Undersampling the majority class (non-events) can lead to increased false positives. These false positives are “near misses”, where the animal’s movement was similar enough to the behavior of interest to fool the time series classifier, \(M\). These windows of time contain important information for differentiating between event and non-event windows, making them valuable for training \(M\), but stickleback cannot know a priori when they occur. Therefore, in the boosting step, all windows centered on false positive predictions are added to \(D_L\), the training data set for \(M\). Then the local and global steps are repeated.
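Continuing the earlier window-extraction sketch (again a hypothetical helper, not the package internals), the boosting step amounts to appending the false-positive windows to \(D_L\) with the non-event label:

```python
import numpy as np

def boost_training_data(X, y, s, fp_centers, w):
    """Append windows centered on false-positive predictions to D_L,
    labeled as non-events (0). Sketch of the boosting step only."""
    half = w // 2
    fp_windows = np.stack([s[i - half:i + half] for i in fp_centers])
    X_boosted = np.concatenate([X, fp_windows])
    y_boosted = np.concatenate([y, np.zeros(len(fp_centers), dtype=y.dtype)])
    return X_boosted, y_boosted
```

Retraining on the augmented set then targets exactly the "near misses" that fooled \(M\) in the first pass.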

In code

Use Stickleback() to define the model. The argument tsc (time series classifier) corresponds to \(M\); use compose_tsc() or create_tsc() to define tsc. Arguments win_size, tol, and n_folds correspond to \(w\), \(\epsilon\), and \(f\), respectively. nth modifies how the global step generates \(p_l\): if nth = 2, for example, then \(p_l\) is evaluated for every other window and the gaps are filled with cubic spline interpolation.
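The nth speed-up can be illustrated with scipy (a sketch of the idea, using a sine wave as a stand-in for classifier output, not the package's internals):

```python
import numpy as np
from scipy.interpolate import CubicSpline

nth = 2
t = np.arange(100)                  # indices of all sliding windows
p_sparse = np.sin(t[::nth] / 10.0)  # classifier evaluated every nth window
spline = CubicSpline(t[::nth], p_sparse)
p_l = spline(t)                     # cubic spline fills the skipped windows
```

This halves (for nth = 2) the number of classifier evaluations, usually the most expensive part of the global step, at the cost of slightly smoothed \(p_l\) values between evaluated windows.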

sb_fit() runs all three steps of the method: local, global, and boosting.


Bagnall, Anthony, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. 2017. “The Great Time Series Classification Bake Off: A Review and Experimental Evaluation of Recent Algorithmic Advances.” Data Mining and Knowledge Discovery 31 (3): 606–60.
Bidder, Owen R., Agustina di Virgilio, Jennifer S. Hunter, Alex McInturff, Kaitlyn M. Gaynor, Alison M. Smith, Janelle Dorcy, and Frank Rosell. 2020. “Monitoring Canid Scent Marking in Space and Time Using a Biologging and Machine Learning Approach.” Scientific Reports 10 (1): 588.
Cade, David E., William T. Gough, Max F. Czapanskiy, James A. Fahlbusch, Shirel R. Kahane-Rapport, Jacob M. J. Linsky, Ross C. Nichols, et al. 2021. “Tools for Integrating Inertial Sensor Data with Video Bio-Loggers, Including Estimation of Animal Orientation, Motion, and Position.” Animal Biotelemetry 9 (1): 34.
Chakravarty, Pritish, Gabriele Cozzi, Hooman Dejnabadi, Pierre-Alexandre Léziart, Marta Manser, Arpat Ozgul, and Kamiar Aminian. 2020. “Seek and Learn: Automated Identification of Microevents in Animal Behaviour Using Envelopes of Acceleration Data and Machine Learning.” Methods in Ecology and Evolution 11 (12): 1639–51.
Gleiss, Adrian C., Rory P. Wilson, and Emily L. C. Shepard. 2010. “Making Overall Dynamic Body Acceleration Work: On the Theory of Acceleration as a Proxy for Energy Expenditure.” Methods in Ecology and Evolution 2 (1): 23–33.
Haibo He, and E.A. Garcia. 2009. “Learning from Imbalanced Data.” IEEE Transactions on Knowledge and Data Engineering 21 (9): 1263–84.
Keogh, Eamonn, and Shruti Kasetty. 2003. “On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration.” Data Mining and Knowledge Discovery 7 (4): 349–71.
Lai, Jiangshan, Christopher J. Lortie, Robert A. Muenchen, Jian Yang, and Keping Ma. 2019. “Evaluating the Popularity of R in Ecology.” Ecosphere 10 (1).
Langrock, Roland, Ruth King, Jason Matthiopoulos, Len Thomas, Daniel Fortin, and Juan M. Morales. 2012. “Flexible and Practical Modeling of Animal Telemetry Data: Hidden Markov Models and Extensions.” Ecology 93 (11): 2336–42.
Löning, Markus, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, and Franz J. Király. 2019. “Sktime: A Unified Interface for Machine Learning with Time Series.” arXiv:1909.07872 [Cs, Stat], September.
Nathan, Ran, Christopher T. Monk, Robert Arlinghaus, Timo Adam, Josep Alós, Michael Assaf, Henrik Baktoft, et al. 2022. “Big-Data Approaches Lead to an Increased Understanding of the Ecology of Animal Movement.” Science 375 (6582).
Ngô, Mạnh Cường, Raghavendra Selvan, Outi Tervo, Mads Peter Heide-Jørgensen, and Susanne Ditlevsen. 2021. “Detection of Foraging Behavior from Accelerometer Data Using U-Net Type Convolutional Networks.” Ecological Informatics 62 (May): 101275.
Ruiz, Alejandro Pasos, Michael Flynn, James Large, Matthew Middlehurst, and Anthony Bagnall. 2021. “The Great Multivariate Time Series Classification Bake Off: A Review and Experimental Evaluation of Recent Algorithmic Advances.” Data Mining and Knowledge Discovery 35 (2): 401–49.
Sweeney, David A., Stacy L. DeRuiter, Ye Joo McNamara-Oh, Tiago A. Marques, Patricia Arranz, and John Calambokidis. 2019. “Automated Peak Detection Method for Behavioral Event Identification: Detecting Balaenoptera Musculus and Grampus Griseus Feeding Attempts.” Animal Biotelemetry 7 (1).
Williams, Hannah J., Mark D. Holton, Emily L. C. Shepard, Nicola Largey, Brad Norman, Peter G. Ryan, Olivier Duriez, et al. 2017. “Identification of Animal Movement Patterns Using Tri-Axial Magnetometry.” Movement Ecology 5 (1).
Wilmers, Christopher C., Barry Nickel, Caleb M. Bryce, Justine A. Smith, Rachel E. Wheat, and Veronica Yovovich. 2015. “The Golden Age of Bio-Logging: How Animal-Borne Sensors Are Advancing the Frontiers of Ecology.” Ecology 96 (7): 1741–53.
Wilson, Rory P., Craig R. White, Flavio Quintana, Lewis G. Halsey, Nikolai Liebsch, Graham R. Martin, and Patrick J. Butler. 2006. “Moving Towards Acceleration for Estimates of Activity-Specific Metabolic Rate in Free-Living Animals: The Case of the Cormorant.” Journal of Animal Ecology 75 (5): 1081–90.