Predator clusters

Predator clusters

Dietrich et al. (2021) restricted cluster analysis to taxa observed at 5% or more of stations, which was necessary to avoid giving rare species too much leverage. Predators are far less diverse, so this analysis retains all species except uncertain IDs (e.g. UNSE = unknown seal). 22 species included in analysis.

Code
total_stations <- n_distinct(zoop_sf$amlr.station)
pred_frac <- predators_clust %>%
  as_tibble() %>% 
  group_by(amlr.station, species) %>% 
  summarize(count_station = sum(count),
            .groups = "drop") %>% 
  group_by(species) %>%
  summarize(total_count = sum(count_station),
            stations_present = n(),
            station_frac = stations_present / total_stations) %>%
  arrange(desc(station_frac))
  
frac_fn <- scales::label_percent(accuracy = 0.1)
pred_frac %>% 
  set_names(c("Species", "Ind", "Stations", "Fraction")) %>% 
  mutate(Fraction = frac_fn(Fraction)) %>% 
  knitr::kable(align = "lrr") %>% 
  kableExtra::kable_styling("striped",
                            full_width = FALSE)
Species Ind Stations Fraction
SNPT 12314 233 59.1%
ANPT 2351 209 53.0%
SGPT 620 177 44.9%
FUSE 5443 152 38.6%
KEGU 679 127 32.2%
ANFU 1089 108 27.4%
CAPT 544 84 21.3%
ADPN 8532 68 17.3%
CRSE 3112 58 14.7%
BLPT 655 53 13.5%
LESE 195 49 12.4%
ANTE 157 37 9.4%
PFSB 31 25 6.3%
MIWH 43 17 4.3%
WESE 19 16 4.1%
GEPN 142 12 3.0%
ELSE 17 11 2.8%
KIWH 48 5 1.3%
EMPN 4 2 0.5%
SBWH 5 2 0.5%
ANSH 89 1 0.3%
ROSE 1 1 0.3%

Optimal number of clusters

We use the gap statistic (Tibshirani, Walther, and Hastie 2001) to choose the optimal number of clusters (\(k\)). The gap statistic (\(f\)) is a goodness of clustering measure. The authors recommended choosing \(k\) at the shoulder of the \(f \sim k\) curve. Heuristically, the shoulder is the smallest \(k\) such that \(f(k) \geq f(k+1) - s(k+1)\) where \(s\) is the standard error of \(f\). According to this rule, the predators are best described by three clusters.

Code
ktibs <- 3
tibs_thr <- with(sightings_gap, gap[ktibs + 1] - SE.sim[ktibs + 1])
ggplot(sightings_gap, aes(x = k, y = gap)) +
  geom_line() +
  geom_pointrange(aes(ymin = gap - SE.sim, ymax = gap + SE.sim)) +
  geom_pointrange(aes(ymin = gap - SE.sim, ymax = gap + SE.sim),
                  filter(sightings_gap, k == 3),
                  color = "red") +
  geom_segment(x = ktibs, xend = ktibs + 1, 
               y = tibs_thr, yend = tibs_thr,
               color = "blue",
               linetype = "dashed") +
  labs(x = expression("Clusters" ~~ (italic(k))),
       y = expression("Gap statistic" ~~ (italic(f)))) +
  scale_color_manual(values = c(`TRUE` = "blue",
                                `FALSE` = "black")) +
  theme_classic() +
  theme(legend.position = "none")

Hierarchical clustering of stations by predator community. Note: 394 stations were included in the zooplankton clustering, but only 245 stations were used for predator clustering.

Code
g <- split(names(sightings_cut), sightings_cut)
p <- ggtree(sightings_clust, hang = -1)
pred_clust_mrca <- sapply(g, function(n) MRCA(p, n))

p %>% 
  groupClade(pred_clust_mrca, group_name = "Predator cluster") + 
  aes(color = `Predator cluster`) +
  layout_dendrogram() +
  theme_dendrogram()

Indicator species

Which species best represent each cluster? Using Dufrene-Legendre indicator species analysis. Indicator values, \(d\), presented for each cluster for \(d\gt0.25\). \(d\) is the product of relative frequency (fraction of sites present within cluster) and relative abundance (fraction of abundance found within cluster).

Code
indval_mtx <- sightings_indval$indval
indval_fmt <- apply(indval_mtx, 2, function(x) {
  ord <- order(x, decreasing = TRUE)
  species <- code_to_common(rownames(indval_mtx))
  result <- ifelse(x >= 0.25, 
         str_glue("{species} ({round(x, 2)})"),
         NA) %>% 
    `[`(ord) %>% 
    na.omit() %>% 
    paste(collapse = "<br>")
  ifelse(result == "", "None", result)
})
as_tibble(t(indval_fmt)) %>% 
  knitr::kable(format = "html", escape = FALSE) %>% 
  kableExtra::kable_styling()
1 2 3
Adélie penguin (0.37)
Crabeater seal (0.3)
Southern fulmar (0.42)
Antarctic petrel (0.41)
Snow petrel (0.34)
Cape petrel (0.28)
Southern giant petrel (0.25)
Antarctic fur seal (0.84)
Snow petrel (0.48)
Kelp gull (0.37)
Southern giant petrel (0.36)
Antarctic tern (0.32)
Antarctic petrel (0.29)

Cluster interpretation

  • Cluster 1 = pack ice
  • Cluster 2 = open water
  • Cluster 3 = marginal ice
Code
predators_clust %>% 
  select(amlr.station, species, count) %>% 
  complete(species, amlr.station, fill = list(count = 0)) %>% 
  left_join(stations_clust %>% 
              as_tibble() %>% 
              select(amlr.station, pred_clust),
            by = "amlr.station") %>% 
  rbind(mutate(., pred_clust = "All clusters")) %>% 
  left_join(station_effort %>% 
              as_tibble() %>% 
              transmute(amlr.station, survey_km = nmi_to_km(survey_nmi)),
            by = "amlr.station") %>% 
  group_by(species, pred_clust) %>% 
  summarize(density_km = sum(count) / sum(survey_km),
            freq = sum(count > 0) / n(),
            density_freq = str_glue("{sprintf('%0.3f', density_km)} [{scales::percent(freq, accuracy = 0.1)}]"),
            .groups = "drop") %>% 
  select(-density_km, -freq) %>% 
  pivot_wider(names_from = pred_clust, values_from = density_freq) %>%
  arrange(desc(`All clusters`)) %>% 
  knitr::kable()
species Open water Marginal ice Pack ice All clusters
SNPT 1.554 [95.7%] 3.116 [96.3%] 0.463 [92.5%] 1.665 [95.1%]
ADPN 0.004 [6.5%] 4.036 [42.6%] 0.948 [67.9%] 1.153 [27.8%]
FUSE 0.073 [50.7%] 2.892 [100.0%] 0.121 [52.8%] 0.736 [62.0%]
CRSE 0.005 [6.5%] 0.437 [38.9%] 1.380 [52.8%] 0.421 [23.7%]
ANPT 0.404 [92.8%] 0.258 [85.2%] 0.177 [66.0%] 0.318 [85.3%]
ANFU 0.240 [59.4%] 0.033 [29.6%] 0.045 [18.9%] 0.147 [44.1%]
KEGU 0.056 [44.2%] 0.158 [74.1%] 0.108 [49.1%] 0.092 [51.8%]
BLPT 0.125 [32.6%] 0.090 [13.0%] 0.001 [1.9%] 0.089 [21.6%]
SGPT 0.084 [73.2%] 0.101 [88.9%] 0.066 [52.8%] 0.084 [72.2%]
CAPT 0.084 [47.8%] 0.116 [20.4%] 0.006 [13.2%] 0.074 [34.3%]
LESE 0.004 [8.7%] 0.080 [38.9%] 0.025 [30.2%] 0.026 [20.0%]
ANTE 0.012 [9.4%] 0.061 [38.9%] 0.003 [5.7%] 0.021 [15.1%]
GEPN 0.005 [2.2%] 0.054 [5.6%] 0.017 [11.3%] 0.019 [4.9%]
ANSH 0.022 [0.7%] 0.000 [0.0%] 0.000 [0.0%] 0.012 [0.4%]
MIWH 0.002 [3.6%] 0.018 [13.0%] 0.003 [9.4%] 0.006 [6.9%]
KIWH 0.000 [0.0%] 0.023 [5.6%] 0.005 [3.8%] 0.006 [2.0%]
PFSB 0.002 [5.8%] 0.006 [18.5%] 0.006 [13.2%] 0.004 [10.2%]
WESE 0.002 [5.1%] 0.001 [1.9%] 0.006 [15.1%] 0.003 [6.5%]
ELSE 0.004 [5.8%] 0.000 [0.0%] 0.002 [5.7%] 0.002 [4.5%]
EMPN 0.000 [0.7%] 0.000 [0.0%] 0.002 [1.9%] 0.001 [0.8%]
SBWH 0.001 [1.4%] 0.000 [0.0%] 0.000 [0.0%] 0.001 [0.8%]
ROSE 0.000 [0.7%] 0.000 [0.0%] 0.000 [0.0%] 0.000 [0.4%]

Cluster distribution

How frequent are the different clusters and where were they found? The open water community was observed most often, followed by the marginal ice zone then the pack ice communities. The relative frequency of the open water community was greatest in 2012, when survey effort in the Bransfield Strait was most limited, and least in 2016. Generally, the open water community occupied the offshore regions, but in 2014 and 2015 it moved onto the shelf into the triangle between Elephant, Joinville, and George Islands.

The marginal ice zone community occupied the shallower, western part of the Bransfield Strait, moving offshore of the Shetland Islands in 2014 and north of Elephant Island in 2016.

The pack ice community was most commonly observed in the south eastern part of the survey region, near Joinville Island. A particularly coherent group of sites in this community formed in 2013 between Joinville Island and the South Shetlands.

Code
stations_clust %>% 
  mutate(Year = factor(Year)) %>% 
  crosstable::crosstable("Year", by = "pred_clust", total = "col") %>% 
  crosstable::as_flextable()

label

variable

pred_clust

Open water

Marginal ice

Pack ice

Year

2012

14 (73.68%)

3 (15.79%)

2 (10.53%)

2013

25 (53.19%)

11 (23.40%)

11 (23.40%)

2014

39 (57.35%)

14 (20.59%)

15 (22.06%)

2015

39 (65.00%)

8 (13.33%)

13 (21.67%)

2016

21 (41.18%)

18 (35.29%)

12 (23.53%)

Total

138 (56.33%)

54 (22.04%)

53 (21.63%)

Code
map_lim <- st_bbox(stations_clust) %>% 
  project_bbox() %>% 
  expand_bbox(factor = 1.2)

ant_basemap() +
  # Sea ice
  geom_tile(aes(x, y, fill = seaice_conc), 
            filter(seaice_conc_df, between(seaice_conc, 0.65, 1)),
            alpha = 0.8) +
  scale_fill_distiller(lim = c(0.65, 1),
                       palette = "Blues",
                       na.value = "transparent") +
  # Predator clusters
  ggnewscale::new_scale_fill() +
  geom_sf(aes(fill = pred_clust), 
          stations_clust, 
          size = 2, 
          shape = 22) +
  facet_wrap(~ Year) +
  scale_x_continuous(breaks = c(-60, -55)) +
  scale_y_continuous(breaks = c(-63, -62, -61, -60)) +
  scale_fill_brewer(palette = "Dark2") +
  guides(fill = guide_legend(override.aes = list(size = 3))) +
  coord_ant(map_lim) +
  theme(legend.position = "bottom",
        legend.title = element_blank())

Code
library(eks)
cluster_kde <- stations_clust %>% 
  group_by(pred_clust) %>% 
  group_map(\(rows, keys) st_kde(rows)$sf %>% 
              filter(contlabel %in% c(50, 95)) %>% 
              mutate(pred_clust = keys$pred_clust)) %>% 
  reduce(sf:::rbind.sf)

ant_basemap() +
  # Predator clusters
  ggnewscale::new_scale_fill() +
  geom_sf(aes(color = pred_clust,
              linetype = contlabel), 
          cluster_kde, 
          fill = NA,
          linewidth = 1) +
  facet_wrap(~ pred_clust) +
  scale_x_continuous(breaks = c(-60, -55)) +
  scale_y_continuous(breaks = c(-63, -62, -61, -60)) +
  scale_color_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c(2, 1), guide = "none") +
  coord_ant(map_lim) +
  theme(legend.position = "bottom",
        legend.title = element_blank())

NMDS

NMDS diagnostics

How many NMDS axes should we use? Scree plot indicates 3 axes is acceptable.

Code
ggplot(nmds_stress, aes(k, stress)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = c(0.2, 0.1, 0.05), linetype = "dotted") +
  scale_x_continuous(breaks = nmds_stress$k) +
  expand_limits(y = 0) +
  theme_classic()

Shepard diagram demonstrates the ordination distance and observed dissimilarity are acceptably correlated.

Code
nmds_spear <- cor(nmds_shepard$diss,
                  nmds_shepard$dist,
                  method = "spearman")
nmds_label <- str_glue(
  "Non-metric fit, R^2 = {round(1 - nmds_sightings$stress^2, 3)}\n",
  "Linear fit, R^2 = {round(nmds_spear^2, 3)}\n",
  "k = {nmds_sightings$ndim}\n",
  "Stress = {round(nmds_sightings$stress, 3)}"
)

ggplot(nmds_shepard, aes(diss, dist)) +
  geom_point(size = 0.75, alpha = 0.1) +
  geom_line(aes(y = mono), color = "red") +
  annotate("text", x = 0.05, y = 2.9, label = nmds_label,
           hjust = 0, vjust = 1) +
  expand_limits(x = 0, y = 0) +
  labs(x = "Observed dissimilarity",
       y = "Ordination distance") +
  theme_classic()

Environmental loading

Only two environmental variables were significantly correlated with the NMDS axes: sampling year and ice coverage, with \(r^2\) values of 0.061 and 0.053, respectively. Compare this to the macrozooplankton analysis, where 8 environmental values were significantly correlated with the NMDS axes and \(r^2\) values were as high as 0.36 (chl a in the upper 100 m) and 0.703 (upper mixed layer salinity).

Code
vector_tbl <- with(nmds_envfit$vectors, 
                   cbind(arrows, 
                         r2 = r, 
                         p = pvals)) %>% 
  as_tibble(rownames = "var") %>% 
  mutate(type = "Vector",
         across(NMDS1:p, \(x) sprintf("%0.3f", x)))

factor_tbl <- with(nmds_envfit$factors, 
                   cbind(centroids, 
                         r2 = r[var.id],
                         p = pvals[var.id])) %>% 
  as_tibble(rownames = "var") %>% 
  mutate(type = "Factor",
         across(NMDS1:p, \(x) sprintf("%0.3f", x)))

rbind(vector_tbl, factor_tbl) %>% 
  relocate(type) %>% 
  knitr::kable()
type var NMDS1 NMDS2 NMDS3 r2 p
Vector zuml_m 0.245 0.958 0.149 0.003 0.917
Vector avg.temp 0.073 0.468 0.881 0.029 0.135
Vector avg.salinity -0.054 0.831 -0.554 0.004 0.841
Vector Integ.chla.100m 0.578 -0.601 0.552 0.011 0.580
Vector Integ.phae.100m -0.526 0.848 0.069 0.011 0.564
Vector ice_coverage 0.325 -0.391 -0.861 0.054 0.013
Factor TOD_2levels_civilD 0.009 0.015 -0.025 0.003 0.683
Factor TOD_2levels_civilN -0.027 -0.050 0.048 0.003 0.683
Factor Year2012 -0.070 -0.093 0.260 0.061 0.002
Factor Year2013 -0.062 -0.306 -0.036 0.061 0.002
Factor Year2014 -0.015 -0.007 0.061 0.061 0.002
Factor Year2015 -0.011 0.258 -0.079 0.061 0.002
Factor Year2016 0.106 0.076 -0.047 0.061 0.002
Factor zoop_clust1 0.024 0.041 -0.020 0.028 0.197
Factor zoop_clust2a 0.072 0.048 -0.003 0.028 0.197
Factor zoop_clust2b -0.065 -0.101 0.117 0.028 0.197
Factor zoop_clust3a -0.133 -0.158 0.123 0.028 0.197
Factor zoop_clust3b -0.169 -0.073 -0.309 0.028 0.197
Factor ice_typeOPEN -0.031 0.081 0.085 0.014 0.497
Factor ice_typeTHIN -0.018 -0.002 -0.095 0.014 0.497
Factor ice_typeFY 0.022 -0.043 0.040 0.014 0.497
Factor ice_typeMY 0.618 0.278 -0.039 0.014 0.497

NMDS plots, color-coded by predator cluster with year and ice coverage loadings.

Code
year_centroids <- scores(nmds_envfit, 
                         display = "factors",
                         choices = 1:3) %>% 
  as_tibble(rownames = "envvar") %>% 
  filter(str_starts(envvar, "Year")) %>% 
  mutate(across(starts_with("NMDS"),
                ~ .x * ordiArrowMul(nmds_envfit)),
         Year = substr(envvar, 5, 8))

nmds_plot <- function(axis1, axis2) {
  ice_ordisurf <- ordisurf(
    nmds_sightings ~ nmds_env$ice_coverage,
    choices = c(axis1, axis2),
    plot = FALSE
  )
  
  x <- paste0("NMDS", axis1)
  y <- paste0("NMDS", axis2)
  
  ice_df <- expand_grid(
    axis1 = ice_ordisurf$grid$x,
    axis2 = ice_ordisurf$grid$y,
  ) %>% 
    cbind(as.numeric(ice_ordisurf$grid$z)) %>% 
    set_names(c(x, y, "ice_coverage")) %>% 
    mutate(ice_coverage = ice_coverage / 10) %>% 
    drop_na(ice_coverage)
  
  ggplot(nmds_df, aes(.data[[x]], .data[[y]])) +
    stat_contour(aes(z = .data$ice_coverage, 
                     color = after_stat(level)),
                 ice_df) +
    scale_color_distiller(
      "Ice coverage",
      palette = "Blues",
      labels = scales::percent,
      limits = c(0.25, 0.75),
      breaks = seq(0.25, 0.75, by = 0.1),
      guide = guide_colorbar(barwidth = unit(2, "in"),
                             direction = "horizontal")
    ) +
    ggnewscale::new_scale_color() +
    geom_point(aes(color = pred_clust), alpha = 0.8) +
    geom_point(data = year_centroids, 
               shape = 18,
               size = 2.5,
               color = "black",
               alpha = 0.85) +
    ggrepel::geom_text_repel(aes(label = Year),
                             year_centroids) +
    scale_color_brewer(
      palette = "Dark2",
      guide = guide_legend(override.aes = list(size = 2),
                           direction = "horizontal",
                           order = 1)
    ) +
    labs(color = "Predator cluster") +
    theme_classic()
}

map2(c(1, 1, 2), c(2, 3, 3), nmds_plot) %>% 
  reduce(`|`) +
  plot_annotation(tag_levels = 'A') +
  plot_layout(guides = "collect") &
    theme(legend.position = "bottom",
          legend.direction = "horizontal",
          plot.tag = element_text(face = "bold"))

Environmental conditions by cluster

Code
kw <- map(c("zuml_m", "avg.temp", "avg.salinity", 
            "Integ.chla.100m", "Integ.phae.100m"),
          \(v) {
            kw <- kruskal.test(stations_clust[[v]], stations_clust$pred_clust)
            data.frame(var = v, p = kw$p.value * 5)
          }) %>% 
  list_rbind()

stations_clust %>% 
  as.data.frame() %>% 
  group_by(pred_clust) %>%
  summarize(
    across(c(zuml_m, avg.temp, avg.salinity, Integ.chla.100m, Integ.phae.100m),
           list(mean = partial(mean, na.rm = TRUE), 
                q1 = \(x) quantile(x, 0.25, na.rm = TRUE), 
                q3 = \(x) quantile(x, 0.75, na.rm = TRUE)),
           .names = "{.col}={.fn}")
  ) %>% 
  pivot_longer(-pred_clust) %>% 
  separate_wider_delim(name, "=", names = c("var", "fn")) %>% 
  pivot_wider(names_from = "fn", values_from = "value") %>% 
  mutate(across(mean:q3, \(x) formatC(signif(x, digits=3), digits = 3, format = "fg", flag = "#"))) %>% 
  transmute(pred_clust, var, value = str_glue("{mean} ({q1} - {q3})")) %>% 
  pivot_wider(names_from = pred_clust, values_from = "value") %>% 
  left_join(kw, by = "var") %>% 
  mutate(var = ifelse(p <= 0.05, paste0(var, " *"), var)) %>% 
  select(-p) %>% 
  knitr::kable()
var Open water Marginal ice Pack ice
zuml_m 99.5 (64.8 - 118.) 120. (73.0 - 139.) 126. (72.0 - 174.)
avg.temp * -1.73 (-1.82 - -1.69) -1.73 (-1.81 - -1.69) -1.80 (-1.84 - -1.76)
avg.salinity * 34.1 (33.9 - 34.2) 34.2 (34.1 - 34.3) 34.3 (34.1 - 34.4)
Integ.chla.100m * 13.4 (8.63 - 16.4) 13.0 (7.98 - 16.9) 9.39 (5.76 - 13.1)
Integ.phae.100m 3.73 (2.24 - 4.96) 3.30 (1.67 - 4.14) 2.96 (1.13 - 3.72)

Clusters by ice coverage

Median ice coverage increased from the open water cluster (median 34.1%) to the marginal ice (56.0%) and pack ice (74.9%) clusters. This supports the labels of “open water” and “marginal ice” for clusters 3 and 2, respectively. Also provides evidence for the “pack ice” cluster.

Code
median_coverge <- stations_clust %>% 
  drop_na(ice_coverage) %>% 
  group_by(pred_clust) %>% 
  summarize(median_ice = median(ice_coverage / 10, na.rm = TRUE))

ggplot(stations_clust, 
       aes(x = ice_coverage / 10, fill = after_stat(x))) + 
  geom_histogram(bins = 20, color = "grey30") +
  geom_vline(aes(xintercept = median_ice),
             median_coverge,
             color = "red", linetype = "dashed") +
  scale_x_continuous(labels = scales::percent) +
  scale_fill_distiller(palette = "Blues", guide = NULL) +
  facet_grid(rows = vars(pred_clust)) +
  labs(x = "Ice coverage",
       y = "# sites") +
  theme_minimal()
Warning: Removed 9 rows containing non-finite values (`stat_bin()`).

Predator x prey clusters

Predator/prey cluster associations were statistically significant overall (\(\chi = 53.6, p<0.001\)) as were 5 pairwise associations after applying Bonferroni correction. The marginal ice and pack ice predator communities were positively associated with macrozooplankton communities 3b (large krill, including E. superba) and 3a (an extremely diverse assemblage associated with cold, high salinity water), respectively. The open water predator community was positively associated with cluster 2b (Thysanoessa macrura) and negatively associated with 3a and 3b. Macrozooplankton cluster 1 (a ubiquitous cluster indicated by Salpa and Clione) was not significantly associated with any predator cluster. Neither was macrozooplankton cluster 2a (indicated by important prey species such as E. frigida, E. triacantha, myctophid larvae, and Themisto gaudichaudii), but it associated most often with the open water predator cluster.

Code
pred_prey_chisq <- with(stations_clust, 
                        chisq.test(pred_clust, 
                                   `Winter Cluster factor`,
                                   simulate.p.value = TRUE))
pred_prey_chisq

    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  pred_clust and Winter Cluster factor
X-squared = 59.603, df = NA, p-value = 0.0004998
Code
pred_prey_chisq$residuals
              Winter Cluster factor
pred_clust               1          2a          2b          3a          3b
  Open water   -0.46769243  1.23921454  2.20564081 -1.40395281 -2.76050332
  Marginal ice  0.01619784 -0.33671751 -0.94850401 -1.68660037  3.61591343
  Pack ice      0.73832851 -1.65974368 -2.60165856  3.96788546  0.80454045
Code
pred_prey_table <- with(stations_clust, table(pred_clust, 
                                              `Winter Cluster factor`))
chisq.posthoc.test::chisq.posthoc.test(pred_prey_table, 
                                       simulate.p.value = TRUE) %>% 
  pivot_longer(-c(Dimension, Value), 
               names_to = "zoop_clust", 
               values_to = "value") %>% 
  pivot_wider(names_from = "Value") %>% 
  arrange(`p values`) %>% 
  filter(`p values` < 0.05)
# A tibble: 5 × 4
  Dimension    zoop_clust Residuals `p values`
  <chr>        <chr>          <dbl>      <dbl>
1 Pack ice     3a              4.69   0.000041
2 Open water   3b             -4.36   0.000196
3 Marginal ice 3b              4.27   0.000289
4 Open water   2b              3.65   0.00395 
5 Pack ice     2b             -3.21   0.0197  
Code
pred_prey_tbl <- with(stations_clust, 
                      table(pred_clust, `Winter Cluster factor`))
pred_prey_chisq$expected
              Winter Cluster factor
pred_clust            1       2a        2b        3a        3b
  Open water   58.57959 33.79592 22.530612 11.828571 11.265306
  Marginal ice 22.92245 13.22449  8.816327  4.628571  4.408163
  Pack ice     22.49796 12.97959  8.653061  4.542857  4.326531
Code
pred_prey_chisq$observed
              Winter Cluster factor
pred_clust      1 2a 2b 3a 3b
  Open water   55 41 33  7  2
  Marginal ice 23 12  6  1 12
  Pack ice     26  7  1 13  6
Code
expand_grid(`Predator cluster` = levels(stations_clust$pred_clust),
            `Zooplankton cluster` = unique(stations_clust$`Winter Cluster factor`)) %>% 
  mutate(observed = map2_int(`Predator cluster`, 
                             `Zooplankton cluster`,
                             \(i, j) pred_prey_chisq$observed[i, j]),
         expected = map2_dbl(`Predator cluster`, 
                             `Zooplankton cluster`,
                             \(i, j) pred_prey_chisq$expected[i, j])) %>% 
  mutate(label = str_glue("{observed} ({round(expected, 1)})")) %>% 
  pivot_wider(id_cols = `Predator cluster`,
              names_from = `Zooplankton cluster`, 
              values_from = label) %>% 
  knitr::kable()
Predator cluster 1 2a 2b 3a 3b
Open water 55 (58.6) 41 (33.8) 33 (22.5) 7 (11.8) 2 (11.3)
Marginal ice 23 (22.9) 12 (13.2) 6 (8.8) 1 (4.6) 12 (4.4)
Pack ice 26 (22.5) 7 (13) 1 (8.7) 13 (4.5) 6 (4.3)

References

Tibshirani, Robert, Guenther Walther, and Trevor Hastie. 2001. “Estimating the Number of Clusters in a Data Set via the Gap Statistic.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 411–23. https://doi.org/10.1111/1467-9868.00293.