1 Introduction

The Atlantic herring research track working group (WG) is interested in exploring impacts of food availability on herring stock dynamics. Indices of zooplankton abundance were proposed for exploration.

These indices could potentially explain:

  1. herring recruitment
  2. herring growth/condition
  3. patterns in herring “survival” random effects

The WG decided to focus on patterns in recruitment, as low recruitment in recent years is an important issue for the stock and for fishery management.

The WG used a boosted regression tree analysis (Molina 2024) to identify zooplankton indices that best explained patterns in herring recruitment. These indices included large copepods in spring (influencing growth of herring postlarvae and juveniles), and small copeopods in fall (influencing survival of herring larvae over the winter).

In this working paper, we develop indices of large copepods in spring and small copepods in fall using zooplankton survey data (citation) from 1982-2022. We tailor the indices to the spatial footprint of the Atlantic herring stock assessment, and further use information on the changing spatial distribution of herring larvae over time from the same dataset to evaluate potential changes in availability of small copepods to herring larvae.

2 Methods

We used spatio-temporal modeling to develop zooplankton indices using the ECOMON data as an input.

Decisions:

  1. zooplankton species
  2. which zooplankton metric (area or volume)
  3. spatial footprint
  4. temporal scale (seasonal? annual?)
  5. potential covariates

2.1 Zooplankton species

We use herring diet information to determine which zooplankton species to include. This information is from both literature and food habits data.

“The herring is a plankton feeder…. Examination of 1,500 stomachs showed that adult herring near Eastport were living solely on copepods and on pelagic euphausiid shrimps (Meganyctiphanes norwegica), fish less than 4 inches long depending on the former alone, while the larger herring were eating both. When first hatched, and before the disappearance of the yolk sac, the larvae (European) feed on larval snails and crustaceans, on diatoms, and on peridinians, but they soon begin taking copepods, and depend exclusively on these for a time after they get to be 12 mm. long, especially on the little Pseudocalanus elongatus. As they grow older they feed more and more on the larger copepods and amphipods, pelagic shrimps, and decapod crustacean larvae (Collette and Klein-Macphee 2002).”

What herring eat from NEFSC food habits data is summarized here: https://fwdp.shinyapps.io/tm2020/ Copepods are highest diet proportion, followed by well digested prey (AR), and euphausiids. Broken down by decade and season, it is clear that copeopods dominate adult diets in winter and spring, and more so recently.

Maybe copeopods are bad for hering??

# Use Brian's new Rmd to get herring diet by decade and season or season and region

herringdiet <- readRDS(here::here("data/herringdiet.rds"))

herringdecseastot <- herringdiet %>%
  # make decade column
  # make 2 seasons winter--> spring and summer--> fall
  dplyr::mutate(decade = floor(year/10)*10) |>
                #season2 = case_when(season %in% c("WINTER", "SPRING") ~ "SPRING",
                #                    season %in% c("SUMMER", "FALL") ~ "FALL")) %>%
                #season2 = case_when(season %in% c( "SPRING") ~ "SPRING",
                #                    season %in% c( "FALL") ~ "FALL")) %>%
  dplyr::select(decade, season, totwt) %>%
  dplyr::distinct() %>%
  dplyr::group_by(decade, season) %>%
  dplyr::mutate(totwt2 = sum(totwt, na.rm = TRUE),
                nyrs = n_distinct(year)) %>%
  dplyr::select(decade, season, totwt2, nyrs) %>%
  dplyr::distinct()

diet <- read.csv(here::here("data/allwtp2024-09-28.csv"))

preylook <- read.csv(here::here("data/SASPREY24.csv"))

preylook <- dplyr::select(preylook, c(Pynam, pycomnam, COLLCAT))

# add blueprey list
decadeseason <- diet %>%
  dplyr::mutate(copeprey = ifelse(prey %in% c("COPEPO"), TRUE, FALSE),
                plotcol = ifelse(copeprey, "blue", "lightgrey"),
                decade = as.integer(gsub("s","",decade))) %>%
  dplyr::left_join(herringdecseastot, by=c("season" = "season", 
                                            "decade" = "decade")) %>%

  dplyr::group_by(decade, season) %>%
  dplyr::filter(meansw>0) %>%
  dplyr::arrange(desc(copeprey), .by_group = TRUE)


bars <- map(unique(decadeseason$decade)
            , ~geom_bar(aes(width=nyrs), stat = "identity", colour = "white"#, position = "stack", 
                       , data = decadeseason %>% filter(decade == .x)))


p2 <-   ggplot(decadeseason, aes(decade, relmsw, fill=copeprey)) +
                                    #fill=factor(prey, 
                                    #            levels = c(as.factor(names(preycolaggsel)),
                                    #                       setdiff(prey, preycolaggsel)),
                                    #           ordered = TRUE
                                    #           ))) +
   #geom_bar(aes(width=nyrs), stat = "identity") + #
   bars +
   ylab("Percent in Diet") +
   xlab("Decade") +
   geom_text(aes(y=relmsw, 
             label = ifelse(relmsw > 3, prey, "")),
             position = position_stack(vjust = 0.5),
             size=2.5)+
   facet_wrap(~fct_relevel(season,"WINTER", "SPRING","SUMMER", "FALL")) +
   theme_bw() +
   #viridis::scale_fill_viridis(discrete = TRUE) +
   scale_fill_manual(values=c( "grey90", "lightblue")) + 
   theme(legend.position = "none"
         #legend.position="bottom",
         #legend.text=element_text(size=5)
         ) #+
    #geom_bar_interactive(stat = "identity", aes(tooltip = prey, data_id = prey))

#ggiraph(code = print(p1)) 
#p1

p2

Our focus in developing the indices is on herring recruitment, so we prioritize indices of food for larvae through juveniles.

Models developed for the Herring RT are for the following copepod categories:

  • Large copepods ALL: Calanus finmarchicus, Metridia lucens, Calanus minor, Eucalanus spp., Calanus spp.

  • Small copepods ALL: Centropages typicus, Pseudocalanus spp., Temora longicornis, Centropages hamatus, Paracalanus parvus, Acartia spp., Clausocalanus arcuicornis, Acartia longiremis, Clausocalanus furcatus, Temora stylifera, Temora spp., Tortanus discaudatus, Paracalanus spp.

These categories included all life stages of each copepod species.

(Models were also developed for Calanus finmarchicus alone, total zooplankton volume, and Euphausiids, which may be useful for comparison with adult Atlantic herring condition or growth in the future.)

An additional model was developed for Atlantic herring larvae using the same dataset in order to characterize the spatial extent of herring larvae compared with their zooplankton prey. This model was not used as an index due to missing data during several years of the time series.

2.2 Zooplankton metric

Previous zooplankton indices use the numbers per 100 cubic meters of filtered water volume _100m3 data (Kane 2007; Morse et al. 2017). Other current analyses use the numbers per 100 cubic meters as well.

2.3 Spatial scale

NEFSC survey strata definitions are built into the VAST northwest-atlantic extrapolation grid already. Therefore, fall and spring survey strata sets used in the Atlantic herring assessment were also applied to the zooplankton indices. Indices were also calculated in Ecological Production Units (EPUs) used in regional ecosystem reporting (Fig. \(\ref{fig:maps}\)).

herring_spring <- c(01010, 01020, 01030, 01040, 01050, 01060, 01070, 01080, 01090, 
                    01100, 01110, 01120, 01130, 01140, 01150, 01160, 01170, 01180, 
                    01190, 01200, 01210, 01220, 01230, 01240, 01250, 01260, 01270, 
                    01280, 01290, 01300, 01360, 01370, 01380, 01390, 01400, 01610, 
                    01620, 01630, 01640, 01650, 01660, 01670, 01680, 01690, 01700, 
                    01710, 01720, 01730, 01740, 01750, 01760)
herring_fall <- c(01050, 01060, 01070, 01080, 01090, 01100, 01110, 01120, 01130, 
                  01140, 01150, 01160, 01170, 01180, 01190, 01200, 01210, 01220, 
                  01230, 01240, 01250, 01260, 01270, 01280, 01290, 01300, 01360, 
                  01370, 01380, 01390, 01400)

MAB <- c(1010:1080, 1100:1120, 1600:1750, 3010:3450, 3470, 3500, 3510)
GB  <- c(1090, 1130:1210, 1230, 1250, 3460, 3480, 3490, 3520:3550)
GOM <- c(1220, 1240, 1260:1290, 1360:1400, 3560:3830)
SS  <- c(1300:1352, 3840:3990)

D70pct_nwast <- readRDS(url("https://github.com/NOAA-EDAB/zooplanktonindex/raw/main/spatialdat/D70pct_nwa_strat2.rds"))

# MAB EPU
MAB2 <- D70pct_nwast %>% 
  dplyr::filter(stratum_number %in% MAB) 

# MAB herring larvae area
MAB2herr <- MAB2 %>%
  dplyr::filter(stratum_number2 %% 10 == 1) 

# MAB outside larval area
MAB2out <- MAB2 %>%
  dplyr::filter(stratum_number2 %% 10 == 2) 

# Georges Bank EPU
GB2 <- D70pct_nwast %>% 
  dplyr::filter(stratum_number %in% GB) 

# GB herring larvae
GB2herr <- GB2 %>%
  dplyr::filter(stratum_number2 %% 10 == 1) 

#GB outside larval area
GB2out <- GB2 %>%
  dplyr::filter(stratum_number2 %% 10 == 2)

# gulf of maine EPU 
GOM2 <- D70pct_nwast %>%
  dplyr::filter(stratum_number %in% GOM)

# GOM herring larvae
GOM2herr <- GOM2 %>%
  dplyr::filter(stratum_number2 %% 10 == 1) 

#GOM outside larval area
GOM2out <- GOM2 %>%
  dplyr::filter(stratum_number2 %% 10 == 2)

# scotian shelf EPU 
SS2 <- D70pct_nwast %>%
  dplyr::filter(stratum_number %in% SS) 

# SS herring larvae
SS2herr <- SS2 %>%
  dplyr::filter(stratum_number2 %% 10 == 1) 

#SS outside larval area
SS2out <- SS2 %>%
  dplyr::filter(stratum_number2 %% 10 == 2)

# whole herring larval area
herrlarv <- dplyr::bind_rows(MAB2herr, GB2herr, GOM2herr, SS2herr)

# outside herring larval area
nolarv <- dplyr::bind_rows(MAB2out, GB2out, GOM2out, SS2out)

# spring herring NEFSC BTS
her_spr2 <- D70pct_nwast %>%
  dplyr::filter(stratum_number %in% herring_spring) 

# fall herring NEFSC BTS
her_fall2 <- D70pct_nwast %>%
  dplyr::filter(stratum_number %in% herring_fall) 

# all epus
allEPU2 <- D70pct_nwast %>%
  dplyr::filter(stratum_number %in% c(MAB, GB, GOM, SS)) 
theme_set(theme_bw())

ggplot(data = ecodata::coast) +
  geom_sf() + 
  geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat),  colour = "coral4", size=0.05, alpha=0.1) +
  geom_point(data = MAB2, aes(x = Lon, y = Lat), colour = "green3", size=0.05, alpha=0.5) +
  geom_point(data = GB2, aes(x = Lon, y = Lat), colour = "orange", size=0.05, alpha=0.5) +
  geom_point(data = GOM2, aes(x = Lon, y = Lat), colour = "red", size=0.05, alpha=0.5) +
  geom_point(data = SS2, aes(x = Lon, y = Lat), colour = "purple", size=0.05, alpha=0.1) +
  #coord_sf(xlim = c(-79, -65.5), ylim = c(33, 45)) + #full extent of VAST model
  coord_sf(xlim =c(-78, -65.5), ylim = c(35, 45)) + #zoomed to Hatteras and N
  ggtitle("MAB (green), GB (orange), GOM (red), SS(purple)")


ggplot(data = ecodata::coast) +
  geom_sf() + 
  geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat),  colour = "coral4", size=0.05, alpha=0.1) +
  geom_point(data = her_spr2, aes(x = Lon, y = Lat), size=0.05, colour = "yellow",  alpha=0.7) +
  #geom_point(data = her_fall2, aes(x = Lon, y = Lat), size=0.03, colour = "lightblue",  alpha=0.3) +
  #coord_sf(xlim = c(-79, -65.5), ylim = c(33, 45)) +
  coord_sf(xlim =c(-78, -65.5), ylim = c(35, 45)) +
  ggtitle("Herring spring (yellow) survey strata")

ggplot(data = ecodata::coast) +
  geom_sf() + 
  geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat),  colour = "coral4", size=0.05, alpha=0.1) +
  #geom_point(data = her_spr2, aes(x = Lon, y = Lat), size=0.05, colour = "yellow",  alpha=0.1) +
  geom_point(data = her_fall2, aes(x = Lon, y = Lat), size=0.03, colour = "lightblue",  alpha=0.7) +
  #coord_sf(xlim = c(-79, -65.5), ylim = c(33, 45)) +
  coord_sf(xlim =c(-78, -65.5), ylim = c(35, 45)) +
  ggtitle("Herring fall (light blue) survey strata")


# ggplot(data = ecodata::coast) +
#   geom_sf() + 
#   geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat),  colour = "coral4", size=0.05, alpha=0.1) +
#   geom_point(data = her_spr2, aes(x = Lon, y = Lat), size=0.05, colour = "yellow",  alpha=0.1) +
#   geom_point(data = herrlarv, aes(x = Lon, y = Lat), size=0.03, colour = "blue",  alpha=0.5) +
#   #coord_sf(xlim = c(-79, -65.5), ylim = c(33, 45)) +
#   coord_sf(xlim =c(-78, -65.5), ylim = c(35, 45)) +
#   ggtitle("Herring larval area Sept-Feb (blue), over spring survey strata (yellow")
Maps of key areas for Herring assessment indices. The full VAST model grid is shown in brown.Maps of key areas for Herring assessment indices. The full VAST model grid is shown in brown.Maps of key areas for Herring assessment indices. The full VAST model grid is shown in brown.

Figure 2.1: Maps of key areas for Herring assessment indices. The full VAST model grid is shown in brown.

2.4 Temporal scale

Options explored for temporal scale included:

  1. An annual model mixes many generations of zooplankton; “all” is months 1-12
  2. Splitting the year into two seasons, rolling winter into spring and summer into fall is done for multiple fish species
    • “SPRING” months 1-6
    • “FALL” months 7-12
  3. Splitting the year into 4 seasons, 2 matching survey seasons (spring/fall) and try two for winter and summer to cover other portions of herring life history. Current designations are
    • “winter” months 12, 1, 2
    • “spring” months 3, 4, 5
    • “summer” months 6, 7, 8
    • “fall” months 9, 10, 11
  4. Finally, timing of herring larvae in the system was derived from the herring larvae data
    • “sepfeb” months 9-12 and 1-2 of the following year
    • “nolarv” months 3-8

Testing models with data split over 4 seasons for 1982-2022 resulted in poor model performance, inability to estimate parameters, and lack of convergence for cases where parameters could be estimated. Therefore, we used the SPRING = January-June and FALL = July-December temporal scale as a starting point.

2.5 Spatio-temporal modeling

We applied the Vector Autoregressive Spatio-Temporal (VAST) model (Thorson and Barnett 2017; Thorson 2019) to estimate zooplankton indices from zooplankton survey data.

VAST is a Geostatistical generalized linear mixed effects model (GLMM) that models two linear predictors for an index: 1. encounter rate, and 2. positive catch (amount in stomach)

A full model for the first linear predictor \(\rho_1\) for each observation \(i\) can include: * fixed intercepts \(\beta_1\) for each category \(c\) and time \(t\), * spatial random effects \(\omega_1\) for each location \(s\) and category, * spatio-temporal random effects \(\varepsilon_1\) for each location, category, and time, * fixed vessel effects \(\eta_1\) by vessel \(v\) and category, and * fixed catchability impacts \(\lambda_1\) of covariates \(Q\) for each observation and variable \(k\):

\[\rho_1(i) = \beta_1(c_i, t_i) + \omega_1^*(s_i, c_i) + \varepsilon_1^*(s_i, c_i, t_i) + \eta_1(v_i, c_i) + \sum_{k=1}^{n_k} \lambda_1(k) Q(i,k)\]

The full model for the second linear predictor \(\rho_2\) has the same structure, estimating \(\beta_2\), \(\omega_2\), \(\varepsilon_2\), \(\eta_2\), and \(\lambda_2\) using the observations, categories, locations, times, and covariates.

VAST model code and documentation is available here: https://github.com/James-Thorson-NOAA/VAST

Spatial and spatio-temporal correlation decay with increasing distance estimated as \(\kappa\) in a Matern function with fixed smoothness and geometric anisotropy (directional correlation, optionally estimated by the model).

Initial model selection consistently supported the inclusion of spatial and spatio-temporal random effects and anisotropy across all datasets: fall, spring, and annual.

Observations in space are used to define fixed locations (“knots”) covering the full extent of the model. We assume constant variation within a timestep at a knot.

Modelers specify the number of knots to group observation locations, to balance computation time and resolution. We used 500 knots (Fig. \(\ref{fig:knots}\)).

knitr::include_graphics(here::here("pyindex/lgcopeALL_spring_500_biascorrect_doy/Data_and_knots.png"))

Observations correlated in space and in space over time due to unmeasured processes are modeled as multivariate normal Gaussian Random Fields (GRF):

\(\omega_1\) ~ MVN(0, \(\mathbf R_1\)); \(\omega_2\) ~ MVN(0, \(\mathbf R_2\))
\(\varepsilon_1\)(,t) ~ MVN(0, \(\mathbf R_1\)); \(\varepsilon_2\)(,t) ~ MVN(0, \(\mathbf R_2\))

Spatial and spatio-temporal correlation decay with increasing distance \(\mathbf d\) estimated as \(\kappa\) in a Matérn function with fixed smoothness \(\nu\) and geometric anisotropy \(H\) (directional correlation).

Correlation function between locations \(s\) and \(s'\):

\[\mathbf R_1(s, s') = \frac{1}{2^{\nu-1}\Gamma(\nu)} \times (\kappa_1 \lvert \mathbf d(s,s') \mathbf H \rvert)^\nu \times K_\nu (\kappa_1 \lvert \mathbf d(s,s') \mathbf H \rvert)\]

Estimation uses stochastic partial differential equation (SPDE) approximation.

Observation model “Index2”, Gamma distribution for positive catches and Alternative “Poisson-link delta-model” using log-link for numbers-density and log-link for biomass per number. It is intended for continuous data, which includes biomass data and “numbers standardized to a fixed area.”

Probability of encounter Poisson: \(p(i) = 1 - exp[-n(i)]\) set to 1 for groups found at all stations, e.g. small copepods or zooplankton volume

Number of zooplankton cells per volume given encounter: \(r(i) = \frac{n(i)}{p(i)}w(i)\)

Probability for numbers per volume \(B\) where \(g\) is a Gamma function. :

\[ Pr[b(i) = B] = \begin{cases} 1-p(i), B = 0\\\\p(i) \times g[B|r(i), \sigma_b^2], B > 0 \end{cases},\]

We are interpreting zooplankton abundance per 100 cubic meters as numbers standardized to a fixed area (volume) in applying the Gamma observation model.

Density \(b\) at a location (knot) \(s\) for year \(t\) is then the predicted number of cells per volume (linear predictor for encounter * linear predictor for cells/vol given encounter):

\[\hat{b}_{s,t} = \hat{n}_{s,t}\hat{w}_{s,t}\]

Index based on area \(a\) weighting for each of 500 knots (or subsets):

\[I_t = \sum_{s=1}^{500} a_s\hat{b}_{s,t}\]

For final models, bias correction was done as in Thorson and Kristensen (2016).

2.6 Structural decisions

Two different observation models were applied, depending on the dataset. The default VAST index standardization (purpose = "index2" in make_settings) uses a Gamma distribution for positive catches and an alternative “Poisson-link delta-model” using log-link for numbers-density and log-link for biomass per number (ObsModel= c(2,1)).

We applied the default observation model to Calanus finmarchicus (calfin_100m3) and Large copepods (calfin_100m3 + mlucens_100m3 + calminor_100m3 + euc_100m3 + calspp_100m3) datasets.

The default was used for index standardization of stomach contents data for pelagic and benthic forage indices. It is intended for continuous data, which includes biomass data and “numbers standardized to a fixed area” (see section starting at line 239 in the VAST user manual here. I am interpreting zooplankton abundance per 100 cubic meters as numbers standardized to a fixed area (volume) in applying the Gamma observation model.

For data where there are some years where the species is present in all (or 0) samples, estimating the probability of encounter fails (or at least, VAST won’t let you try). In these cases, the options are to treat intercepts representing temporal variability as random effects (by setting RhoConfig Beta or Epsilon entries to something other than 0), or to use a different link function.

We intend for our indices to potentially be used in assessment (though as a covariate rather than an index, so maybe I’m being too strict), so the recommendation is to “minimize covariance in the estimated index by excluding any temporal correlation on model components (i.e., the intercept is a fixed effect in each year, and the spatio-temporal term is independent in each year)” (Thorson 2019).

All of the small copepods datasets had at least one year where our small copepods groupings were encountered at all stations. None had years with 0 encounters.

Therefore, we used a different link function, the Poisson-link fixing encounter probability=1 for any year where all samples encounter the species. We kept all other settings for index standardization the same, but set (ObsModel= c(2,4)).

2.7 Model validation

Zooplankton indices have not previously been developed using spatio-temporal modeling in the Northeast US. Therefore, we compare previously estimated annual abundance anomalies for Calanus finmarchicus with annual anomalies estimated from our VAST models. These anomalies are estimated at the ecoregion (Mid Atlantic Bight–MAB, Georges Bank–GB, and Gulf of Maine–GOM) scale.

2.8 Model selection

Model selection was done in two stages, the first looking at spatial, spatio-temporal random effects and the second looking at covariates, following Ng et al. (2021) and Gaichas et al. (2023). The model selection script is available at https://github.com/NOAA-EDAB/zooplanktonindex/blob/main/VASTscripts/VASTunivariate_zoopindex_modselection.R

Models were run using Reduced Maximum Likelihood (REML) and without bias correction for the first stage of model selection.

The only “catchability” covariates tested were Day of Year (“doy”) to compensate for the fast turnover rate of zooplankton during the season.

Converged models with the lowest AIC for each taxonomic/season combination were considered best.

2.9 Model diagnostics

After selecting the converged model structures with the most support based on AIC, standard diagnostics were evaluated for each model: data available by season, year, and in space, quantile residuals in aggregate and in space. Model results include the ln density each year and the resulting indices for each spatial unit.

2.10 Defining herring larval presence in time and space

When developing a zooplankton index, it is important to determine whether the zooplankton are available to herring larvae in time and space. We used data available on herring larvae in the zooplankton dataset to define the most likely time and space for herring larvae, and used those temporal and spatial definitions to further refine selected zooplankton indices.

A herring larvae VAST model was developed using the same methods as above, and resulting indices were compared with those estimated by Richardson et al. (2010).

Models were initially developed using the same seasons as for zooplankton (Spring = January-June, Fall = July-December). However, nearly all observations of herring larvae in the 1982-2022 dataset were from September - February (also reflected in the index developed in Richardson et al. (2010).) Therefore, another model for September-February was developed, which aligned observations in January-February with those from the preceeding fall September-December.

To define areas with the most dense herring larvae, we calculate quantiles of density in space across the time series. First we sum the density in each cells over all years, then calculate the quantiles of summed log density from the September-February herring larvae VAST model.

stratlook <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))

stratlook2 <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7",
                                    "Stratum_8",
                                    "Stratum_9"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "her_larv",
                                    "no_larv",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))

source(here::here("R/utils_VASToutput.R")) #IEA/HerringRT_2025/

outdir <- here::here("pyindex") #IEA/HerringRT_2025/
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)

modselect <- modtable(moddirs)

# lets only look at indices for converged models
modcompare_conv <- modselect |>
  dplyr::ungroup() |>
  dplyr::filter(converged2 == "likely") |>
  dplyr::select(modname) |>
  as.vector() |>
  unname() |>
  unlist()

moddirs_conv <-  moddirs[grepl(sprintf("\\.*(%s)$", paste(modcompare_conv, collapse = '|')), moddirs)]


modcompareindex <- purrr::map_dfr(moddirs_conv, purrr::possibly(getmodindex, otherwise = NULL))

splitoutput <- modcompareindex %>%
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook)

# only larvarea models have this strata set
splitoutput2 <- modcompareindex %>%
  dplyr::filter(str_detect(modname, "larvarea")) |>
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook2)

zoomax <- max(splitoutput$Estimate, na.rm=T)

zootsmean <- splitoutput %>%
  dplyr::group_by(modname, Region) %>%
  dplyr::mutate(fmean = mean(Estimate, na.rm=T)) 
# one option, sum D_gct over all years then take quantiles in space

D <- readRDS("Dherr_sepfeb.rds")

Dtot <- D |> 
  dplyr::group_by(cell) |>
  dplyr::mutate(Dsum = sum(D, na.rm=TRUE),
                logD = log(as.vector(Dsum))) |>
  dplyr::select(!c(D, Year)) |>
  dplyr::distinct()

Dvec <- terra::vect(Dtot, geom=c("Lon", "Lat"))

q <- quantile(log(as.vector(Dtot$Dsum)), probs=seq(0,1,0.1))
q2 <- quantile(Dtot$logD, probs=seq(0,1,0.1))

qvec <- terra::quantile(Dvec, probs=c(0.6, 0.65, 0.7, 0.75, 0.8))

quants <- classInt::classIntervals(log(as.vector(Dtot$Dsum)), 
                                   style = "quantile", n = 10)

#quants$brks


D60pct <- Dtot |>
  dplyr::filter(log(as.vector(Dsum))>qvec["60%","logD"]) |>
  dplyr::rename("60th" = "Include")

D65pct <- Dtot |>
  dplyr::filter(log(as.vector(Dsum))>qvec["65%","logD"]) |>
  dplyr::rename("65th" = "Include")

D70pct <- Dtot |>
  dplyr::filter(log(as.vector(Dsum))>qvec["70%","logD"]) |>
  dplyr::rename("70th" = "Include")

D75pct <- Dtot |>
  dplyr::filter(log(as.vector(Dsum))>qvec["75%","logD"]) |>
  dplyr::rename("75th" = "Include")

D80pct <- Dtot |>
  dplyr::filter(log(as.vector(Dsum))>qvec["80%","logD"]) |>
  dplyr::rename("80th" = "Include")

qvec
##     Include     logD
## 60%       1 3.274264
## 65%       1 3.828164
## 70%       1 4.554015
## 75%       1 5.215182
## 80%       1 5.668452

Mapping the summed density below.

d.name <- moddirs[str_detect(moddirs, "herringlarvae_sepfeb")]
fit <- readRDS(paste0(d.name, "/fit.rds"))
#fit <- VAST::reload_model(fit) #added to try to make work after restart, no previous VAST run
years <- unique(fit$data_frame$t_i)
years <- c(min(years):max(years))

mdl <- FishStatsUtils::make_map_info(Region = fit$settings$Region,
                     spatial_list = fit$spatial_list,
                     Extrapolation_List = fit$extrapolation_list#,
                     #added to try to make work after restart, no previous VAST run
                     #Include = fit$extrapolation_list[["Area_km2_x"]] > 0 &  rowSums(fit$extrapolation_list[["a_el"]]) >  0
                     )

gmap <- ggplot(data = ecodata::coast) +
  geom_sf() + 
  #aes(x = lon, y = lat, group = group) +
  #geom_polygon(fill="black", colour = "white") +
  scale_color_viridis_c(option = "magma") +  # now make this quantiles...
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        panel.spacing.x=unit(0, "lines"),
        panel.spacing.y=unit(0, "lines"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank() ) +
  coord_sf(xlim=mdl$Xlim, ylim=mdl$Ylim)

g <- gmap +
  geom_point(data=Dtot, aes(Lon, Lat, color=log(as.vector(Dsum)), group=NULL)
             ,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             ) 

g

Mapping the quantiles from 60% (white) to 80% (light blue).

Quantiles below 70% include densities south of Long Island and off Cape Hatteras. The WG thinks the larvae found to the south may be lost to the population so we likely don’t want to use a quantile below the 70th percentile.

The 80th percentile starts to show a gap on Georges Bank and along the southwest coast of Maine. This may be slicing things too finely to cover general herring larval habitat.

I think this leaves us with using either the 70th (light green) or 75th (dark green) percentile of summed density over all years to define herring larvae relevant habitat for the small copepods index.

gq <- gmap +
 geom_point(data=D60pct, 
             aes(x=Lon, y=Lat, color = "60th"), 
            # color="white"
            #)
            # ,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             )  +
  
  geom_point(data=D65pct, 
             aes(x=Lon, y=Lat, color = "65th"), 
             #color="yellow"
             #)
             #,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             )  +
  
  geom_point(data=D70pct, 
             aes(x=Lon, y=Lat, color = "70th"), 
             #color="green"
             #)
             #,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             )  +
  
  geom_point(data=D75pct, 
             aes(x=Lon, y=Lat, color = "75th"), 
             #color="darkgreen"
              #)
             #,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             )  +
  
   geom_point(data=D80pct, 
             aes(x=Lon, y=Lat, color = "80th"), 
             #color="lightblue"
             #)
             #,
             ## These settings are necessary to avoid
             ## overlplotting which is a problem here. May need
             ## to be tweaked further.
             size=1.5, stroke=0,shape=16
             )  +
  
  ggplot2::scale_color_manual(name = "Quantile",
                        breaks = c("60th", "65th", "70th", "75th", "80th"),
                        values = c("60th" = "white", "65th" = "yellow", 
                                   "70th" = "green", "75th" = "darkgreen", 
                                   "80th" = "lightblue") )
  
gq + ggplot2::theme_grey()

# another option, pull from fit$report Omega 1 and 2?

We selected the 70th percentile as the largest that did not include areas in the southern portion of the range to define the “herring larval area” for use in other zooplankton VAST models.

To make a new extrapolation grid using herring larval density 70th percentile, we first make the 70th percentile+ points into an sf object by outlining the “concave hull” of the area, then intersect that object with the built in FishStatsUtils::northwest_atlantic_grid.

Next, we define new strata based on that intersection and make a new extrapolation grid. Then we can use this grid and call the new strata as strata.limits when running the small copepods model.

See the full process here: https://noaa-edab.github.io/zooplanktonindex/HerringLarvaeExplore.html

Using the new grid, run the smallcopeALL_sepfeb dataset in VAST to get an index in the herring larvae area as well as in the others.

Script for doing this is https://github.com/NOAA-EDAB/zooplanktonindex/blob/main/VASTscripts/VASTunivariate_zoopindex_smcopeALL_herrlarvarea.R

3 Results

3.1 Sampling during each season

Sample sizes in each of the temporal categories are reported below. Definitions: “all” = January-December; “FALL” = July-December; “SPRING” = January-June; “fall” = September-November; “spring” = March-May; “summer” = June-August; “winter” = December-February; “nolarv” = March-August; “sepfeb” = September-February of the following year.

herringfood_stn <- readRDS(here::here("data/herringfood_stn_all_OISST.rds"))

# make SST column that uses surftemp unless missing or 0

herringfood_stn <- herringfood_stn |>
  dplyr::mutate(sstfill = ifelse((is.na(sfc_temp)|sfc_temp==0), oisst, sfc_temp),
                season_larv = month %in% c(1:2, 9:12)) |>
  dplyr::filter(year>1981)

alltot <- herringfood_stn |> 
  dplyr::summarise(totstn = n(),
                   across(c(calfin_100m3,
                            lgcopeALL_100m3,
                            smallcopeALL_100m3,
                            cluhar_100m3,
                            euph_100m3,
                            volume_100m3
                            ), ~ sum(.x != 0, na.rm = TRUE))) |>
  dplyr::mutate(season = "all")


seasontot <- herringfood_stn |> 
  dplyr::group_by(season = season_ng) |>
  dplyr::summarise(totstn = n(),
                   across(c(calfin_100m3,
                            lgcopeALL_100m3,
                            smallcopeALL_100m3,
                            cluhar_100m3,
                            euph_100m3,
                            volume_100m3
                            ), ~ sum(.x != 0, na.rm = TRUE)))

season4tot <- herringfood_stn |> 
  dplyr::group_by(season = season_4) |>
  dplyr::summarise(totstn = n(),
                   across(c(calfin_100m3,
                            lgcopeALL_100m3,
                            smallcopeALL_100m3,
                            cluhar_100m3,
                            euph_100m3,
                            volume_100m3
                            ), ~ sum(.x != 0, na.rm = TRUE)))

larvtot <- herringfood_stn |> 
  dplyr::mutate(season = dplyr::if_else(season_larv, "sepfeb", "nolarv")) |>
  dplyr::group_by(season) |>
  dplyr::summarise(totstn = n(),
                   across(c(calfin_100m3,
                            lgcopeALL_100m3,
                            smallcopeALL_100m3,
                            cluhar_100m3,
                            euph_100m3,
                            volume_100m3
                            ), ~ sum(.x != 0, na.rm = TRUE)))

samptab <- dplyr::bind_rows(alltot, seasontot, season4tot, larvtot) |>
  dplyr::select(season, 
                stations = totstn,
                zoopvol = volume_100m3,
                largecope = lgcopeALL_100m3,
                smallcope = smallcopeALL_100m3,
                herringlarv = cluhar_100m3,
                calanusfin = calfin_100m3,
                euphausid = euph_100m3)
  
flextable::flextable(samptab) |>
  flextable::set_caption("Number of survey stations 1982-2022 by season with number of postive samples by zooplankton species group.") |>
  flextable::colformat_double(big.mark = "", digits = 0) |>
  #flextable::autofit()
  flextable::width(width = 1)
``` ```{=html}
(\#tab:unnamed-chunk-6)Number of survey stations 1982-2022 by season with number of postive samples by zooplankton species group.

season

stations

zoopvol

largecope

smallcope

herringlarv

calanusfin

euphausid

all

27,774

25,266

23,813

25,255

3,815

21,524

11,710

FALL

13,249

12,935

12,081

12,946

2,765

10,355

5,906

SPRING

14,525

12,331

11,732

12,309

1,050

11,169

5,804

fall

8,745

8,558

8,012

8,557

2,134

6,526

4,154

spring

7,739

6,549

6,176

6,540

227

5,972

3,360

summer

5,810

5,471

5,134

5,488

25

4,939

2,566

winter

5,480

4,688

4,491

4,670

1,429

4,087

1,630

nolarv

13,549

12,020

11,310

12,028

252

10,911

5,926

sepfeb

14,225

13,246

12,503

13,227

3,563

10,613

5,784

3.1.1 Zooplankton stations by season

zoopvol <- moddirs[str_detect(moddirs, "zoopvol")]
zoopvolnocovs <- zoopvol[!str_detect(zoopvol, "_doy")]

for(d.name in zoopvolnocovs){
  
  modpath <- unlist(str_split(d.name, pattern = "/"))
  modname <- modpath[length(modpath)]
  
  cat(modname, "\n")
  cat(paste0("![](",d.name, "/Data_by_year.png)"))
  cat("\n\n")
  
}

zoopvol_ann_500_biascorrect

zoopvol_fall_500_biascorrect

zoopvol_spring_500_biascorrect

3.1.2 Herring larvae stations by season

Fall sampling for herring larvae was completed in most years aside from GLOBEC. In our definition of spring, herring larvae primarily occur in January and February. We now include a shifted season to better match larval herring availability throughout the time series: September - February. Year corresponds to September-December, and the following January and February are aligned with the previous year (hatch year) in these analyses. Data were missing for

for(d.name in moddirs[str_detect(moddirs, "herring")]){
  
  modpath <- unlist(str_split(d.name, pattern = "/"))
  modname <- modpath[length(modpath)]
  
  cat(modname, "\n")
  cat(paste0("![](",d.name, "/Data_by_year.png)"))
  cat("\n\n")
  
}

herringlarvae_fall_500_biascorrect

herringlarvae_sepfeb_yrshift_500_biascorrect

herringlarvae_spring_500_biascorrect

3.2 Model validation

We compared a preliminary annual scale VAST Calanus finmarchicus model output (no covariates or bias correction) with the annual anomalies currently included in the R ecodata package (overview, methods). Estimation methods are completely different and spatial strata are slightly different, but the timing of high and low anomalies is well-aligned between the two methods for the region mainly occupied by both Atlantic herring and Calanus finmarchicus, Georges Bank (GB) and the Gulf of Maine (GOM).

stratlook <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))

stratlook2 <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7",
                                    "Stratum_8",
                                    "Stratum_9"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "her_larv",
                                    "no_larv",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))

source(here::here("R/utils_VASToutput.R")) #IEA/HerringRT_2025/

outdir <- here::here("pyindex_tests") #IEA/HerringRT_2025/
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)

modselect <- modtable(moddirs)

# lets only look at indices for converged models
modcompare_conv <- modselect |>
  dplyr::ungroup() |>
  dplyr::filter(converged2 == "likely") |>
  dplyr::select(modname) |>
  as.vector() |>
  unname() |>
  unlist()

moddirs_conv <-  moddirs[grepl(sprintf("\\.*(%s)$", paste(modcompare_conv, collapse = '|')), moddirs)]


modcompareindex <- purrr::map_dfr(moddirs_conv, purrr::possibly(getmodindex, otherwise = NULL))

splitoutput <- modcompareindex %>%
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook)

# only larvarea models have this strata set
splitoutput2 <- modcompareindex %>%
  dplyr::filter(str_detect(modname, "larvarea")) |>
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook2)

zoomax <- max(splitoutput$Estimate, na.rm=T)

zootsmean <- splitoutput %>%
  dplyr::group_by(modname, Region) %>%
  dplyr::mutate(fmean = mean(Estimate, na.rm=T)) 
VASTmean <- splitoutput |>
    dplyr::filter(Data %in% c("calfin"),
                  Season %in% c("annual"),
                  Region %in% c("MAB", "GB", "GOM")) |>
    dplyr::group_by(Region, Season) |>
    dplyr::summarise(tsmean = mean(Estimate))

VASTanom <- splitoutput |>
    dplyr::filter(Data %in% c("calfin"),
                  Season %in% c("annual"),
                  Region %in% c("MAB", "GB", "GOM")) |>
    dplyr::left_join(VASTmean) |>
    dplyr::group_by(Region, Season) |>
    dplyr::mutate(#Value = Value/resca,
      Mean = as.numeric(Estimate),
      SE = Std..Error.for.Estimate,
      Mean = (Mean-tsmean)/tsmean,
      SE = (SE-tsmean)/tsmean,
      Upper = Mean + SE,
      Lower = Mean - SE) |>
  dplyr::rename(EPU = Region)

SOEcalfin <- ecodata::zoo_abundance_anom |>
  dplyr::mutate(Value = as.numeric(Value)) |>
  dplyr::filter(Var == "Calfin") |>
  ggplot2::ggplot(aes(x = Time, y = Value)) +
  ggplot2::geom_point(color="darkred") + 
  ggplot2::geom_line(color="darkred") + 
  ecodata::geom_gls() +
  ggplot2::geom_line(data = VASTanom, aes(x=Time, y=Mean), color = "blue", size=1.5) +
  ecodata::theme_facet() +
  ggplot2::facet_wrap(~EPU)

SOEcalfin + ggplot2::ggtitle("Calanus finmarchicus annual anomaly: ecodata in dark red; VAST in blue")

GB = Georges Bank, GOM = Gulf of Maine, MAB = Mid Atlantic Bight.

Comparison of spatial definitions: ecodata anomalies use the dark red line while VAST models use the blue shaded polygons.

url <- "https://github.com/NOAA-EDAB/presentations/raw/master/docs/EDAB_images/EPU_Designations_Map.jpg"
knitr::include_graphics(url)

3.3 Model selection Stage 1 results

# from each output folder in pyindex, 
outdir <- here::here("pyindex_modsel1")
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)


# function to apply extracting info
getmodinfo <- function(d.name){
  # read settings
  modpath <- stringr::str_split(d.name, "/", simplify = TRUE)
  modname <- modpath[length(modpath)]
  
  settings <- read.table(file.path(d.name, "settings.txt"), comment.char = "",
    fill = TRUE, header = FALSE)
  
  n_x <- as.numeric(as.character(settings[(which(settings[,1]=="$n_x")+1),2]))
  grid_size_km <- as.numeric(as.character(settings[(which(settings[,1]=="$grid_size_km")+1),2]))
  max_cells <- as.numeric(as.character(settings[(which(settings[,1]=="$max_cells")+1),2]))
  use_anisotropy <- as.character(settings[(which(settings[,1]=="$use_anisotropy")+1),2])
  fine_scale <- as.character(settings[(which(settings[,1]=="$fine_scale")+1),2])
  bias.correct <- as.character(settings[(which(settings[,1]=="$bias.correct")+1),2])
  
  #FieldConfig
  if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Component_1"){
    omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+2),2])
    omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
    epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
    epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+5),1])
    beta1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+6),2])
    beta2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+7),1])
  }
  
  if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Omega1"){
    omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
    omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),1])
    epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),2])
    epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
    beta1 <- "IID"
    beta2 <- "IID"
  }
  
  
  #RhoConfig
  rho_beta1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),1]))
  rho_beta2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),2]))
  rho_epsilon1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),1]))
  rho_epsilon2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),2]))
  
  # read parameter estimates, object is called parameter_Estimates
  if(file.exists(file.path(d.name, "parameter_estimates.RData"))) {
    load(file.path(d.name, "parameter_estimates.RData"))
    
    AIC <- parameter_estimates$AIC[1]  
    converged <- parameter_estimates$Convergence_check[1]
    fixedcoeff <- unname(parameter_estimates$number_of_coefficients[2])
    randomcoeff <- unname(parameter_estimates$number_of_coefficients[3])
    
  }else if(file.exists(file.path(d.name, "parameter_estimates.txt"))){
    
    reptext <- readLines(file.path(d.name, "parameter_estimates.txt"))
    
    AIC <- as.double(reptext[grep(reptext, pattern = "AIC")+1])
    converged <- reptext[grep(reptext, pattern = "Convergence_check")+1]
    fixedcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2], 
                                     boundary("word"))[[1]][2])
    randomcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2], 
                                     boundary("word"))[[1]][3])
    
  }else{
    
    AIC <- NA_real_
    converged <- NA_character_
    fixedcoeff <- NA_integer_
    randomcoeff <- NA_integer_
  }
  
  #index <- read.csv(file.path(d.name, "Index.csv"))
  
  
  # return model attributes as a dataframe
  out <- data.frame(modname = modname,
                    n_x = n_x,
                    grid_size_km = grid_size_km,
                    max_cells = max_cells,
                    use_anisotropy = use_anisotropy,
                    fine_scale =  fine_scale,
                    bias.correct = bias.correct,
                    omega1 = omega1,
                    omega2 = omega2,
                    epsilon1 = epsilon1,
                    epsilon2 = epsilon2,
                    beta1 = beta1,
                    beta2 = beta2,
                    rho_epsilon1 = rho_epsilon1,
                    rho_epsilon2 = rho_epsilon2,
                    rho_beta1 = rho_beta1,
                    rho_beta2 = rho_beta2,
                    AIC = AIC,
                    converged = converged,
                    fixedcoeff = fixedcoeff,
                    randomcoeff = randomcoeff#,
                    #index = index
  )
    
    return(out)

}


modcompare <- purrr::map_dfr(moddirs, getmodinfo)

Models compared using REML are identified by model name (“modname” in Table Sb??) which always includes all prey aggregated, season (“all” for annual models of months 1-12, “fall” for models of months 7-12, and “spring” for models of months 1-6), number of knots (500 for all models), and which fixed and random spatial and spatio-temporal effects were included in which linear predictor (1 or 2). The names for model options and associated VAST model settings are:

Model selection 1 (spatial, spatio-temporal effects, no covariates) options (following Ng et al. (2021)) and naming: * All models set Use_REML = TRUE in fit_model specifications.
* Modeled effects, model name suffix, and VAST settings by model:

  1. “_alleffectson” = Spatial and spatio-temporal random effects plus anisotropy in both linear predictors; FieldConfig default (all IID)
  2. “_noaniso” = Spatial and spatio-temporal random effects in both linear predictors with anisotopy turned off; FieldConfig default (all IID) and use_anisotopy = FALSE
  3. “_noomeps2” = Spatial and spatio-temporal random effects plus anisotropy only in linear predictor 1; FieldConfig = 0 for Omega2, Epsilon2
  4. “_noomeps2_noaniso” = Spatial and spatio-temporal random effects only in linear predictor 1 with anisotopy turned off; FieldConfig = 0 for Omega2, Epsilon2 and use_anisotopy = FALSE
  5. “_noomeps2_noeps1” = Spatial random effects plus anisotropy only in linear predictor 1; FieldConfig = 0 for Omega2, Epsilon2, Epsilon1
  6. “_noomeps2_noeps1_noaniso” = Spatial random effects only in linear predictor 1 with anisotopy turned off; FieldConfig = 0 for Omega2, Epsilon2, Epsilon1 and use_anisotopy = FALSE
  7. “_noomeps12” = Anisotropy, but no spatial or spatio-temporal random effects in either linear predictor; FieldConfig = 0 for both Omega, Epsilon
  8. “_noomeps12_noaniso” = No spatial or spatio-temporal random effects in either linear predictor; FieldConfig = 0 for both Omega, Epsilon and use_anisotopy = FALSE
modselect <- modcompare |>
  dplyr::mutate(season = dplyr::case_when(stringr::str_detect(modname, "_fall_") ~ "Fall",
                            stringr::str_detect(modname, "spring") ~ "Spring",
                            stringr::str_detect(modname, "_all_") ~ "Annual",
                            TRUE ~ as.character(NA))) |>
  dplyr::mutate(converged2 = dplyr::case_when(stringr::str_detect(converged, "no evidence") ~ "likely",
                                stringr::str_detect(converged, "is likely not") ~ "unlikely",
                                TRUE ~ as.character(NA))) |>
  dplyr::mutate(copegroup = stringr::str_extract(modname, "[^_]+")) |>
  #dplyr::mutate(modname = str_extract(modname, '(?<=allagg_).*')) |>
  dplyr::group_by(copegroup, season) |>
  dplyr::mutate(deltaAIC = AIC-min(AIC)) |>
  dplyr::select(copegroup, modname, season, deltaAIC, fixedcoeff,
         randomcoeff, use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2, AIC, converged2) |>
  dplyr::arrange(copegroup, season, AIC)

# DT::datatable(modselect, rownames = FALSE, 
#               options= list(pageLength = 25, scrollX = TRUE),
#               caption = "Comparison of delta AIC values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures. See text for model descriptions.")

flextable::flextable(modselect |>
                       dplyr::select(-c(use_anisotropy, 
         omega1, omega2, epsilon1, epsilon2, 
         beta1, beta2))) |>
  flextable::set_header_labels(copegroup = "Copepod group",
                               modname = "Model name",
                               season = "Season",
                               deltaAIC = "dAIC",
                               fixedcoeff = "N fixed",
                               randomcoeff = "N random",
                               converged2 = "Convergence") |>
  flextable::set_caption("Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.") |>
  flextable::fontsize(size = 9, part = "all") |>
  flextable::colformat_double(digits = 2) |>
  flextable::set_table_properties(layout = "autofit", width = 1)
``` ```{=html}
(\#tab:modsel1)Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.

Copepod group

Model name

Season

dAIC

N fixed

N random

AIC

Convergence

calfin

calfin_fall_500_alleffectson

Fall

0.00

9

46,616

208,373.00

likely

calfin

calfin_fall_500_noaniso

Fall

79.25

7

46,616

208,452.25

likely

calfin

calfin_fall_500_noomeps2

Fall

1,753.04

6

23,348

210,126.05

likely

calfin

calfin_fall_500_noomeps2_noaniso

Fall

1,814.90

4

23,348

210,187.90

likely

calfin

calfin_fall_500_noomeps2_noeps1

Fall

3,474.74

5

634

211,847.75

likely

calfin

calfin_fall_500_noomeps2_noeps1_noaniso

Fall

3,501.57

3

634

211,874.57

likely

calfin

calfin_fall_500_noomeps12

Fall

9,589.69

1

80

217,962.69

likely

calfin

calfin_fall_500_noomeps12_noaniso

Fall

9,589.69

1

80

217,962.69

likely

calfin

calfin_spring_500_alleffectson

Spring

0.00

9

46,534

239,281.74

likely

calfin

calfin_spring_500_noaniso

Spring

48.35

7

46,534

239,330.09

likely

calfin

calfin_spring_500_noomeps2

Spring

864.36

6

23,308

240,146.10

likely

calfin

calfin_spring_500_noomeps2_noaniso

Spring

889.01

4

23,308

240,170.75

likely

calfin

calfin_spring_500_noomeps2_noeps1

Spring

2,325.90

5

635

241,607.64

likely

calfin

calfin_spring_500_noomeps2_noeps1_noaniso

Spring

2,339.84

3

635

241,621.59

likely

calfin

calfin_spring_500_noomeps12

Spring

6,517.46

1

82

245,799.20

likely

calfin

calfin_spring_500_noomeps12_noaniso

Spring

6,517.46

1

82

245,799.20

likely

lgcopeALL

lgcopeALL_fall_500_alleffectson

Fall

0.00

9

46,616

251,030.89

likely

lgcopeALL

lgcopeALL_fall_500_noaniso

Fall

100.03

7

46,616

251,130.92

likely

lgcopeALL

lgcopeALL_fall_500_noomeps2

Fall

1,496.48

6

23,348

252,527.36

likely

lgcopeALL

lgcopeALL_fall_500_noomeps2_noaniso

Fall

1,588.70

4

23,348

252,619.58

likely

lgcopeALL

lgcopeALL_fall_500_noomeps2_noeps1

Fall

3,572.64

5

634

254,603.53

likely

lgcopeALL

lgcopeALL_fall_500_noomeps2_noeps1_noaniso

Fall

3,610.16

3

634

254,641.04

likely

lgcopeALL

lgcopeALL_fall_500_noomeps12

Fall

7,346.16

1

80

258,377.05

likely

lgcopeALL

lgcopeALL_fall_500_noomeps12_noaniso

Fall

7,346.16

1

80

258,377.05

likely

lgcopeALL

lgcopeALL_spring_500_alleffectson

Spring

0.00

9

46,450

257,499.63

likely

lgcopeALL

lgcopeALL_spring_500_noaniso

Spring

78.55

7

46,450

257,578.17

likely

lgcopeALL

lgcopeALL_spring_500_noomeps2

Spring

1,746.74

6

23,266

259,246.36

likely

lgcopeALL

lgcopeALL_spring_500_noomeps2_noaniso

Spring

1,798.07

4

23,266

259,297.70

likely

lgcopeALL

lgcopeALL_spring_500_noomeps2_noeps1

Spring

4,612.34

5

634

262,111.96

likely

lgcopeALL

lgcopeALL_spring_500_noomeps2_noeps1_noaniso

Spring

4,622.66

3

634

262,122.28

likely

lgcopeALL

lgcopeALL_spring_500_noomeps12

Spring

7,906.90

1

82

265,406.53

likely

lgcopeALL

lgcopeALL_spring_500_noomeps12_noaniso

Spring

7,906.90

1

82

265,406.53

likely

smallcopeALL

smallcopeALL_fall_500_alleffectson

Fall

0.00

9

46,607

303,280.90

likely

smallcopeALL

smallcopeALL_fall_500_noaniso

Fall

47.18

7

46,607

303,328.07

likely

smallcopeALL

smallcopeALL_fall_500_noomeps2

Fall

2,231.62

6

23,339

305,512.51

likely

smallcopeALL

smallcopeALL_fall_500_noomeps2_noaniso

Fall

2,265.50

4

23,339

305,546.39

likely

smallcopeALL

smallcopeALL_fall_500_noomeps2_noeps1

Fall

4,501.12

5

625

307,782.01

likely

smallcopeALL

smallcopeALL_fall_500_noomeps2_noeps1_noaniso

Fall

4,508.11

3

625

307,789.01

likely

smallcopeALL

smallcopeALL_fall_500_noomeps12

Fall

8,563.21

1

71

311,844.11

likely

smallcopeALL

smallcopeALL_fall_500_noomeps12_noaniso

Fall

8,563.21

1

71

311,844.11

likely

smallcopeALL

smallcopeALL_spring_500_alleffectson

Spring

0.00

9

46,441

282,327.03

likely

smallcopeALL

smallcopeALL_spring_500_noaniso

Spring

56.67

7

46,441

282,383.71

likely

smallcopeALL

smallcopeALL_spring_500_noomeps2

Spring

3,961.69

6

23,257

286,288.72

likely

smallcopeALL

smallcopeALL_spring_500_noomeps2_noaniso

Spring

4,004.70

4

23,257

286,331.73

likely

smallcopeALL

smallcopeALL_spring_500_noomeps2_noeps1_noaniso

Spring

7,586.19

3

625

289,913.22

likely

smallcopeALL

smallcopeALL_spring_500_noomeps2_noeps1

Spring

7,586.74

5

625

289,913.78

likely

smallcopeALL

smallcopeALL_spring_500_noomeps12

Spring

13,390.23

1

73

295,717.26

likely

smallcopeALL

smallcopeALL_spring_500_noomeps12_noaniso

Spring

13,390.23

1

73

295,717.26

likely

smallcopeSOE

smallcopeSOE_fall_500_alleffectson

Fall

0.00

9

46,616

290,760.56

likely

smallcopeSOE

smallcopeSOE_fall_500_noaniso

Fall

61.29

7

46,616

290,821.85

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps2

Fall

2,338.60

6

23,348

293,099.16

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps2_noaniso

Fall

2,380.30

4

23,348

293,140.86

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps2_noeps1

Fall

4,708.61

5

634

295,469.17

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps2_noeps1_noaniso

Fall

4,720.67

3

634

295,481.22

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps12

Fall

8,233.60

1

80

298,994.16

likely

smallcopeSOE

smallcopeSOE_fall_500_noomeps12_noaniso

Fall

8,233.60

1

80

298,994.16

likely

smallcopeSOE

smallcopeSOE_spring_500_alleffectson

Spring

0.00

9

46,445

278,928.98

likely

smallcopeSOE

smallcopeSOE_spring_500_noaniso

Spring

72.09

7

46,445

279,001.07

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps2

Spring

4,100.24

6

23,261

283,029.22

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps2_noaniso

Spring

4,157.09

4

23,261

283,086.06

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps2_noeps1_noaniso

Spring

7,636.58

3

629

286,565.55

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps2_noeps1

Spring

7,638.62

5

629

286,567.60

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps12

Spring

12,907.54

1

77

291,836.51

likely

smallcopeSOE

smallcopeSOE_spring_500_noomeps12_noaniso

Spring

12,907.54

1

77

291,836.51

likely

Using REML, models including spatial and spatio-temporal random effects as well as anisotropy were best supported by the data. This was true for the spring datasets and the fall datasets for calfin, lgcopeALL, smallcopeALL, and smallcopeSOE.

However, the small copepod fall model had the spatial random effects parameter for encounter rate, \(\omega_1\), approach 0. We allowed it to be estimated at 0 rather than fixing it at 0. Resulting indices were identical.

Full outputs from stage 1 model selection are posted to google drive here.

3.4 Model selection stage 2 results

Including the day of year covariate had mixed success. While some models fit better with this covariate, others did not meet the convergence criteria. Therefore, we moved forward with the best fitting model that converged to generate indices.

Full results from the best fitting models are posted to google drive here

Only spring large zooplankton did better with a day of year catchability covariate among the models related to herring recruitment.

# from each output folder in pyindex, 
outdir <- here::here("pyindex")
moddirs <- list.dirs(outdir) 
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)

modcompare <- purrr::map_dfr(moddirs, getmodinfo)

modselect <- modcompare |>
  dplyr::mutate(season = dplyr::case_when(stringr::str_detect(modname, "_fall_") ~ "Fall",
                            stringr::str_detect(modname, "spring") ~ "Spring",
                            stringr::str_detect(modname, "_ann_") ~ "Annual",
                            stringr::str_detect(modname, "sepfeb") ~ "SepFeb",
                            TRUE ~ as.character(NA))) |>
  dplyr::mutate(converged2 = dplyr::case_when(stringr::str_detect(converged, "no evidence") ~ "likely",
                                stringr::str_detect(converged, "is likely not") ~ "unlikely",
                                TRUE ~ as.character(NA))) |>
  dplyr::mutate(copegroup = stringr::str_extract(modname, "[^_]+")) |>
  #dplyr::mutate(modname = str_extract(modname, '(?<=allagg_).*')) |>
  dplyr::group_by(copegroup, season) |>
  dplyr::mutate(deltaAIC = AIC-min(AIC)) |>
  dplyr::select(copegroup, modname, season, deltaAIC, fixedcoeff,
         randomcoeff, 
         omega1, omega2, #epsilon1, epsilon2, 
         #beta1, beta2, use_anisotropy, 
         AIC, converged2) |>
  dplyr::arrange(copegroup, season, AIC)

flextable::flextable(modselect) %>%
                       #dplyr::select(-c(use_anisotropy, 
         #omega1, omega2, epsilon1, epsilon2, 
         #beta1, beta2))
         #) %>%
  flextable::set_header_labels(modname = "Model name",
                               season = "Season",
                               #deltaAIC = "dAIC",
                               fixedcoeff = "N fixed",
                               randomcoeff = "N random",
                               omega2 = "All other effects",
                               converged2 = "Convergence") |>
  #flextable::set_caption("Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.") %>%
  flextable::fontsize(size = 9, part = "all") %>%
  flextable::colformat_double(digits = 2) |>
  flextable::set_table_properties(layout = "autofit", width = 1)

copegroup

Model name

Season

deltaAIC

N fixed

N random

omega1

All other effects

AIC

Convergence

calfin

calfin_ann_500_biascorrect_doy

Annual

0.00

93

46,536

IID

IID

453,074.91

unlikely

calfin

calfin_fall_500_biascorrect_doy

Fall

0.00

91

46,536

IID

IID

206,053.34

unlikely

calfin

calfin_fall_500_biascorrect

Fall

2,405.54

89

46,536

IID

IID

208,458.88

likely

calfin

calfin_spring_500_biascorrect

Spring

91

46,452

IID

IID

239,403.84

likely

calfin

calfin_spring_500_biascorrect_doy

Spring

IID

IID

euph

euph_ann_500_biascorrect_doy

Annual

0.00

93

46,536

IID

IID

205,163.31

unlikely

euph

euph_ann_500_biascorrect

Annual

45.86

91

46,536

IID

IID

205,209.17

likely

euph

euph_fall_500_biascorrect_doy

Fall

0.00

91

46,536

IID

IID

100,475.13

unlikely

euph

euph_fall_500_biascorrect

Fall

96.55

89

46,536

IID

IID

100,571.67

likely

euph

euph_spring_500_biascorrect_doy

Spring

0.00

93

46,452

IID

IID

101,850.11

likely

euph

euph_spring_500_biascorrect

Spring

819.63

91

46,452

IID

IID

102,669.73

likely

herringlarvae

herringlarvae_fall_500_biascorrect

Fall

0.00

79

46,536

IID

IID

31,298.46

likely

herringlarvae

herringlarvae_sepfeb_yrshift_500_biascorrect

SepFeb

0.00

79

46,620

IID

IID

38,202.32

likely

herringlarvae

herringlarvae_spring_500_biascorrect

Spring

0.00

73

46,368

IID

IID

9,683.77

likely

lgcopeALL

lgcopeALL_ann_500_biascorrect_doy

Annual

0.00

93

46,368

IID

IID

515,161.55

unlikely

lgcopeALL

lgcopeALL_fall_500_biascorrect_doy

Fall

0.00

91

46,536

IID

IID

250,277.08

unlikely

lgcopeALL

lgcopeALL_fall_500_biascorrect

Fall

824.41

89

46,536

IID

IID

251,101.49

likely

lgcopeALL

lgcopeALL_spring_500_biascorrect_doy

Spring

0.00

93

46,368

IID

IID

255,470.75

likely

lgcopeALL

lgcopeALL_spring_500_biascorrect

Spring

2,175.71

91

46,368

IID

IID

257,646.46

likely

smallcopeALL

smallcopeALL_ann_500_biascorrect_doy

Annual

0.00

90

46,368

IID

IID

593,223.14

unlikely

smallcopeALL

smallcopeALL_fall_500_biascorrect_noom1_doy

Fall

81

45,982

0

IID

303,199.76

unlikely

smallcopeALL

smallcopeALL_fall_500_biascorrect_noom1

Fall

79

45,982

0

IID

303,358.72

likely

smallcopeALL

smallcopeALL_fall_500_biascorrect

Fall

80

46,536

IID

IID

303,360.72

likely

smallcopeALL

smallcopeALL_fall_500_biascorrect_fail

Fall

IID

IID

smallcopeALL

smallcopeALL_sepfeb_yrshift_500_larvarea_biascorrect

SepFeb

0.00

84

46,452

IID

IID

306,107.29

likely

smallcopeALL

smallcopeALL_sepfeb_yrshift_500_larvarea_biascorrect_oldarea

SepFeb

8.72

84

46,452

IID

IID

306,116.01

likely

smallcopeALL

smallcopeALL_sepfeb_yrshift_500_biascorrect

SepFeb

18.99

84

46,620

IID

IID

306,126.28

likely

smallcopeALL

smallcopeALL_spring_500_biascorrect_doy

Spring

0.00

84

46,368

IID

IID

281,080.63

unlikely

smallcopeALL

smallcopeALL_spring_500_biascorrect

Spring

1,385.07

82

46,368

IID

IID

282,465.70

likely

smallcopeSOE

smallcopeSOE_ann_500_biascorrect_doy

Annual

0.00

93

46,368

IID

IID

577,717.71

unlikely

smallcopeSOE

smallcopeSOE_fall_500_biascorrect_doy

Fall

0.00

91

46,536

IID

IID

290,576.94

unlikely

smallcopeSOE

smallcopeSOE_fall_500_biascorrect

Fall

307.39

89

46,536

IID

IID

290,884.33

likely

smallcopeSOE

smallcopeSOE_spring_500_biascorrect_doy

Spring

0.00

88

46,368

IID

IID

277,659.94

likely

smallcopeSOE

smallcopeSOE_spring_500_biascorrect

Spring

1,418.56

86

46,368

IID

IID

279,078.51

likely

zoopvol

zoopvol_ann_500_biascorrect_doy

Annual

0.00

52

46,536

IID

IID

226,109.45

unlikely

zoopvol

zoopvol_ann_500_biascorrect

Annual

77.87

50

46,536

IID

IID

226,187.32

likely

zoopvol

zoopvol_fall_500_biascorrect

Fall

49

46,536

IID

IID

111,955.59

likely

zoopvol

zoopvol_fall_500_biascorrect_doy

Fall

IID

IID

zoopvol

zoopvol_spring_500_biascorrect_doy

Spring

0.00

52

46,452

IID

IID

107,636.22

unlikely

zoopvol

zoopvol_spring_500_biascorrect

Spring

2,644.91

50

46,452

IID

IID

110,281.13

likely

3.5 Indices by zooplankton group

Although models were run in multiple seasons and at multiple spatial scales, we focus on results from three indices that were evaluated as covariates in the Atlantic herring assessment model. Based on the results of the boosted regression tree analysis (Molina 2024), the first two indices are large copepods in spring (influencing growth of herring postlarvae and juveniles), and small copeopods in fall (influencing survival of herring larvae over the winter). The spatial scale of these indices is the Atlantic herring bottom trawl survey strata in spring and fall, respectively. The third index is more refined to focus on herring larvae, using temporal and spatial bounds definerd by herring larvae observations in the zooplankton dataset. This third index focuses on small copepods from September-February in the area where more than 70% of herring larval density was identified between 1982-2022.

# lets only look at indices for converged models

stratlook <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))

stratlook2 <- data.frame(Stratum = c("Stratum_1",
                                    "Stratum_2",
                                    "Stratum_3",
                                    "Stratum_4",
                                    "Stratum_5",
                                    "Stratum_6",
                                    "Stratum_7",
                                    "Stratum_8",
                                    "Stratum_9"),
                        Region  = c("AllEPU",
                                    "her_sp",
                                    "her_fa",
                                    "her_larv",
                                    "no_larv",
                                    "MAB",
                                    "GB",
                                    "GOM",
                                    "SS"))


modcompare_conv <- modselect |>
  dplyr::ungroup() |>
  dplyr::filter(converged2 == "likely") |>
  dplyr::select(modname) |>
  as.vector() |>
  unname() |>
  unlist()

# or only converged and best AIC
modcompare_best <- modselect |>
  dplyr::ungroup() |>
  dplyr::filter(converged2 == "likely") |>
  dplyr::distinct(copegroup, season, .keep_all = T) |> # assumes orderd by dAIC!
  dplyr::select(modname) |>
  as.vector() |>
  unname() |>
  unlist()

moddirs_conv <-  moddirs[grepl(sprintf("\\.*(%s)$", paste(modcompare_conv, collapse = '|')), moddirs)]

moddirs_best <-  moddirs[grepl(sprintf("\\.*(%s)$", paste(modcompare_best, collapse = '|')), moddirs)]

modcompareindex <- purrr::map_dfr(moddirs_best, purrr::possibly(getmodindex, otherwise = NULL))

splitoutput <- modcompareindex %>%
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook)

# only larvarea models have this strata set
splitoutput2 <- modcompareindex %>%
  dplyr::filter(str_detect(modname, "larvarea")) |>
  dplyr::mutate(Season = modname |> map(str_split, pattern = "_") |> map_chr(c(1,2))) %>%
  dplyr::mutate(Data = modname |> map(str_split, pattern = "_") |> map_chr(c(1,1))) %>%
  dplyr::mutate(Estimate = ifelse(Estimate == 0, NA, Estimate)) |>
  dplyr::left_join(stratlook2)

zoomax <- max(splitoutput$Estimate, na.rm=T)

zootsmean <- splitoutput %>%
  dplyr::group_by(modname, Region) %>%
  dplyr::mutate(fmean = mean(Estimate, na.rm=T)) 

3.5.1 Spring (January-June) large copepods

Spring indices for large copepods in the herring spring survey strata (her_sp), on Georges Bank (GB), and in the Gulf of Maine (GOM).

plot_zooindices(splitoutput = splitoutput |> dplyr::filter(Season == "spring"), 
             plotdata = "lgcopeALL", 
             plotregions = c("her_sp", "GB", "GOM"), #"AllEPU", "SS"
             plottitle = "Large copeopods") 

VAST-estimated density of large copepods

knitr::include_graphics(here::here("pyindex/lgcopeALL_spring_500_biascorrect_doy/ln_density-predicted.png"))

3.5.2 Fall (July-December) small copepods

Spring indices for small copepods in the herring fall survey strata (her_fa), on Georges Bank (GB), and in the Gulf of Maine (GOM).

plot_zooindices(splitoutput = splitoutput |> dplyr::filter(Season == "fall"), 
             plotdata = "smallcopeALL", 
             plotregions = c("her_fa", "GB", "GOM"), #"AllEPU", "SS"
             plottitle = "small copeopods") 

VAST-estimated density of small copepods

knitr::include_graphics(here::here("pyindex/smallcopeALL_fall_500_biascorrect/ln_density-predicted.png"))

3.5.3 September-February small copepods in herring larval area

Small copepods available to larvae

plot_zooindices(splitoutput = splitoutput2, 
             plotdata = "smallcopeALL", 
             plotregions = c("her_larv", "her_fa"), 
             plottitle = "Small copepods Sept-Feb") +
  ggplot2::theme(legend.position="none")

Comparison with small copepods outside the larval area

plot_zooindices_onepanel(splitoutput = splitoutput2 |> dplyr::filter(Region %in% c("her_larv", "no_larv")), 
             plotdata = "smallcopeALL", 
             plottitle = "Small copepods Sept-Feb") +
  ggplot2::ggtitle(NULL)

VAST-estimated density of small copepods September-February

knitr::include_graphics(here::here("pyindex/smallcopeALL_sepfeb_yrshift_500_larvarea_biascorrect/ln_density-predicted.png"))

4 Discussion

The indices suggest that spring large copepods incereased to a relatively high level in 2015 but have been declining since, although there is no long term trend in the time series.

Fall small copepods have generally increased from a low in the mid-2000s and have remained relatively stable 2010-present, and similarly have no long term trend in the time series.

September-February small copepods have very similar trajectories in the Atlantic herring bottom trawl survey strata and in the herring larval area defined using larval data. The trend for the bottom trawl survey strata shows a significant long term increase, while that in the herring larval area does not. Both time series have increased from a low in the mid-2000s.

There is some evidence that the proportion of small copepods outside the herring larval area was higher than within the larval area up until the mid-2010s. Since then, the estimated abundance of small copepods is similar inside and outside the herring larval area.

References

Collette, B.B., and Klein-Macphee, G. 2002. Bigelow and Schroeder’s Fishes of the Gulf of Maine, Third Edition. In 3rd ed. edition. Smithsonian Books, Washington, DC.
Gaichas, S.K., Gartland, J., Smith, B.E., Wood, A.D., Ng, E.L., Celestino, M., Drew, K., Tyrell, A.S., and Thorson, J.T. 2023. Assessing small pelagic fish trends in space and time using piscivore stomach contents. Canadian Journal of Fisheries and Aquatic Sciences: cjfas-2023-0093. doi:10.1139/cjfas-2023-0093.
Kane, J. 2007. Zooplankton abundance trends on Georges Bank, 1977–2004. ICES Journal of Marine Science 64(5): 909–919. doi:10.1093/icesjms/fsm066.
Morse, R.E., Friedland, K.D., Tommasi, D., Stock, C., and Nye, J. 2017. Distinct zooplankton regime shift patterns across ecoregions of the U.S. Northeast continental shelf Large Marine Ecosystem. Journal of Marine Systems 165: 77–91. doi:10.1016/j.jmarsys.2016.09.011.
Ng, E.L., Deroba, J.J., Essington, T.E., Grüss, A., Smith, B.E., and Thorson, J.T. 2021. Predator stomach contents can provide accurate indices of prey biomass. ICES Journal of Marine Science 78(3): 1146–1159. doi:10.1093/icesjms/fsab026.
Richardson, D.E., Hare, J.A., Overholtz, W.J., and Johnson, D.L. 2010. Development of long-term larval indices for Atlantic herring (Clupea harengus) on the northeast US continental shelf. ICES Journal of Marine Science 67(4): 617–627. doi:10.1093/icesjms/fsp276.
Thorson, J.T. 2019. Guidance for decisions using the Vector Autoregressive Spatio-Temporal (VAST) package in stock, ecosystem, habitat and climate assessments. Fisheries Research 210: 143–161. doi:10.1016/j.fishres.2018.10.013.
Thorson, J.T., and Barnett, L.A.K. 2017. Comparing estimates of abundance trends and distribution shifts using single- and multispecies models of fishes and biogenic habitat. ICES Journal of Marine Science 74(5): 1311–1321. doi:10.1093/icesjms/fsw193.
Thorson, J.T., and Kristensen, K. 2016. Implementing a generic method for bias correction in statistical models using random effects, with spatial and population dynamics examples. Fisheries Research 175: 66–74. doi:10.1016/j.fishres.2015.11.016.