Copepod categories based on discussion with ToR 1 subgroup:
Calanus finmarchicus, a.k.a. Large copeopds SOE (used in small-large index)
Large copepods ALL: Calanus finmarchicus, Metridia lucens, Calanus minor, Eucalanus spp., Calanus spp.
Small copepods ALL: Centropages typicus, Pseudocalanus spp., Temora longicornis, Centropages hamatus, Paracalanus parvus, Acartia spp., Clausocalanus arcuicornis, Acartia longiremis, Clausocalanus furcatus, Temora stylifera, Temora spp., Tortanus discaudatus, Paracalanus spp.
Small copeopods SOE (used in small-large index): Centropages typicus, Pseudocalanus spp., Temora longicornis, Centropages hamatus
Lets do the same model selection as previously. Two stages, first looking at spatial, spatio-temporal random effects and the second looking at covariates. Selection is done for spring and fall models for:
Model selection script is https://github.com/NOAA-EDAB/zooplanktonindex/blob/main/VASTscripts/VASTunivariate_zoopindex_modselection.R
Models were run using REML and without bias correction.
Two different observation models were applied. The default VAST index
standardization (purpose = "index2"
in
make_settings
) uses a Gamma distribution for positive
catches and an alternative “Poisson-link delta-model” using log-link for
numbers-density and log-link for biomass per number
(ObsModel= c(2,1)
).
We applied the default observation model to Calanus finmarchicus (calfin_100m3) and Large copepods (calfin_100m3 + mlucens_100m3 + calminor_100m3 + euc_100m3 + calspp_100m3) datasets.
The default was used for index standardization of stomach contents data for pelagic and benthic forage indices. It is intended for continuous data, which includes biomass data and “numbers standardized to a fixed area” (see section starting at line 239 in the VAST user manual here. I am interpreting zooplankton abundance per 100 cubic meters as numbers standardized to a fixed area (volume) in applying the Gamma observation model.
For data where there are some years where the species is present in
all (or 0) samples, estimating the probability of encounter fails (or at
least, VAST won’t let you try). In these cases, the options are to treat
intercepts representing temporal variability as random effects (by
setting RhoConfig
Beta
or Epsilon
entries to something other than 0), or to use a different link
function.
We intend for our indices to potentially be used in assessment (though as a covariate rather than an index, so maybe I’m being too strict), so the recommendation is to “minimize covariance in the estimated index by excluding any temporal correlation on model components (i.e., the intercept is a fixed effect in each year, and the spatio-temporal term is independent in each year)” (Thorson 2019).
All of the small copepods datasets had at least one year where our small copepods groupings were encountered at all stations. None had years with 0 encounters.
Therefore, we used a different link function, the Poisson-link fixing
encounter probability=1 for any year where all samples encounter the
species. We kept all other settings for index standardization the same,
but set (ObsModel= c(2,4)
).
# from each output folder in pyindex,
outdir <- here::here("pyindex_modsel1")
moddirs <- list.dirs(outdir)
moddirs <- moddirs[-1]
# keep folder name
modnames <- list.dirs(outdir, full.names = FALSE)
# function to apply extracting info
getmodinfo <- function(d.name){
# read settings
modpath <- stringr::str_split(d.name, "/", simplify = TRUE)
modname <- modpath[length(modpath)]
settings <- read.table(file.path(d.name, "settings.txt"), comment.char = "",
fill = TRUE, header = FALSE)
n_x <- as.numeric(as.character(settings[(which(settings[,1]=="$n_x")+1),2]))
grid_size_km <- as.numeric(as.character(settings[(which(settings[,1]=="$grid_size_km")+1),2]))
max_cells <- as.numeric(as.character(settings[(which(settings[,1]=="$max_cells")+1),2]))
use_anisotropy <- as.character(settings[(which(settings[,1]=="$use_anisotropy")+1),2])
fine_scale <- as.character(settings[(which(settings[,1]=="$fine_scale")+1),2])
bias.correct <- as.character(settings[(which(settings[,1]=="$bias.correct")+1),2])
#FieldConfig
if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Component_1"){
omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+2),2])
omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+5),1])
beta1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+6),2])
beta2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+7),1])
}
if(settings[(which(settings[,1]=="$FieldConfig")+1),1]=="Omega1"){
omega1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),1])
omega2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),1])
epsilon1 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+3),2])
epsilon2 <- as.character(settings[(which(settings[,1]=="$FieldConfig")+4),2])
beta1 <- "IID"
beta2 <- "IID"
}
#RhoConfig
rho_beta1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),1]))
rho_beta2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+3),2]))
rho_epsilon1 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),1]))
rho_epsilon2 <- as.numeric(as.character(settings[(which(settings[,1]=="$RhoConfig")+4),2]))
# read parameter estimates, object is called parameter_Estimates
if(file.exists(file.path(d.name, "parameter_estimates.RData"))) {
load(file.path(d.name, "parameter_estimates.RData"))
AIC <- parameter_estimates$AIC[1]
converged <- parameter_estimates$Convergence_check[1]
fixedcoeff <- unname(parameter_estimates$number_of_coefficients[2])
randomcoeff <- unname(parameter_estimates$number_of_coefficients[3])
}else if(file.exists(file.path(d.name, "parameter_estimates.txt"))){
reptext <- readLines(file.path(d.name, "parameter_estimates.txt"))
AIC <- as.double(reptext[grep(reptext, pattern = "AIC")+1])
converged <- reptext[grep(reptext, pattern = "Convergence_check")+1]
fixedcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2],
boundary("word"))[[1]][2])
randomcoeff <- as.integer(stringr::str_split(reptext[grep(reptext, pattern = "number_of_coefficients")+2],
boundary("word"))[[1]][3])
}else{
AIC <- NA_real_
converged <- NA_character_
fixedcoeff <- NA_integer_
randomcoeff <- NA_integer_
}
#index <- read.csv(file.path(d.name, "Index.csv"))
# return model attributes as a dataframe
out <- data.frame(modname = modname,
n_x = n_x,
grid_size_km = grid_size_km,
max_cells = max_cells,
use_anisotropy = use_anisotropy,
fine_scale = fine_scale,
bias.correct = bias.correct,
omega1 = omega1,
omega2 = omega2,
epsilon1 = epsilon1,
epsilon2 = epsilon2,
beta1 = beta1,
beta2 = beta2,
rho_epsilon1 = rho_epsilon1,
rho_epsilon2 = rho_epsilon2,
rho_beta1 = rho_beta1,
rho_beta2 = rho_beta2,
AIC = AIC,
converged = converged,
fixedcoeff = fixedcoeff,
randomcoeff = randomcoeff#,
#index = index
)
return(out)
}
modcompare <- purrr::map_dfr(moddirs, getmodinfo)
Models compared using REML are identified by model name (“modname” in Table Sb@ref(tab:modsel1)) which always includes all prey aggregated, season (“all” for annual models of months 1-12, “fall” for models of months 7-12, and “spring” for models of months 1-6), number of knots (500 for all models), and which fixed and random spatial and spatio-temporal effects were included in which linear predictor (1 or 2). The names for model options and associated VAST model settings are:
Model selection 1 (spatial, spatio-temporal effects, no covariates)
options (following (ng_predator_2021?))
and naming: * All models set Use_REML = TRUE in fit_model
specifications.
* Modeled effects, model name suffix, and VAST settings by model:
use_anisotopy = FALSE
use_anisotopy = FALSE
use_anisotopy = FALSE
use_anisotopy = FALSE
modselect <- modcompare |>
dplyr::mutate(season = dplyr::case_when(stringr::str_detect(modname, "_fall_") ~ "Fall",
stringr::str_detect(modname, "spring") ~ "Spring",
stringr::str_detect(modname, "_all_") ~ "Annual",
TRUE ~ as.character(NA))) |>
dplyr::mutate(converged2 = dplyr::case_when(stringr::str_detect(converged, "no evidence") ~ "likely",
stringr::str_detect(converged, "is likely not") ~ "unlikely",
TRUE ~ as.character(NA))) |>
dplyr::mutate(copegroup = stringr::str_extract(modname, "[^_]+")) |>
#dplyr::mutate(modname = str_extract(modname, '(?<=allagg_).*')) |>
dplyr::group_by(copegroup, season) |>
dplyr::mutate(deltaAIC = AIC-min(AIC)) |>
dplyr::select(copegroup, modname, season, deltaAIC, fixedcoeff,
randomcoeff, use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2, AIC, converged2) |>
dplyr::arrange(copegroup, season, AIC)
# DT::datatable(modselect, rownames = FALSE,
# options= list(pageLength = 25, scrollX = TRUE),
# caption = "Comparison of delta AIC values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures. See text for model descriptions.")
flextable::flextable(modselect |>
dplyr::select(-c(use_anisotropy,
omega1, omega2, epsilon1, epsilon2,
beta1, beta2))) |>
flextable::set_header_labels(copegroup = "Copepod group",
modname = "Model name",
season = "Season",
deltaAIC = "dAIC",
fixedcoeff = "N fixed",
randomcoeff = "N random",
converged2 = "Convergence") |>
flextable::set_caption("Comparison of delta AIC (dAIC) values using Restricted Maxiumum Likelihood (REML) for alternative fixed and random effects model structures, with number of fixed (N fixed) and random (N random) coefficients. See text for model descriptions associated with each model name.") |>
flextable::fontsize(size = 9, part = "all") |>
flextable::colformat_double(digits = 2) |>
flextable::set_table_properties(layout = "autofit", width = 1)
Copepod group | Model name | Season | dAIC | N fixed | N random | AIC | Convergence |
calfin | calfin_fall_500_alleffectson | Fall | 0.00 | 9 | 46,616 | 208,373.00 | likely |
calfin | calfin_fall_500_noaniso | Fall | 79.25 | 7 | 46,616 | 208,452.25 | likely |
calfin | calfin_fall_500_noomeps2 | Fall | 1,753.04 | 6 | 23,348 | 210,126.05 | likely |
calfin | calfin_fall_500_noomeps2_noaniso | Fall | 1,814.90 | 4 | 23,348 | 210,187.90 | likely |
calfin | calfin_fall_500_noomeps2_noeps1 | Fall | 3,474.74 | 5 | 634 | 211,847.75 | likely |
calfin | calfin_fall_500_noomeps2_noeps1_noaniso | Fall | 3,501.57 | 3 | 634 | 211,874.57 | likely |
calfin | calfin_fall_500_noomeps12 | Fall | 9,589.69 | 1 | 80 | 217,962.69 | likely |
calfin | calfin_fall_500_noomeps12_noaniso | Fall | 9,589.69 | 1 | 80 | 217,962.69 | likely |
calfin | calfin_spring_500_alleffectson | Spring | 0.00 | 9 | 46,534 | 239,281.74 | likely |
calfin | calfin_spring_500_noaniso | Spring | 48.35 | 7 | 46,534 | 239,330.09 | likely |
calfin | calfin_spring_500_noomeps2 | Spring | 864.36 | 6 | 23,308 | 240,146.10 | likely |
calfin | calfin_spring_500_noomeps2_noaniso | Spring | 889.01 | 4 | 23,308 | 240,170.75 | likely |
calfin | calfin_spring_500_noomeps2_noeps1 | Spring | 2,325.90 | 5 | 635 | 241,607.64 | likely |
calfin | calfin_spring_500_noomeps2_noeps1_noaniso | Spring | 2,339.84 | 3 | 635 | 241,621.59 | likely |
calfin | calfin_spring_500_noomeps12 | Spring | 6,517.46 | 1 | 82 | 245,799.20 | likely |
calfin | calfin_spring_500_noomeps12_noaniso | Spring | 6,517.46 | 1 | 82 | 245,799.20 | likely |
lgcopeALL | lgcopeALL_fall_500_alleffectson | Fall | 0.00 | 9 | 46,616 | 251,030.89 | likely |
lgcopeALL | lgcopeALL_fall_500_noaniso | Fall | 100.03 | 7 | 46,616 | 251,130.92 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps2 | Fall | 1,496.48 | 6 | 23,348 | 252,527.36 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps2_noaniso | Fall | 1,588.70 | 4 | 23,348 | 252,619.58 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps2_noeps1 | Fall | 3,572.64 | 5 | 634 | 254,603.53 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps2_noeps1_noaniso | Fall | 3,610.16 | 3 | 634 | 254,641.04 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps12 | Fall | 7,346.16 | 1 | 80 | 258,377.05 | likely |
lgcopeALL | lgcopeALL_fall_500_noomeps12_noaniso | Fall | 7,346.16 | 1 | 80 | 258,377.05 | likely |
lgcopeALL | lgcopeALL_spring_500_alleffectson | Spring | 0.00 | 9 | 46,450 | 257,499.63 | likely |
lgcopeALL | lgcopeALL_spring_500_noaniso | Spring | 78.55 | 7 | 46,450 | 257,578.17 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps2 | Spring | 1,746.74 | 6 | 23,266 | 259,246.36 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps2_noaniso | Spring | 1,798.07 | 4 | 23,266 | 259,297.70 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps2_noeps1 | Spring | 4,612.34 | 5 | 634 | 262,111.96 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps2_noeps1_noaniso | Spring | 4,622.66 | 3 | 634 | 262,122.28 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps12 | Spring | 7,906.90 | 1 | 82 | 265,406.53 | likely |
lgcopeALL | lgcopeALL_spring_500_noomeps12_noaniso | Spring | 7,906.90 | 1 | 82 | 265,406.53 | likely |
smallcopeALL | smallcopeALL_fall_500_alleffectson | Fall | 0.00 | 9 | 46,607 | 303,280.90 | likely |
smallcopeALL | smallcopeALL_fall_500_noaniso | Fall | 47.18 | 7 | 46,607 | 303,328.07 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps2 | Fall | 2,231.62 | 6 | 23,339 | 305,512.51 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps2_noaniso | Fall | 2,265.50 | 4 | 23,339 | 305,546.39 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps2_noeps1 | Fall | 4,501.12 | 5 | 625 | 307,782.01 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps2_noeps1_noaniso | Fall | 4,508.11 | 3 | 625 | 307,789.01 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps12 | Fall | 8,563.21 | 1 | 71 | 311,844.11 | likely |
smallcopeALL | smallcopeALL_fall_500_noomeps12_noaniso | Fall | 8,563.21 | 1 | 71 | 311,844.11 | likely |
smallcopeALL | smallcopeALL_spring_500_alleffectson | Spring | 0.00 | 9 | 46,441 | 282,327.03 | likely |
smallcopeALL | smallcopeALL_spring_500_noaniso | Spring | 56.67 | 7 | 46,441 | 282,383.71 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps2 | Spring | 3,961.69 | 6 | 23,257 | 286,288.72 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps2_noaniso | Spring | 4,004.70 | 4 | 23,257 | 286,331.73 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps2_noeps1_noaniso | Spring | 7,586.19 | 3 | 625 | 289,913.22 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps2_noeps1 | Spring | 7,586.74 | 5 | 625 | 289,913.78 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps12 | Spring | 13,390.23 | 1 | 73 | 295,717.26 | likely |
smallcopeALL | smallcopeALL_spring_500_noomeps12_noaniso | Spring | 13,390.23 | 1 | 73 | 295,717.26 | likely |
smallcopeSOE | smallcopeSOE_fall_500_alleffectson | Fall | 0.00 | 9 | 46,616 | 290,760.56 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noaniso | Fall | 61.29 | 7 | 46,616 | 290,821.85 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps2 | Fall | 2,338.60 | 6 | 23,348 | 293,099.16 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps2_noaniso | Fall | 2,380.30 | 4 | 23,348 | 293,140.86 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps2_noeps1 | Fall | 4,708.61 | 5 | 634 | 295,469.17 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps2_noeps1_noaniso | Fall | 4,720.67 | 3 | 634 | 295,481.22 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps12 | Fall | 8,233.60 | 1 | 80 | 298,994.16 | likely |
smallcopeSOE | smallcopeSOE_fall_500_noomeps12_noaniso | Fall | 8,233.60 | 1 | 80 | 298,994.16 | likely |
smallcopeSOE | smallcopeSOE_spring_500_alleffectson | Spring | 0.00 | 9 | 46,445 | 278,928.98 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noaniso | Spring | 72.09 | 7 | 46,445 | 279,001.07 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps2 | Spring | 4,100.24 | 6 | 23,261 | 283,029.22 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps2_noaniso | Spring | 4,157.09 | 4 | 23,261 | 283,086.06 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps2_noeps1_noaniso | Spring | 7,636.58 | 3 | 629 | 286,565.55 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps2_noeps1 | Spring | 7,638.62 | 5 | 629 | 286,567.60 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps12 | Spring | 12,907.54 | 1 | 77 | 291,836.51 | likely |
smallcopeSOE | smallcopeSOE_spring_500_noomeps12_noaniso | Spring | 12,907.54 | 1 | 77 | 291,836.51 | likely |
Using REML, models including spatial and spatio-temporal random effects as well as anisotropy were best supported by the data. This was true for the spring dataset and the fall dataset for calfin, lgcopeALL, smallcopeALL, and smallcopeSOE.
Full outputs from stage 1 model selection are posted to google drive here.
We haven’t done stage 2 yet; thoughts on what might be appropriate covariates (if any) are welcome.