Skill assessment framework for a length-based multispecies model
ICES WGSAM ToR C Skill Assessment
Sarah Gaichas
 Northeast Fisheries Science Center
 Gavin Fay and Cristina Perez 
 University of Massachusetts Dartmouth 

 Many thanks to:
 (atlantisom development): Christine Stawitz, Kelli Johnson, Alexander Keth, Allan Hicks, Sean Lucey, Emma Hodgson, Gavin Fay 
 (Atlantis): Isaac Kaplan, Cecilie Hanson, Beth Fulton 
 (atlantisom use): Bai Li, Alphonso Perez Rodriguez, Howard Townsend 
 (skill assessment design): Patrick Lynch

Outline

Management decisions and models
Assessment model testing
- What do models need to do?
- Challenges with what models need to do
- Addressing challenges with Atlantis
Skill assessment with Atlantis "data"
1. R package atlantisom
2. Single species assessment
3. Multispecies assessment
4. Other models? Ensembles?

Discussion: what's next
- Automation of other model inputs!
- Atlantis output verification
- New outputs?
- Best practices for
  - dataset creation
  - input parameter estimation


We do a lot of assessments


With a wide range of models

How do we know they are right?

Fits to historical data (hindcast)
Influence of data over time (retrospective diagnostics)
Keep as simple and focused as possible
Simulation testing
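
One common way to quantify the retrospective diagnostic listed above is Mohn's rho: the average relative difference between each peeled run's terminal-year estimate and the full run's estimate for that same year. The sketch below assumes estimates are stored as year-to-value dictionaries; the function name and all numbers are hypothetical and do not come from any assessment shown in these slides.

```python
def mohns_rho(full, peels):
    """Average relative retrospective error.

    full:  dict of year -> estimate from the run using all years of data.
    peels: list of dicts, one per run with recent years of data removed.
    """
    rel_diffs = []
    for peel in peels:
        terminal_year = max(peel)  # last year estimated by this peeled run
        reference = full[terminal_year]
        rel_diffs.append((peel[terminal_year] - reference) / reference)
    return sum(rel_diffs) / len(rel_diffs)


# Made-up biomass estimates, for illustration only.
full_run = {2015: 100.0, 2016: 110.0, 2017: 120.0, 2018: 125.0}
peeled_runs = [
    {2015: 104.0, 2016: 118.0, 2017: 132.0},  # data through 2017
    {2015: 108.0, 2016: 121.0},               # data through 2016
]
rho = mohns_rho(full_run, peeled_runs)
print(f"Mohn's rho: {rho:.3f}")
```

A positive value means the peeled runs tended to overestimate the terminal year relative to the full run.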

But, what if

data are noisy?

we need to model complex interactions?

conditions change over time?

https://xkcd.com/2323/


Skill assessment background (see Journal of Marine Systems Special Issue)

Stow et al. 2009, Fig. 1

Olsen et al. 2016, Fig. 1
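
Several of the quantitative skill metrics catalogued by Stow et al. 2009 can be computed directly when the true state is known, as it is with simulated Atlantis output. The sketch below uses plain Python and assumes two equal-length series, a known "truth" and a model estimate. The function name and the numbers are hypothetical.

```python
import math


def skill_metrics(truth, estimate):
    """Compare a model estimate with known truth, value by value."""
    n = len(truth)
    if n != len(estimate) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    truth_mean = sum(truth) / n
    est_mean = sum(estimate) / n
    errors = [e - t for t, e in zip(truth, estimate)]
    sse = sum(err * err for err in errors)
    ss_truth = sum((t - truth_mean) ** 2 for t in truth)
    ss_est = sum((e - est_mean) ** 2 for e in estimate)
    cov = sum((t - truth_mean) * (e - est_mean) for t, e in zip(truth, estimate))
    return {
        "average_error": sum(errors) / n,                      # bias
        "average_absolute_error": sum(abs(err) for err in errors) / n,
        "rmse": math.sqrt(sse / n),
        "modeling_efficiency": 1.0 - sse / ss_truth,           # 1 is perfect
        "correlation": cov / math.sqrt(ss_truth * ss_est),
    }


# Made-up series: the estimate tracks the trend but is biased high by 1.
metrics = skill_metrics([10.0, 12.0, 14.0, 16.0], [11.0, 13.0, 15.0, 17.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the correlation is perfect even though the estimate is biased, so no single metric is enough on its own.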


"Both our model predictions and the observations reside in a halo of uncertainty and the true state of the system is assumed to be unknown, but lie within the observational uncertainty (Fig. 1a). A model starts to have skill when the observational and predictive uncertainty halos overlap, in the ideal case the halos overlap completely (Fig. 1b). Thus, skill assessment requires a set of quantitative metrics and procedures for comparing model output with observational data in a manner appropriate to the particular application."

Skill assessment with ecological interactions... fit criteria alone are not sufficient

Ignore predation at your peril: results from multispecies state-space modeling (Trijoulet et al. 2020)

Ignoring trophic interactions that occur in marine ecosystems induces bias in stock assessment outputs and results in low model predictive ability with subsequently biased reference points.


EM1: multispecies state space

EM2: multispecies, no process error

EM3: single sp. state space, constant M

EM4: single sp. state space, age-varying M

Note the difference in the scale of bias for the single-species models!


This paper is important both because it demonstrates the importance of addressing strong species interactions and because it shows that measures of fit do not indicate good model predictive performance. Ignoring process error caused bias, but much less than ignoring species interactions. See also Vanessa Trijoulet's earlier paper evaluating diet data interactions with multispecies models.

Virtual worlds with adequate complexity: end-to-end ecosystem models

Atlantis modeling framework: Fulton et al. 2011, Fulton and Smith 2004

Norwegian-Barents Sea

Hansen et al. 2016, 2018


California Current

Marshall et al. 2017, Kaplan et al. 2017


Building on global change projections: Hodgson et al. 2018, Olsen et al. 2018


Design: Ecosystem model scenario (climate and fishing)


Recruitment variability in the operating model
Specify uncertainty in assessment inputs using atlantisom


Overview of `atlantisom` R package: link

Started at the 2015 Atlantis Summit

atlantisom workflow: get "truth"

- locate files
- run om_init
- select species
- run om_species

atlantisom workflow: get "data"

- specify surveys (can now have many, file for each)
  - area/time subsetting
  - efficiency (q) by species
  - selectivity by species
  - biological sample size
  - index cv
  - length at age cv
  - max size bin
- specify survey diet sampling (new)
  - requires detaileddietcheck.txt
  - diet sampling parameters
- specify fishery
  - area/time subsetting
  - biological sample size
  - catch cv
- run om_index
- run om_comps
- run om_diet
- environmental data functions too
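
As a rough illustration of how survey settings such as efficiency (q), selectivity, and index cv turn "truth" into "data", the sketch below applies them to true biomass at age. It is written in plain Python and is not the atlantisom implementation: the function name, the argument layout, the bias-corrected lognormal observation error, and all numbers are assumptions made for this example.

```python
import math
import random


def survey_index(biomass_at_age, selectivity, q, cv, rng):
    """Return one survey biomass index value per year.

    biomass_at_age: list of years, each a list of true biomass by age class.
    selectivity:    proportion of each age class available to the survey.
    q:              survey efficiency (catchability).
    cv:             coefficient of variation of the observation error.
    rng:            random.Random instance, passed in for reproducibility.
    """
    sigma = math.sqrt(math.log(1.0 + cv * cv))  # lognormal sd implied by cv
    index = []
    for year_biomass in biomass_at_age:
        available = sum(b * s for b, s in zip(year_biomass, selectivity))
        value = q * available
        if cv > 0:
            # Bias-corrected so the expected value of the error is 1.
            value *= rng.lognormvariate(-0.5 * sigma * sigma, sigma)
        index.append(value)
    return index


# Made-up truth for two years and three age classes.
truth = [[100.0, 200.0, 300.0], [120.0, 210.0, 280.0]]
selex = [0.2, 0.8, 1.0]

perfect = survey_index(truth, selex, q=0.5, cv=0.0, rng=random.Random(1))
noisy = survey_index(truth, selex, q=0.5, cv=0.3, rng=random.Random(1))
print(perfect)
print(noisy)
```

Setting `cv=0` and `q=1` with full selectivity would recover total true biomass, which gives a perfect-information baseline to compare against the noisy index.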

`atlantisom` outputs, survey biomass index, link

Perfect information (one season): NOBA fall survey

Survey with catchability and selectivity: NOBA fall survey

`atlantisom` outputs, age and length compositions, link

Length compositions: capelin, halibut

Age compositions: capelin, halibut

`atlantisom` outputs, diet compositions, link

true diets

seasonal survey diets


Testing a simple "sardine" assessment, CC Atlantis (Kaplan et al. 2021)

Kaplan et al. 2021, Fig. 2

Will revisit with the newer CC model; is the problem that growth in Atlantis differs from what the SS (Stock Synthesis) setup assumes?

Cod assessment based on NOBA Atlantis (Li, WIP)

https://github.com/Bai-Li-NOAA/poseidon-dev/blob/nobacod/NOBA_cod_files/README.MD

Conversion from SAM to SS successful

Fitting to NOBA data more problematic


Multispecies assessment based on NOBA Atlantis (Townsend et al, WIP)

Stepwise development process: self-fitting, fitting to Atlantis output, then skill assessment using Atlantis output

Profiles for estimated parameters; but what to compare K values to?

Can test model diagnostic tools as well

Using simulated data in the mskeyrun package, available to all


ms-keyrun simulated data

Initial results for Hydra: Example fits to biomass (left) and catch (right)


Initial results for Hydra: Example fit to biomass vs biomass skill

MODELS SHOWN ARE EXAMPLE TRIAL FITS, NOT FINISHED OR GOOD MODELS


Initial results for Hydra: Biomass Skill across models

MODELS SHOWN ARE EXAMPLE TRIAL FITS, NOT FINISHED OR GOOD MODELS


Initial results for Hydra: Biomass Skill across models

MODELS SHOWN ARE EXAMPLE TRIAL FITS, NOT FINISHED OR GOOD MODELS


Initial results for Hydra: example skill summary statistics, 5 bin models

MODELS SHOWN ARE EXAMPLE TRIAL FITS, NOT FINISHED OR GOOD MODELS


P.S. What else could we test?


https://xkcd.com/1885/


Multispecies production model ensemble assessment--use Atlantis instead

Hydra OM setup

ensemble results


Difficulties so far

Atlantis related

Understanding Atlantis outputs (much improved with documentation since 2015)
Reconciling different Atlantis outputs, which to use?
Are the calculations correct? Attempted natural mortality (M) and per capita consumption

Skill assessment related

Running stock assessment models is difficult to automate. A lot of decisions are made by iterative running and diagnostic checks.
Generating input parameters for models that are consistent with Atlantis can be time consuming (Atlantis is a little too realistic...)
- Fit vonB (von Bertalanffy) length-at-age models to atlantisom output for input to the length-based model; not all converge (!)
- What is M? (see above)
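
To show why a von Bertalanffy fit can fail for some species, the sketch below uses the classical Ford-Walford regression rather than the nonlinear fit presumably used in this work (a swapped-in, simpler technique). It regresses mean length at age t+1 on mean length at age t; the slope estimates exp(-K) and the intercept estimates Linf * (1 - exp(-K)). When the slope falls outside (0, 1), for example when growth looks linear over the sampled ages, no valid K and Linf exist. The function name and data are hypothetical.

```python
import math


def ford_walford(mean_length_at_age):
    """Estimate von Bertalanffy K and Linf from mean lengths at consecutive ages.

    Returns a dict with "K" and "Linf", or None when no valid estimate exists.
    """
    x = mean_length_at_age[:-1]   # length at age t
    y = mean_length_at_age[1:]    # length at age t + 1
    n = len(x)
    if n < 2:
        raise ValueError("need mean lengths for at least three ages")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        return None               # no growth signal at all
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if not 0.0 < slope < 1.0:
        return None               # growth not asymptotic over these ages
    return {"K": -math.log(slope), "Linf": intercept / (1.0 - slope)}


# Made-up example: lengths generated from Linf = 100, K = 0.3, t0 = 0.
asymptotic = [100.0 * (1.0 - math.exp(-0.3 * age)) for age in range(1, 9)]
fit = ford_walford(asymptotic)
print(fit)

# Linear growth over the sampled ages gives a slope of 1, so no estimate.
no_fit = ford_walford([10.0, 20.0, 30.0, 40.0, 50.0])
print(no_fit)
```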

Slides available at https://noaa-edab.github.io/presentations

Contact: Sarah.Gaichas@noaa.gov

https://xkcd.com/2289/


atlantisom is using outputs not often used in other applications
- I don't run Atlantis, so putting print statements in the code is not an option
- could be more efficient with targeted group work
- should we expect numbers in one output to match those in others?
  - diet comp from detailed file matches diet comp in simpler output
  - catch in numbers not always matching between standard and annual age outputs
  - YOY output
  - ... others that have been encountered
- estimating per capita consumption from detaileddietcheck.txt results in lower numbers than expected
- still can't get reasonable mortality estimates from outputs; I understand this is an issue
