After pulling the data (the length sample data and the landings data) for a particular species, a user can use the aggregate_landings() function to aggregate the landings and the length data. The user has several choices to make that determine how the data are processed. All decision points are written to a logfile for the user to inspect. Many of the steps below are dictated by user inputs in the form of function arguments.
Gear is defined using the field NEGEAR, which characterizes gear by a three-character code. A list of gear types with their codes and descriptions can be found using comlandr::get_gears().
For example, a value of 95% (landingsThresholdGear = 0.95) would select the gear types, ordered by landings from largest to smallest, up to and including the first one at which the cumulative sum exceeds 95% of total landings. All other gear types (those comprising < 5% of total landings) would be combined into an otherGear category.
In the table below, the gear types 050, 010, and 100 would be retained. All other gear types would be combined.
NEGEAR | Landings | Cumulative landings | Cumulative proportion |
---|---|---|---|
050 | 1141407667 | 1141407667 | 0.8285119 |
010 | 95196644 | 1236604311 | 0.8976121 |
100 | 93069581 | 1329673892 | 0.9651684 |
020 | 30155056 | 1359828948 | 0.9870570 |
056 | 9405172 | 1369234120 | 0.9938839 |
132 | 1773276 | 1371007396 | 0.9951711 |
057 | 1630702 | 1372638098 | 0.9963548 |
160 | 1462198 | 1374100296 | 0.9974161 |
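The threshold rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the package's implementation: the function name `select_gears` and the toy landings values are hypothetical, and the rule is assumed to keep gear types up to and including the one at which the cumulative proportion first exceeds the threshold (which is what the table implies).

```python
def select_gears(landings_by_gear, threshold=0.95):
    """Split gear codes into (retained, other) by cumulative landings.

    Gears are ordered by landings, largest first. Gears are retained up to
    and including the first one at which the cumulative proportion of total
    landings exceeds `threshold`; the rest go to the other-gear group.
    """
    total = sum(landings_by_gear.values())
    ordered = sorted(landings_by_gear.items(), key=lambda kv: kv[1], reverse=True)

    retained, other = [], []
    running = 0
    crossed = False
    for gear, landings in ordered:
        if crossed:
            other.append(gear)
            continue
        retained.append(gear)
        running += landings
        if running / total > threshold:
            crossed = True
    return retained, other


# Toy landings (made-up numbers, not the values in the table above)
toy_landings = {"050": 830, "010": 70, "100": 65, "020": 22, "056": 13}
retained, other = select_gears(toy_landings, threshold=0.95)
print(retained, other)
```

With these toy values the cumulative proportions are 0.83, 0.90, and 0.965, so the first three gear types are retained and the remaining two would be pooled into the otherGear category.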
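The length distributions of the retained gear types are then tested against each other at a predetermined significance level (pValue=0.05) to determine if they are significantly different. Any NEGEAR codes whose length distributions are not found to be significantly different are aggregated.
If a species_object is used, the steps above are skipped since all gear aggregations are predetermined by the user.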
Market categories are defined by the field MARKET_CODE, which characterizes market categories by a two-character code. A list of market codes for a species can be found using comlandr::get_species_itis().
Any MARKET_CODEs that have no length samples (when aggregated over YEAR, QTR) are handled first. The user has the option to aggregate these market codes with any other market code. The default option is to relabel them as unknown, “UN”. Often these MARKET_CODEs have only a small amount of landings attributed to them, so the impact of this aggregation is negligible.
The length distributions for the remaining MARKET_CODEs (aggregated over YEAR, QTR) are then tested against each other (using the Kolmogorov-Smirnov test) at a predetermined significance level (pValue=0.05) to determine if they are significantly different. Any MARKET_CODEs whose length distributions are not found to be significantly different are aggregated.
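The pairwise comparison described above can be illustrated with a small two-sample Kolmogorov-Smirnov sketch. This is a standard-library Python illustration under stated assumptions, not the package's code: the function names are hypothetical, the length values are made up, and the p-value uses a common asymptotic approximation to the Kolmogorov distribution, which may differ slightly from what a statistical library reports for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)  # ECDF of a at x
        fb = bisect_right(b, x) / len(b)  # ECDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n1, n2):
    """Asymptotic two-sided p-value (approximation, assumed adequate here)."""
    en = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series below is ~1 here and converges slowly
    total = 0.0
    for j in range(1, 101):
        total += 2 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(max(total, 0.0), 1.0)


def should_aggregate(lengths_a, lengths_b, p_value=0.05):
    """True when the two length samples are NOT significantly different."""
    d = ks_statistic(lengths_a, lengths_b)
    return ks_pvalue(d, len(lengths_a), len(lengths_b)) >= p_value


# Made-up length samples (cm) for two hypothetical market codes
small = list(range(20, 40))
large = list(range(60, 80))
print(should_aggregate(small, small), should_aggregate(small, large))
```

Two identical samples give a statistic of zero and are flagged for aggregation, while the two non-overlapping samples are kept separate.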
If a species_object is used, the steps above are skipped since all market code aggregations are predetermined by the user.
The user has the option of stopping at this point (borrowLengths=F). The data returned will be at either the QTR or YEAR level, depending on user inputs.
The user supplies the level at which landings should be aggregated (YEAR, QTR, or SEMESTER). All combinations of time, NEGEAR, and MARKET_CODE that do not have associated length samples borrow samples from the nearest neighbor in time. (Future development: include nearest neighbor based on spatial units and/or gear.)
Before the borrowing of length samples commences, any gear types found to have no length samples are aggregated with the otherGear category. (This does not occur if a species_object is used.)
The borrowing of length samples from one time interval for use in another can be a subjective process that differs among species based on life history traits. To complicate matters, length distributions sampled within MARKET_CODEs may have shifted over time. These issues will be dealt with at a future date. A generalized method is currently applied. (Future development: include additional options based on life history traits and fishing industry changes.)
The method of borrowing length samples is complex and the user has several options. These methods are described below.
The following are performed sequentially:
The landings and length sample data are aggregated over QTR to the annual (YEAR) level.
For each gear type (NEGEAR), the Kolmogorov-Smirnov test is applied in each YEAR (future work) to test for differences in the length distributions among MARKET_CODEs. Any MARKET_CODEs whose length distributions are not found to be significantly different can be aggregated by the user. The default is no aggregation of MARKET_CODEs.
For each gear type (NEGEAR) and market code, a YEAR with missing length samples borrows length samples from the YEAR closest to it in time (in the future or the past).
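The nearest-year rule in the last step can be sketched as follows. This is an illustrative Python sketch with a hypothetical function name; the text does not say how ties are broken when a past and a future year are equally close, so preferring the earlier year is an assumption made here.

```python
def nearest_year_with_samples(target_year, sampled_years):
    """Return the sampled YEAR closest in time to `target_year`.

    Looks both forward and backward in time. Ties are broken in favor of
    the earlier year (an assumption; the source does not specify this).
    Returns None when no year has samples.
    """
    if not sampled_years:
        return None
    return min(sampled_years, key=lambda y: (abs(y - target_year), y))


# Made-up years with length samples for one NEGEAR/MARKET_CODE combination
sampled = [1990, 1996, 2001]
print(nearest_year_with_samples(1995, sampled))
print(nearest_year_with_samples(1993, sampled))
```

Here 1995 borrows from 1996, while 1993 is equidistant from 1990 and 1996 and falls back to the earlier year under the stated tie-break.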
For any NEGEAR/MARKET_CODE combinations where the number of length samples, over all YEARs, is less than the user-defined value (nLengthSamples), length samples are borrowed from another NEGEAR type with the same MARKET_CODE. Samples are borrowed from the NEGEAR with the most landings (since it is assumed to have the most length samples). If that NEGEAR has insufficient length samples, the next NEGEAR in order of total landings is checked, and so on.
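The search for a donor gear type described above amounts to walking the gear types in order of landings until one has enough samples. A minimal Python sketch follows; the function name, argument names, and numbers are hypothetical, and "sufficient" is assumed to mean at least nLengthSamples samples.

```python
def pick_donor_gear(landings_by_gear, samples_by_gear, n_length_samples, exclude):
    """Pick the gear to borrow length samples from, for one MARKET_CODE.

    Gears are checked in order of total landings, largest first. The first
    gear (other than `exclude`, the gear that needs samples) with at least
    `n_length_samples` samples is returned, or None if no gear qualifies.
    """
    ordered = sorted(landings_by_gear, key=landings_by_gear.get, reverse=True)
    for gear in ordered:
        if gear == exclude:
            continue
        if samples_by_gear.get(gear, 0) >= n_length_samples:
            return gear
    return None


# Made-up totals for a single market code
landings = {"050": 900, "010": 80, "100": 20}
samples = {"050": 3, "010": 40, "100": 1}
print(pick_donor_gear(landings, samples, n_length_samples=30, exclude="100"))
```

In this toy case gear 050 has the most landings but too few samples, so the search moves on to gear 010.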
The sampling program for most species started several years after landings were first recorded, so there is a stretch of consecutive years in the early part of the time series without any length samples. These years are assigned length samples from the first year in which samples were taken.
For YEARs with landings that precede the first length samples, all YEAR/QTRs are assigned the length samples from the first YEAR in which length samples were taken (from the same QTR).
For YEARs in which some QTRs have missing length samples, the previous year's length samples for the same QTR are assigned. If there are no length samples in the previous year, the process is repeated, stepping back one year at a time, until a length sample is available. If this still results in no length samples, the nearest neighbor in time is used: length samples from the nearest QTR (any QTR). If this still results in no length samples, the nearest neighbor is taken from the NEGEAR type that accounts for the majority of the landings.
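The first two fallbacks for a missing QTR can be sketched as below. This is a hedged Python illustration, not the package's logic: the function name is hypothetical, quarters are placed on a single time axis as `year * 4 + qtr` (an assumption), and ties in the nearest-neighbor step prefer the earlier quarter (also an assumption). The final fallback to the NEGEAR with the majority of landings is left to the caller.

```python
def borrow_quarter(target_year, target_qtr, sampled):
    """Choose the (YEAR, QTR) to borrow length samples from.

    `sampled` is a set of (year, qtr) pairs that have length samples for the
    same NEGEAR and MARKET_CODE.
    Step 1: same QTR in the most recent earlier year that has samples.
    Step 2: otherwise, the nearest (year, qtr) in time, any QTR.
    Returns None when `sampled` is empty, so the caller can fall back to
    another gear type.
    """
    earlier = [y for (y, q) in sampled if q == target_qtr and y < target_year]
    if earlier:
        return (max(earlier), target_qtr)

    if sampled:
        t = target_year * 4 + target_qtr
        # Tie-break on the (year, qtr) pair itself, i.e. prefer the earlier one
        return min(sampled, key=lambda yq: (abs(yq[0] * 4 + yq[1] - t), yq))

    return None


# Made-up sampled quarters
sampled = {(2000, 1), (2001, 3)}
print(borrow_quarter(2002, 1, sampled))
print(borrow_quarter(1999, 2, sampled))
```

The first call finds the same quarter two years back, and the second has no earlier matching quarter, so it takes the quarter nearest in time.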
All decisions made regarding where length samples are borrowed from are written to the logfile for later inspection.
In all methods above, the otherGear category is by definition sparse. This gear type is always aggregated to annual data.
In all methods above, the landings and length sample data for the unclassified market category, UN, are aggregated to the same level as other MARKET_CODEs.
The unclassified landings for a specific (YEAR, QTR, NEGEAR) are assumed to be a mixture of fish lengths similar to that in the observed MARKET_CODEs. Therefore, the length distribution of the unclassified landings is assumed to be the same as the length distribution of the landed fish in all of the known market codes combined.
For each YEAR, QTR, and NEGEAR combination where no length samples are available, all length samples over all MARKET_CODEs are assigned to the unclassified category.
For cases where there are no samples for these other MARKET_CODEs, the nearest neighbor in time is used.
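The assumption that unclassified landings share the combined length distribution of the known market codes can be illustrated with a short pooling sketch. This is an illustrative Python example with a hypothetical function name, made-up market codes other than UN, and made-up lengths; it only shows how samples from the known codes would be pooled into one set of proportions.

```python
from collections import Counter


def pooled_length_distribution(samples_by_market_code, unclassified="UN"):
    """Pool length samples over all known market codes.

    Returns a dict mapping length -> proportion of pooled samples, skipping
    the unclassified code itself. Returns an empty dict when no known
    market code has samples (the nearest neighbor in time would be needed).
    """
    counts = Counter()
    for code, lengths in samples_by_market_code.items():
        if code == unclassified:
            continue
        counts.update(lengths)

    n = sum(counts.values())
    if n == 0:
        return {}
    return {length: c / n for length, c in sorted(counts.items())}


# Made-up samples (cm) for one YEAR, QTR, NEGEAR combination
samples = {"LG": [40, 42], "SM": [30, 30], "UN": []}
print(pooled_length_distribution(samples))
```

The resulting proportions would then be applied to the unclassified landings for that combination.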