Parametric methods : require specific parametric assumptions about the species abundance distribution.
Different parametric distributions may fit the observed data equally well, but lead to drastically different species richness estimates.
Non-parametric methods : always obtainable and make no assumptions about the mathematical form of the underlying species abundance/incidence distributions.
Species richness : number of observed species + number of undetected species
Uniques : number of species observed in only one sampling unit
Duplicates : number of species observed in exactly two sampling units
Super-duplicates : number of species observed in at least two sampling units
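As a small illustration of the three counts defined above, here is a minimal Python sketch. It assumes the data are stored as a species-by-sampling-unit presence/absence matrix; the function name `incidence_counts` and the toy matrix are hypothetical and only serve the example.

```python
def incidence_counts(matrix):
    """Count uniques, duplicates and super-duplicates.

    matrix: list of rows, one row per species, each row holding
    0/1 detections for every sampling unit (assumed layout).
    """
    # Incidence frequency: number of sampling units in which each species occurs.
    freqs = [sum(1 for cell in row if cell) for row in matrix]
    # Species never detected do not count as observed.
    observed = [f for f in freqs if f > 0]
    return {
        "s_obs": len(observed),
        "uniques": sum(1 for f in observed if f == 1),
        "duplicates": sum(1 for f in observed if f == 2),
        "super_duplicates": sum(1 for f in observed if f >= 2),
    }


# Toy data: 5 species (rows) x 4 sampling units (columns).
example = [
    [1, 0, 0, 0],  # detected in one unit only: unique
    [1, 1, 0, 0],  # detected in exactly two units: duplicate
    [1, 1, 1, 0],  # detected in three units: super-duplicate only
    [0, 0, 1, 0],  # unique
    [0, 0, 0, 0],  # never detected
]
counts = incidence_counts(example)
```

Note that every observed species is either a unique or a super-duplicate, so the two counts add up to the number of observed species.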
Non-parametric estimators are based on the information carried by the infrequently detected (rare) species.
Jackknife resampling : technique especially useful for reducing the bias of a biased estimator and for estimating its variance (similar to the bootstrap). The basic idea of the jth-order jackknife method is to consider sub-data obtained by successively deleting j individuals from the data.
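Jackknife estimator of a parameter :
$$\hat{\theta}_{jack} = n \hat{\theta} - (n - 1) \bar{\theta}_{(\cdot)}$$
with n the sample size, \hat{\theta} the estimate from the full data, and \bar{\theta}_{(\cdot)} the mean of the n leave-one-out estimates.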
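To make the resampling idea concrete, the following is a minimal sketch of the first-order (delete-one) jackknife, assuming the usual bias-corrected form: n times the full-data estimate minus (n - 1) times the mean of the leave-one-out estimates. The helper names `jackknife` and `biased_variance` and the sample values are hypothetical.

```python
from statistics import mean, variance


def jackknife(data, estimator):
    """First-order jackknife bias correction (delete one observation at a time)."""
    n = len(data)
    theta_full = estimator(data)
    # Re-estimate the parameter on each sub-sample with observation i removed.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = mean(leave_one_out)
    return n * theta_full - (n - 1) * theta_bar


def biased_variance(xs):
    """Plug-in variance (divides by n), a classic biased estimator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


sample = [2.0, 4.0, 4.0, 5.0, 7.0]
corrected = jackknife(sample, biased_variance)
reference = variance(sample)  # unbiased sample variance (divides by n - 1)
```

For this particular estimator the jackknife correction recovers the unbiased sample variance, which is a convenient sanity check; for other estimators the bias is reduced but not necessarily removed.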
Estimation of species richness :
with :
N : total number of individuals / species
S : non-parametric maximum likelihood estimator of N
i : sampling unit (day, trap, quadrat)
t : total number of sampling units
f_r : number of individuals captured exactly r times
Clearly, S is biased and this bias decreases as t increases. Hence the following correction :
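with :
j : individual / species
First-order jackknife (depending on unique species) :
$$\hat{N}_{J1} = S + \frac{n - 1}{n} f_1$$
Second-order jackknife (depending on unique and duplicate species) :
$$\hat{N}_{J2} = S + \frac{2n - 3}{n} f_1 - \frac{(n - 2)^2}{n (n - 1)} f_2$$
with n = t.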
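The two jackknife richness estimators can be sketched as follows. This assumes the commonly cited closed forms, S + f1 (t - 1) / t for the first order and S + f1 (2t - 3) / t - f2 (t - 2)^2 / (t (t - 1)) for the second order; the function names and the numbers in the example are hypothetical.

```python
def jackknife1(s_obs, f1, t):
    """First-order jackknife: corrects the observed count using uniques only."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return s_obs + f1 * (t - 1) / t


def jackknife2(s_obs, f1, f2, t):
    """Second-order jackknife: uses both uniques and duplicates."""
    if t < 2:
        raise ValueError("t must be at least 2")
    return s_obs + f1 * (2 * t - 3) / t - f2 * (t - 2) ** 2 / (t * (t - 1))


# Hypothetical survey: 20 observed species, 6 uniques, 3 duplicates, 10 sampling units.
j1 = jackknife1(20, 6, 10)
j2 = jackknife2(20, 6, 3, 10)
```

With no uniques (f1 = 0) the first-order estimate equals the observed count, which reflects the idea that the rare species carry the information about what remains undetected.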
Chao's lower-bound method is especially useful when most of the captured individuals are caught only once or twice in the sample (very small capture probabilities), a case in which the jackknife estimator usually does not work well.
If there are many undetectable species, it is impossible to obtain a good estimate of species richness. A reliable lower bound for SR is often of more practical use than an imprecise point estimate.
It is a minimum estimator of asymptotic species richness.
Chao1 estimator (number of undetected species for abundance data) :
$$\hat{S}_{Chao1} = S_{obs} + \frac{f_1^2}{2 f_2}$$
Chao2 estimator (number of undetected species for incidence data) :
$$\hat{S}_{Chao2} = S_{obs} + \frac{T - 1}{T} \frac{Q_1^2}{2 Q_2}$$
with :
S_obs : total number of species observed
T : number of sampling units
f_1 / Q_1 : uniques
f_2 / Q_2 : duplicates
Incidence-based coverage estimator (ICE) :
$$\hat{S}_{ICE} = S_{freq} + \frac{S_{infreq}}{C_{ICE}} + \frac{Q_1}{C_{ICE}} \gamma^2_{ICE}$$
with :
S_freq : number of frequent species (S_obs - S_infreq)
Q_k : number of species present in k sampling units
R : cutoff frequency between infrequent and frequent species in the reference sample (= 10 as a start)
S_infreq : number of species occurring in fewer than R sampling units (frequency counts for rare species)
Y_infreq : summed incidence frequencies for species occurring in fewer than R sampling units
T_infreq : number of sampling units including at least one infrequent species
C_ICE : estimate of sample coverage
\gamma^2_{ICE} : squared coefficient of variation
Detection probabilities of unique species do not vary greatly among such species, so that most of the detection probabilities of uniques are concentrated at the average.
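Returning to the Chao1 and Chao2 lower bounds, a minimal sketch is given below. It assumes the classic forms S_obs + f1^2 / (2 f2) for abundance data and S_obs + ((T - 1) / T) Q1^2 / (2 Q2) for incidence data. The function names and example counts are hypothetical, and the case with no duplicates (which needs a bias-corrected variant) is deliberately left out.

```python
def chao1(s_obs, f1, f2):
    """Chao1 lower bound from abundance data (f1, f2: species with 1 or 2 individuals)."""
    if f2 <= 0:
        raise ValueError("classic Chao1 needs f2 > 0; use a bias-corrected variant otherwise")
    undetected = f1 ** 2 / (2 * f2)  # estimated number of undetected species
    return s_obs + undetected


def chao2(s_obs, q1, q2, t):
    """Chao2 lower bound from incidence data (q1: uniques, q2: duplicates, t: sampling units)."""
    if q2 <= 0:
        raise ValueError("classic Chao2 needs q2 > 0; use a bias-corrected variant otherwise")
    undetected = (t - 1) / t * q1 ** 2 / (2 * q2)
    return s_obs + undetected


# Hypothetical counts: 20 observed species, 6 uniques, 3 duplicates, 10 sampling units.
s_chao1 = chao1(20, 6, 3)
s_chao2 = chao2(20, 6, 3, 10)
```

Both functions return the observed richness plus an estimate of the undetected species, so the result can never fall below S_obs, in line with their role as minimum estimators.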
Rarefaction (interpolation) : traditional method for comparing the sample richness of different assemblages, in which the larger samples are down-sampled until all samples are the same size (at the cost of discarding information).
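The down-sampling can be done analytically rather than by repeated random draws. The sketch below assumes individual-based abundance data and the classical hypergeometric expectation: each species contributes the probability that at least one of its individuals appears in a random sub-sample of m individuals. The function name and the two toy assemblages are hypothetical.

```python
from math import comb


def rarefied_richness(abundances, m):
    """Expected number of species in a random sub-sample of m individuals."""
    n = sum(abundances)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the total number of individuals")
    total = comb(n, m)
    # For each species: 1 - P(none of its individuals is drawn).
    return sum(1 - comb(n - x, m) / total for x in abundances if x > 0)


# Two toy assemblages of unequal size (hypothetical abundances per species).
large = [5, 3, 1, 1]  # 10 individuals, 4 species
small = [2, 2, 1]     # 5 individuals, 3 species

# Rarefy the larger sample down to the size of the smaller one before comparing.
common_size = sum(small)
large_at_common = rarefied_richness(large, common_size)
small_at_common = rarefied_richness(small, common_size)
```

Rarefying a sample to its own full size returns its observed richness, so only the larger sample actually loses information in the comparison.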
Curve-fitting approaches : generally do not provide the variances of the resulting estimates
Extrapolation : use parametric curves to extrapolate a species-accumulation or species-area curve and predict its asymptote, which is used as an estimate of species richness for hypothetical samples larger than the actual samples (can be reliable up to about twice the actual sample size).
Curve-fitting : fit a truncated parametric distribution or functional form to the observed species abundances to obtain an estimate of species richness
Sampling-theory-based approaches : two types of sampling data (individual-based abundance data and sample-based incidence data) :
For sample-based incidence data :
interpolation : sample-based rarefaction (Bernoulli product model)
extrapolation : estimate of asymptotic species richness for a larger number of sampling units,
which requires an estimation of the number of undetected species :
Incidence data are especially advantageous for social species (individuals difficult to distinguish and to count), and avoid the risk of double-counting individuals within sessions.
Usually require :
Extension of incidence-based sampling model of Colwell to encompass datasets with only :
STEPS :
S_obs = S_uniques + S_super-duplicates
SR_estimated = S_obs + S_undetected