Diversity analysis
Entropy vs. diversity
- Entropy is a combined measure of the total number of species and the evenness of their distribution
- Entropy can also be defined as the uncertainty of a particular sample’s species identity
- Shannon entropy was originally used to quantify the uncertainty in predicting the next character in a text string. The more types of characters available (diversity), the higher the uncertainty (entropy) in predicting the correct one.
- Entropies are reasonable indices of diversity, but do not match the intuitive understanding of diversity
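A minimal sketch of the text-string framing above (the example string and the by-hand \(-\sum p_i\textrm{ln}(p_i)\) calculation are my own illustration, not from Ref. 1): Shannon entropy computed from character frequencies should agree with vegan's diversity() function.

```r
library(vegan)

# Character frequencies of an arbitrary string (hypothetical example)
chars <- strsplit("the quick brown fox", "")[[1]]
p <- as.vector(prop.table(table(chars)))

# Shannon entropy by hand vs. vegan's implementation - the two should agree
c(manual = -sum(p * log(p)), vegan = diversity(p))
```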
- Example: population A = 16 equally common species; population B = 8 equally common species
- Intuitively, population A is twice as diverse as population B
- However, this is not true for entropies
```r
library(vegan)

A <- rep(1/16, 16)
B <- rep(1/8, 8)

data.frame("Shannon A" = diversity(A, base = 2),
           "Shannon B" = diversity(B, base = 2)) # base = 2 used to match up with Ref. 1
```

```
##   Shannon.A Shannon.B
## 1         4         3
```
- Note that population A’s Shannon entropy is not twice that of population B
- Effective number of species (diversity)
- All communities that share the same entropy value have the same diversity
- For every type of entropy (e.g. Shannon), every possible entropy value has a corresponding community where each species is equally common
- The effective number of species is the number of species in this equivalent equally-common community
- Thus, the effective number of species, \(D\), for a community can be found by setting its entropy value equal to the entropy equation applied to a population of \(D\) number of species with frequencies of \(1/D\) and solving for \(D\)
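A minimal numerical check of this definition (the community vectors are arbitrary illustrations): for a community of \(D\) equally common species, exponentiating its Shannon entropy returns \(D\) itself, and for an uneven community it returns its effective number of species.

```r
library(vegan)

# For D equally common species, exp(Shannon entropy) recovers D exactly
D <- 7
exp(diversity(rep(1/D, D))) # 7

# For an uneven community, exp(Shannon entropy) gives the effective number of species
A <- c(1, 43, 40, 5, 1)
exp(diversity(A))           # ~2.65, i.e. as diverse as ~2.65 equally common species
```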
Interpretation for ENS/diversity - For any order \(q\) (see below), ENS/diversity represents the number of species in an equally common community that gives the same ENS/diversity value at that order \(q\)
```r
library(vegan)

A <- c(1, 43, 40, 5, 1)
exp(renyi(A)) # See Hill numbers, order, and Renyi entropy below for explanation
```

```
##        0     0.25      0.5        1        2        4        8       16 
## 5.000000 3.914042 3.255867 2.648191 2.330265 2.222133 2.182638 2.158958 
##       32       64      Inf 
## 2.136983 2.117379 2.093023 
## attr(,"class")
## [1] "renyi"   "numeric"
```
- For order = 1 (exponential Shannon) - “at order = 1, population A is as diverse as a community with 2.648191 equally common species”
- For order = 0 (species richness/total species) - “at order = 0, population A is as diverse as a community with 5 equally common species”
- See the Hill numbers, order, and Renyi entropy section below for an updated explanation of order and the updated interpretation once order is taken into account
- Proof that diversity \(^qD = (\sum\limits_{i=1}^{S}p_i{}^q)^{(1/(1-q))}\) for any particular entropy index
- Variables
- \(H()\) - any specific entropy
- \(D\) - “diversity” or effective number of species, the number of equally common species
- \(S\) - the total number of species in an actual sample
- \(q\) - order
- \(p_i\) - the frequency of a given species in a community
- Given
- Entropy can be generalized as \(H(\Sigma_{i=1}^{S}(p_{i})^{q})\)
- \(H(\sum\limits_{i=1}^{D}(\frac{1}{D})^q) = x = H(\sum\limits_{i=1}^{S}(p_{i})^{q})\)
- \(H()\) is an invertible function (it is continuous and monotonic, i.e. either only increasing or only decreasing)
- Solve for \(D\) in terms of \(x\)
- \(H(\sum\limits_{i=1}^{D}(\frac{1}{D})^q) = x\)
- \(H(D(\frac{1}{D})^q) = x\)
- \((\frac{1}{D})^{q-1} = H^{-1}(x)\)
- \((1/D)^q = 1/(D^q) = D^{-q}\)
- \(D(D^{-q}) = D^1D^{-q} = D^{1-q} = 1/D^{-(1-q)} = 1/(D^{q-1}) = (1/D)^{q-1}\)
- \(D = (\frac{1}{H^{-1}(x)})^\frac{1}{q-1}\)
- Solve for \(D\) in terms of \(p_i\)
- \(D = (\frac{1}{H^{-1}(H(\Sigma(p_{i})^{q}))})^\frac{1}{q-1}\)
- This substitutes \(H(\Sigma(p_{i})^{q})\), from the Given equality, for \(x\) in the previous result
- \(D = (\frac{1}{\Sigma(p_{i})^{q}})^\frac{1}{q-1}\)
- Because \(H()\) is invertible (see Given), \(H^{-1}(H(\cdot))\) cancels, leaving \(\Sigma(p_{i})^{q}\)
- \(D = (\sum\limits_{i=1}^{S}p_i{}^q)^\frac{1}{1-q}\)
- Algebra similar to that used above: \((\frac{1}{y})^{\frac{1}{q-1}} = y^{\frac{1}{1-q}}\)
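- Worked example (my own, but it follows directly from the result above): the Gini-Simpson index is \(x = 1-\Sigma(p_{i})^{2}\), i.e. \(H(y) = 1 - y\) with \(q = 2\), so \(H^{-1}(x) = 1 - x\) and \(D = (\frac{1}{1-x})^{\frac{1}{2-1}} = \frac{1}{1-x} = \frac{1}{\Sigma(p_{i})^{2}}\), matching the conversion listed in the table below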
- Corollary - diversity depends only on species frequencies and order \(q\), not on the particular entropy function
Entropy index | Equation | To convert to diversity |
---|---|---|
Species richness | \(x = \Sigma_{i=1}^Sp_i{}^0\) | \(x\) |
Shannon entropy | \(x = -\Sigma_{i=1}^Sp_i\textrm{ln}(p_i)\) | \(e^x\) |
Gini-Simpson index | \(x = 1-\Sigma_{i=1}^Sp_i{}^2\) | \(1/(1-x)\) |
Renyi entropy | \(x = (-\textrm{ln}\Sigma_{i=1}^S p_{i}{}^{q})/(q-1)\) | \(e^x\) |
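A quick numerical check of the corollary and of the conversion column (a sketch with an arbitrary community; vegan's "simpson" index returns the Gini-Simpson index): converting each entropy should give the same diversity as the Hill number of the corresponding order.

```r
library(vegan)

A <- c(1, 43, 40, 5, 1)
p <- A / sum(A)

# Diversity obtained by converting each entropy index...
converted <- c(richness     = sum(p^0),                          # q = 0
               shannon      = exp(diversity(p)),                 # q = 1
               gini.simpson = 1 / (1 - diversity(p, "simpson"))) # q = 2

# ...matches the Hill numbers (exponentiated Renyi entropies) at those orders
hill <- exp(renyi(p, scales = c(0, 1, 2)))
data.frame(converted, hill = as.numeric(hill))
```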
- Note on logarithms and conversion
- Entropy depends on the log base, diversity does not
- The equation for converting Shannon entropy to diversity is \(e^x\) because the Shannon entropy equation uses the natural logarithm (base = \(e\))
- If a different log base was used to calculate the entropy, that base would be used as the base of the exponentiation (e.g. \(2^x\) for base 2)
- Similarly for any other diversity index
```r
library(vegan)

A <- rep(1/16, 16)

base.e <- c(diversity(A), exp(diversity(A)))
base.2 <- c(diversity(A, base = 2), 2^(diversity(A, base = 2)))

data.frame(base.e, base.2, row.names = c("Shannon", "diversity")) # Entropy differs, diversity doesn't
```

```
##              base.e base.2
## Shannon    2.772589      4
## diversity 16.000000     16
```
Hill numbers, order, and Renyi entropy
- \(^qD = (\sum\limits_{i=1}^{S}p_i{}^q)^{(1/(1-q))}\) are known as Hill numbers and give values of diversity at various orders of \(q\)
- Order, \(q\), dictates the sensitivity of a diversity metric to common and rare species
- \(q = 0\) is completely insensitive to species frequency (i.e. all species weighted equally)
- Corresponds to the harmonic mean
- \(q = 1\) weighs all species by their frequency
- Corresponds to the geometric mean
- The value of a Hill number is undefined at \(q = 1\), because the exponent of \((\sum\limits_{i=1}^{S}p_i{}^1)^{(1/(1-1))}\) divides by zero, but it is obtained from the limit as \(q \to 1\) (see the numerical check after this list)
- \(\lim\limits_{q\to 1}{}^qD = \exp(-\sum\limits_{i=1}^{S}p_i\textrm{ln}(p_i))\), i.e. the exponential of Shannon entropy
- \(q > 1\) increasingly favors the more common species
- \(q = 2\) specifically corresponds to the arithmetic mean
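A numerical check of the \(q = 1\) limit (a sketch; the orders 0.999 and 1.001 are arbitrary choices to bracket the limit):

```r
library(vegan)

A <- c(1, 43, 40, 5, 1)

# Hill numbers just below and just above q = 1 converge on the q = 1 value...
exp(renyi(A, scales = c(0.999, 1, 1.001)))

# ...which is the exponential of Shannon entropy (~2.648)
exp(diversity(A))
```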
Extended Interpretation for ENS/diversity - ENS/diversity represents the number of species in an equally common community that gives the same ENS/diversity value as the region of the population focused on by the order \(q\)
```r
library(vegan)

A <- c(1, 43, 40, 5, 1)
exp(renyi(A))
```

```
##        0     0.25      0.5        1        2        4        8       16 
## 5.000000 3.914042 3.255867 2.648191 2.330265 2.222133 2.182638 2.158958 
##       32       64      Inf 
## 2.136983 2.117379 2.093023 
## attr(,"class")
## [1] "renyi"   "numeric"
```
- For order = 1 - “Altogether, population A is as diverse as a community with 2.648191 equally common species”
- For order = \(Inf\) - “When considering only the most abundant species, population A is as diverse as a community with 2.093023 equally common species”
- Since values of \(q\) greater than 1 focus more and more on the most abundant species, \(Inf\) is the value of \(q\) most focused on the abundant species
- Population A is an extreme example of a population dominated by a few abundant species (i.e. very uneven), so even diversity values at \(q = 2\) are close to those at \(q = Inf\) and could be said to approximate the diversity of the high-abundance species
- However, in a population with less dominance by a few abundant species (i.e. more even), only extreme \(q\) values like \(Inf\) will capture the diversity of only the high-abundance species
- The fact that \(q = 1\) weighs each species exactly by its frequency, favoring neither rare nor abundant species, is one of the reasons exponential Shannon diversity is one of the best/most interpretable single measures of diversity.
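To make the role of evenness concrete (a sketch; the two communities are arbitrary examples, not from Ref. 1): for a perfectly even community the Hill numbers equal the species richness at every order, while for an uneven community they fall toward the contribution of the dominant species as \(q\) increases.

```r
library(vegan)

uneven <- c(1, 43, 40, 5, 1) # strongly dominated by two species
even   <- rep(18, 5)         # same richness, perfectly even

Hill.uneven <- exp(renyi(uneven))
Hill.even   <- exp(renyi(even))

# The even community stays at a diversity of 5 for every order;
# the uneven community's diversity drops as q grows
data.frame(Hill.uneven, Hill.even)
```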
Hill numbers satisfy the “doubling rule”: pooling two equally large, equally diverse communities that share no species should double the diversity
```r
library(vegan)

A <- sample(1:100, 10, replace = T)
A <- A/sum(A)
B <- c(A/2, A/2) # Population with same number of individuals, but twice as many species

Hill.A <- exp(renyi(A))
Hill.B <- exp(renyi(B))

data.frame(Hill.A, Hill.B, "Ratio Hill B:A" = Hill.B/Hill.A)
```

```
##         Hill.A    Hill.B Ratio.Hill.B.A
## 0    10.000000 20.000000              2
## 0.25  8.894739 17.789478              2
## 0.5   8.083409 16.166818              2
## 1     7.048758 14.097516              2
## 2     6.086147 12.172293              2
## 4     5.404975 10.809949              2
## 8     5.017098 10.034196              2
## 16    4.825909  9.651818              2
## 32    4.710580  9.421161              2
## 64    4.624034  9.248068              2
## Inf   4.516484  9.032967              2
```
Renyi entropy \(=\frac{1}{1-q} \textrm{ln}\sum\limits_{i=1}^S p_{i}{}^{q}\)
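As a cross-check of this relationship (a sketch; vegan's renyi() also has a hill argument, which should return the same Hill numbers directly):

```r
library(vegan)

A <- c(1, 43, 40, 5, 1)

# Hill numbers obtained by exponentiating the Renyi entropies...
manual <- exp(renyi(A))

# ...should match the Hill numbers returned directly by renyi(..., hill = TRUE)
direct <- renyi(A, hill = TRUE)

all.equal(as.numeric(manual), as.numeric(direct))
```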
*Special cases of order values*
Order | Hill number equivalent | Renyi entropy equivalent |
---|---|---|
0 | Species richness | / |
1 | / | Shannon entropy |
2 | Inverse Simpson index* | / |
Inf | Inverse Berger-Parker index | / |
*\(\textrm{Inverse Simpson index} = 1/(1-\textrm{Gini-Simpson index})\)

### References

1. Jost, L. (2006). “Entropy and diversity.” Oikos.