Entropy vs. diversity

  • Entropy is a combined measure of the total number of species and the evenness of their distribution
  • Entropy can also be defined as the uncertainty in the species identity of an individual drawn at random from a sample
    • Shannon entropy was originally used to quantify the uncertainty in predicting the next character in a text string. The more types of characters available (diversity), the higher the uncertainty (entropy) in predicting the correct one.
  • Entropies are reasonable indices of diversity, but do not match the intuitive understanding of diversity
  • Example: population A = 16 equally common species; population B = 8 equally common species
    • Intuitively, population A is twice as diverse as population B
    • However, the entropy of A is not twice the entropy of B

      library(vegan)
      A <- rep(1/16, 16)
      B <- rep(1/8, 8)
      data.frame("Shannon A" = diversity(A, base = 2), "Shannon B" = diversity(B, base = 2)) #Base = 2 used to match up with Ref.1
      ##   Shannon.A Shannon.B
      ## 1         4         3
    • Note that population A’s Shannon entropy is not twice that of population B
  • Effective number of species (diversity)
    • All communities that share the same entropy value have the same diversity
    • For every type of entropy (e.g. Shannon), every possible entropy value has a corresponding community in which each species is equally common
    • The effective number of species is the number of species in this equivalent equally-common community
      • Thus, the effective number of species, \(D\), for a community can be found by setting its entropy value equal to the entropy equation applied to a population of \(D\) species, each with frequency \(1/D\), and solving for \(D\)
    • Interpretation for ENS/diversity - for any order \(q\) (see below), the ENS/diversity of a community is the number of species in an equally common community that has the same entropy at that order of \(q\)

      library(vegan)
      A <- c(1,43,40,5,1)
      exp(renyi(A))  #See Hill numbers and Renyi entropy below for explanation
      ##        0     0.25      0.5        1        2        4        8       16
      ## 5.000000 3.914042 3.255867 2.648191 2.330265 2.222133 2.182638 2.158958
      ##       32       64      Inf
      ## 2.136983 2.117379 2.093023
      ## attr(,"class")
      ## [1] "renyi"   "numeric"
      • For order = 1 (exponential Shannon) - “at order = 1, population A is as diverse as a community with 2.648191 equally common species”
      • For order = 0 (species richness/total species) - “at order = 0, population A is as diverse as a community with 5 equally common species”
      • See the Hill/Renyi section for a fuller explanation of order and an updated interpretation that takes it into account
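
The definition of the effective number of species can be checked numerically on the 16-species vs. 8-species example above. The following is a minimal base-R sketch, not the method used by vegan; `shannon()` is a hypothetical helper introduced only for illustration. For \(D\) equally common species the Shannon entropy is \(\ln(D)\), so solving for \(D\) gives \(D = e^{H}\).

```r
# Hypothetical helper (not part of vegan): Shannon entropy with natural log
shannon <- function(p) {
  p <- p / sum(p)   # normalise counts to frequencies
  p <- p[p > 0]     # drop absent species so that 0 * log(0) never occurs
  -sum(p * log(p))
}

A <- rep(1/16, 16)  # 16 equally common species
B <- rep(1/8, 8)    # 8 equally common species

# Setting H = ln(D) for D equally common species and solving gives D = exp(H)
ens_A <- exp(shannon(A))
ens_B <- exp(shannon(B))

# The entropies are not in a 2:1 ratio, but the effective numbers of species are
c(entropy_ratio = shannon(A) / shannon(B), ens_ratio = ens_A / ens_B)
```

Under these assumptions the entropy ratio is 4/3 while the ratio of effective numbers of species is 2, which is the intuitive "twice as diverse" answer.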
  • Proof that diversity \(^qD = (\sum\limits_{i=1}^{S}p_i{}^q)^{(1/(1-q))}\) for any particular entropy index
    • Variables
      • \(H()\) - any specific entropy
      • \(D\) - “diversity” or effective number of species, the number of equally common species
      • \(S\) - the total number of species in an actual sample
      • \(q\) - order
      • \(p_i\) - the frequency of a given species in a community
    • Given
      1. Entropy can be generalized as \(H(\Sigma_{i=1}^{S}(p_{i})^{q})\)
      2. \(H(\sum\limits_{i=1}^{D}(\frac{1}{D})^q) = x = H(\sum\limits_{i=1}^{S}(p_{i})^{q})\)
      3. \(H()\) is an invertible function (it is continuous and strictly monotonic, i.e. either always increasing or always decreasing)
    • Solve for \(D\) in terms of \(x\)
      1. \(H(\sum\limits_{i=1}^{D}(\frac{1}{D})^q) = x\)
      2. \(H(D(\frac{1}{D})^q) = x\)
      3. \((\frac{1}{D})^{q-1} = H^{-1}(x)\)
        • \((1/D)^q = 1/(D^q) = D^{-q}\)
        • \(D(D^{-q}) = D^1D^{-q} = D^{1-q} = 1/D^{-(1-q)} = 1/(D^{q-1}) = (1/D)^{q-1}\)
      4. \(D = (\frac{1}{H^{-1}(x)})^\frac{1}{q-1}\)
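    • Solve for \(D\) in terms of \(p_i\)
      1. \(D = (\frac{1}{H^{-1}(H(\Sigma(p_{i})^{q}))})^\frac{1}{q-1}\)
        • This substitutes the right-hand side of Given (2), \(H(\Sigma(p_{i})^{q})\), for \(x\) in the final expression for \(D\) in terms of \(x\)
      2. \(D = (\frac{1}{\Sigma(p_{i})^{q}})^\frac{1}{q-1}\)
        • By Given (3), \(H\) is invertible, so applying \(H^{-1}\) to \(H\) of a value returns that value
      3. \(D = (\sum\limits_{i=1}^{S}p_i{}^q)^\frac{1}{1-q}\)
        • Since \((1/y)^{1/(q-1)} = y^{-1/(q-1)} = y^{1/(1-q)}\); the same kind of exponent algebra used to obtain \((\frac{1}{D})^{q-1}\) when solving in terms of \(x\)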
  • Corollary - diversity depends only on species frequencies and order \(q\), not on the particular entropy function
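
As a worked instance of the derivation, consider the Gini-Simpson index, \(x = 1-\sum_{i=1}^{S}p_i{}^2\). Here the order is \(q = 2\) and the entropy function is \(H(y) = 1 - y\), so \(H^{-1}(x) = 1 - x\). Substituting into the expression for \(D\) in terms of \(x\), and then replacing \(x\) by its definition, gives:

\[
D = \left(\frac{1}{H^{-1}(x)}\right)^{\frac{1}{q-1}}
  = \left(\frac{1}{1-x}\right)^{\frac{1}{2-1}}
  = \frac{1}{1-x}
  = \frac{1}{\sum_{i=1}^{S} p_i{}^{2}}
  = \left(\sum_{i=1}^{S} p_i{}^{2}\right)^{\frac{1}{1-2}}
\]

The last form is the general formula \(^qD\) evaluated at \(q = 2\), as the proof predicts.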

| Entropy index | Equation | To convert to diversity |
|---|---|---|
| Species richness | \(x = \Sigma_{i=1}^Sp_i{}^0\) | \(x\) |
| Shannon entropy | \(x = -\Sigma_{i=1}^Sp_i\textrm{ln}(p_i)\) | \(e^x\) |
| Gini-Simpson index | \(x = 1-\Sigma_{i=1}^Sp_i{}^2\) | \(1/(1-x)\) |
| Renyi entropy | \(x = (-\textrm{ln}\Sigma_{i=1}^S p_{i}{}^{q})/(q-1)\) | \(e^x\) |

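
The conversions listed here can be sanity-checked against the general formula \(^qD\). The sketch below uses base R only and reuses the example counts `c(1,43,40,5,1)` from the earlier chunk; `hill()` is a hypothetical helper written for this check, not a vegan function.

```r
# Example counts reused from the text, converted to frequencies
p <- c(1, 43, 40, 5, 1) / 90

# Hypothetical helper: Hill number of order q straight from frequencies (q != 1)
hill <- function(p, q) sum(p^q)^(1 / (1 - q))

richness     <- sum(p^0)                  # entropy x; diversity = x
shannon      <- -sum(p * log(p))          # entropy x; diversity = exp(x)
gini_simpson <- 1 - sum(p^2)              # entropy x; diversity = 1 / (1 - x)
renyi_2      <- -log(sum(p^2)) / (2 - 1)  # order-2 Renyi entropy; diversity = exp(x)

# Apply each conversion from the table
conversions <- c(
  richness     = richness,
  shannon      = exp(shannon),
  gini_simpson = 1 / (1 - gini_simpson),
  renyi_2      = exp(renyi_2)
)
conversions

# Compare with the general formula at the matching orders
c(q0 = hill(p, 0), q2 = hill(p, 2))
```

The Gini-Simpson and order-2 Renyi conversions land on the same value, which illustrates the corollary: the diversity depends on the frequencies and the order, not on which entropy was used to reach it.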
  • Note on logarithms and conversion
    • Entropy depends on the log base, diversity does not
    • The equation for converting Shannon entropy to diversity is \(e^x\) because the Shannon entropy equation above uses the natural logarithm (base = \(e\))
    • If a different log base \(b\) was used to calculate the entropy, that base is used for the conversion, i.e. \(b^x\)
    • The same holds for any other entropy defined with a logarithm (e.g. Renyi entropy)
    library(vegan)
    A <- rep(1/16, 16)
    base.e <- c(diversity(A), exp(diversity(A)))
    base.2 <- c(diversity(A, base = 2),2^(diversity(A, base = 2)))
    data.frame(base.e, base.2, row.names = c("Shannon", "diversity")) #Entropy differs, diversity doesn't
    ##              base.e base.2
    ## Shannon    2.772589      4
    ## diversity 16.000000     16

Hill numbers, order, and Renyi entropy

  • \(^qD = (\sum\limits_{i=1}^{S}p_i{}^q)^{(1/(1-q))}\) are known as Hill numbers and give values of diversity at various orders of \(q\)
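  • Order, \(q\), dictates the sensitivity of a diversity metric to common and rare species
    • \(q = 0\) is completely insensitive to species frequency (i.e. all species are weighted equally)
      • Corresponds to the reciprocal of the harmonic mean of the species frequencies, each weighted by its own frequency
    • \(q = 1\) weights each species exactly in proportion to its frequency
      • Corresponds to the reciprocal of the weighted geometric mean of the species frequencies
      • The value of a Hill number is undefined at \(q = 1\) but is obtained from its limit, \(\exp(-\sum\limits_{i=1}^{S}p_i\textrm{ln}(p_i))\), the exponential of Shannon entropy
        • \((\sum\limits_{i=1}^{S}p_i{}^1)^{(1/(1-1))} = 1^{1/0}\), which is an indeterminate form
    • \(q > 1\) increasingly favors the more common species
      • \(q = 2\) specifically corresponds to the reciprocal of the weighted arithmetic mean of the species frequencies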
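
The link between order and the classical means can be illustrated numerically. In this base-R sketch the means are taken over the species frequencies \(p_i\), each weighted by \(p_i\) itself, and the Hill number is compared with the reciprocal of that mean. `hill()` is a hypothetical helper, and the sketch assumes every frequency is greater than zero.

```r
p <- c(1, 43, 40, 5, 1) / 90   # frequencies; all must be > 0 in this sketch

# Hypothetical helper: Hill number of order q; q = 1 uses the limit exp(Shannon)
hill <- function(p, q) {
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

# Means of the frequencies p_i, each weighted by p_i itself
harmonic   <- 1 / sum(p * (1 / p))   # weighted harmonic mean
geometric  <- exp(sum(p * log(p)))   # weighted geometric mean
arithmetic <- sum(p * p)             # weighted arithmetic mean

# Hill numbers at q = 0, 1, 2 next to the reciprocals of the three means
data.frame(
  q               = c(0, 1, 2),
  hill            = c(hill(p, 0), hill(p, 1), hill(p, 2)),
  reciprocal_mean = 1 / c(harmonic, geometric, arithmetic)
)

# The closed form is undefined at q = 1 but approaches the limit from both sides
c(below = hill(p, 1 - 1e-6), limit = hill(p, 1), above = hill(p, 1 + 1e-6))
```

The two columns of the data frame agree row by row, and the values just below and just above \(q = 1\) bracket the exponential of Shannon entropy.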
  • Extended interpretation for ENS/diversity - at a given order \(q\), ENS/diversity is the number of equally common species that would be as diverse as the part of the population that order \(q\) emphasizes

    library(vegan)
    A <- c(1,43,40,5,1)
    exp(renyi(A)) 
    ##        0     0.25      0.5        1        2        4        8       16
    ## 5.000000 3.914042 3.255867 2.648191 2.330265 2.222133 2.182638 2.158958
    ##       32       64      Inf
    ## 2.136983 2.117379 2.093023
    ## attr(,"class")
    ## [1] "renyi"   "numeric"
    • For order = 1 - “Altogether, population A is as diverse as a community with 2.648191 equally common species”
    • For order = \(Inf\) - “When considering only the most abundant species, population A is as diverse as a community with 2.093023 equally common species”
      • Since, starting after 1, increasing values of \(q\) focus more and more on the most abundant species, \(Inf\) is the value of \(q\) that is most focused on the abundant species
      • Population A is an extreme example of a population with abundant species (i.e. very uneven), so even diversity values at \(q = 2\) are close to those at \(q = Inf\) and could be said to approximate the diversity of high abundance species
      • However, in a population where there is less dominance by few abundant species (i.e. more even), only extreme \(q\) values like \(Inf\) will explain the diversity of only the high abundance species
    • Because \(q=1\) weights each species exactly by its frequency, favoring neither rare nor common species, exponential Shannon diversity is one of the best/most interpretable single measures of diversity.
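
The effect of evenness on how quickly the diversity profile settles can be sketched by comparing population A with a more even community. The `even` counts below are hypothetical, chosen only for illustration, and `hill()` is a hypothetical base-R helper rather than a vegan function.

```r
# Hypothetical helper: Hill number of order q from a vector of counts
hill <- function(x, q) {
  p <- x / sum(x)
  if (is.infinite(q)) return(1 / max(p))     # limit as q -> Inf
  if (q == 1) return(exp(-sum(p * log(p))))  # limit as q -> 1
  sum(p^q)^(1 / (1 - q))
}

uneven <- c(1, 43, 40, 5, 1)      # population A from the text
even   <- c(30, 25, 20, 15, 10)   # hypothetical, more even community

orders  <- c(0, 1, 2, Inf)
profile <- sapply(list(uneven = uneven, even = even),
                  function(x) sapply(orders, function(q) hill(x, q)))
rownames(profile) <- orders
profile

# How far the q = 2 diversity still is from the q = Inf diversity
gap <- profile["2", ] / profile["Inf", ]
gap
```

In this example the ratio of the order-2 to the order-Inf diversity is closer to 1 for the uneven population than for the more even one, consistent with the point that low orders only approximate the diversity of the most abundant species when the community is strongly dominated.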
  • Hill numbers satisfy the “doubling rule”

    library(vegan)
    A <- sample(1:100, 10, replace = TRUE) #Random abundances, so the individual values below change between runs
    A <- A/sum(A)
    B <- c(A/2, A/2) #Population with same number of individuals, but twice as many species
    Hill.A <- exp(renyi(A))
    Hill.B <- exp(renyi(B))
    data.frame(Hill.A, Hill.B, "Ratio Hill B:A" = Hill.B/Hill.A)
    ##         Hill.A    Hill.B Ratio.Hill.B.A
    ## 0    10.000000 20.000000              2
    ## 0.25  8.894739 17.789478              2
    ## 0.5   8.083409 16.166818              2
    ## 1     7.048758 14.097516              2
    ## 2     6.086147 12.172293              2
    ## 4     5.404975 10.809949              2
    ## 8     5.017098 10.034196              2
    ## 16    4.825909  9.651818              2
    ## 32    4.710580  9.421161              2
    ## 64    4.624034  9.248068              2
    ## Inf   4.516484  9.032967              2
  • Renyi entropy \(=\frac{1}{1-q} \textrm{ln}\sum\limits_{i=1}^S p_{i}{}^{q}\)
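
The relationship between Renyi entropy and Hill numbers can be checked directly from this formula. The sketch below uses base R only; `renyi_entropy()` is a hypothetical helper (distinct from vegan's `renyi()`) and is only valid for finite \(q \neq 1\).

```r
# Hypothetical helper: Renyi entropy of order q (natural log), finite q != 1
renyi_entropy <- function(p, q) log(sum(p^q)) / (1 - q)

p      <- c(1, 43, 40, 5, 1) / 90   # frequencies of the example population
orders <- c(0, 0.5, 2, 4)

# exp(Renyi entropy) gives the Hill number of the same order
hill_from_renyi <- exp(sapply(orders, function(q) renyi_entropy(p, q)))
hill_direct     <- sapply(orders, function(q) sum(p^q)^(1 / (1 - q)))
rbind(hill_from_renyi, hill_direct)

# Near q = 1 the Renyi entropy approaches the Shannon entropy
c(renyi_near_1 = renyi_entropy(p, 1 + 1e-6), shannon = -sum(p * log(p)))
```

The two rows agree at every order, and the values should line up with the corresponding columns of the `exp(renyi(A))` output shown earlier.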

Special cases of order values

| Order | Hill number equivalent | Renyi entropy equivalent |
|---|---|---|
| 0 | Species richness | / |
| 1 | / | Shannon entropy |
| 2 | Inverse Simpson entropy* | / |
| Inf | Reciprocal of the Berger-Parker index | / |

*\(\textrm{Inverse Simpson entropy} = 1/(1-\textrm{Gini-Simpson index})\)

References

  1. “Entropy and diversity”, Lou Jost, Oikos, 2006