Is the United States Protecting Its Amphibian Biodiversity?

This is a question I have thought about for some time now. Given the amphibian biodiversity of the United States (nearly 300 species of frogs, toads, salamanders, and newts), are we adequately protecting it? I recently took the Google Data Analytics course through Coursera and decided to tackle this question for the capstone!

A very small sample of amphibian biodiversity in the United States. Pictured left to right: Couch’s Spadefoot Toad (Scaphiopus couchii), Grey Tree Frog (Dryophytes versicolor), Western Slimy Salamander (Plethodon albagula).

Truthfully, I thought it would be a quick question to answer. In reality, this project took several months, hundreds of lines of code, and way too much time waiting on analyses to finish running. But I finally have some results!

So let’s get into it and answer this question of ours!

How can we investigate amphibian biodiversity?

The question “Is the United States protecting its native amphibian biodiversity?” can actually be answered in several different ways. Are we looking at a historical overview of conservation legislation? Are we comparing the past and the present ranges of amphibians? What are we even considering as successful protection?

For this project, I specifically wanted to learn more about geospatial analysis and large dataset manipulation. As such, I opted to determine the amount of overlap between the ranges of native amphibian species and protected areas in the United States.

This is a natural starting point. After all, if amphibians are found in protected areas, that should be a good indicator that they have some protection. Other questions are totally valid, but answering this one first will allow us to focus our efforts.

Next, I ask the question I always do when I start a new project.

What data do I need?

“What data would I need in order to analyze the questions I have?”

By answering this question first, we can get a better understanding of how to conduct our project!

After some careful consideration, I realized I need data that shows where species are found and data that shows protected areas throughout the United States.

Occurrence Data (Where are species found?)

The first was relatively simple. For my Master’s thesis, I have been working with data from the Global Biodiversity Information Facility (GBIF)* and from the International Union for Conservation of Nature (IUCN). To put it simply, GBIF data are point occurrence data compiled from many datasets (professional as well as amateur), while IUCN data are carefully curated range maps in the form of shapefiles (think polygons on a map). Fortunately, both datasets are publicly accessible and super easy to acquire.

Graphical representation of Point Occurrence and Shapefile Data. Point Occurrence data is gathered from sources like GBIF, iNaturalist, or HerpMapper. Shapefile data is gathered from sources such as the IUCN, the USGS protected areas database, or guidebooks.

*I ultimately decided to exclude the GBIF data due to coding errors and not wanting to spend even more weeks finishing this analysis.

Further, a huge benefit of using IUCN data is that they designate which species are considered Endangered, Least Concern, Vulnerable, etc. This will let us categorize the data into groups of threatened versus non-threatened easily.

Examples of IUCN data. The left image shows the IUCN range map for the American Bullfrog (Lithobates catesbeianus). The data preview on the right shows some of the data that comes with the IUCN dataset.

Protected areas

To determine protected areas in the United States, I actually struck gold! The United States Geological Survey (USGS) has a Protected Areas Database (PAD-US)! This is a massive database of protected areas all over the United States that I can download directly from the USGS website. This data also has plenty of fields for us to work with!

Data preview for the Protected Areas Database from the USGS. The image on the left shows National Parks (in green); the database contains many more groups than National Parks. The image on the right shows a preview of the data as seen in R.

Additionally, they have a really nice data viewer! https://maps.usgs.gov/padus/

These two sources of data are perfect for us to work with. However, we can’t just use this data blindly.

What are problems I need to address with this data?

No matter where you get it from, all datasets have certain considerations we need to address before we use them.

IUCN Data for amphibian biodiversity

The IUCN data actually has very few issues that I need to tackle directly. The largest is filtering the data so that I only have ranges of native United States amphibians. Although looking at invasive species is interesting, that is beyond the scope of this study.

To solve this, I first filtered out all polygons that weren’t classified as “Extant (resident)” to remove non-native ranges. Next, I created a map of the continental United States and kept only species whose ranges overlap this map. This is a quick way to accomplish the filtering, and we’ll explore this choice more in the Methods deep dive!
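The two filtering steps above can be sketched in a few lines. The actual analysis was done in R with sf; the sketch below is a hypothetical Python analogue that approximates each polygon as a set of grid cells (a raster-style simplification, not true vector geometry). The field names `binomial`, `legend`, and `cells`, the helper name, and the toy data are all assumptions for illustration.

```python
# Hypothetical sketch: polygons approximated as sets of (row, col) grid cells.
# Field names "binomial", "legend", and "cells" are assumptions, not the IUCN schema.

def filter_native_us_species(range_records, us_cells):
    """Return sorted names of species with an "Extant (resident)" polygon overlapping the US mask."""
    kept = set()
    for rec in range_records:
        # Step 1: drop polygons not classified as "Extant (resident)".
        if rec["legend"] != "Extant (resident)":
            continue
        # Step 2: keep the species if any of its cells falls inside the US mask.
        if rec["cells"] & us_cells:
            kept.add(rec["binomial"])
    return sorted(kept)


# Toy data: a 3 x 3 "continental US" mask and four range polygons.
us_cells = {(r, c) for r in range(3) for c in range(3)}
records = [
    {"binomial": "Species A", "legend": "Extant (resident)", "cells": {(0, 0), (0, 1)}},
    {"binomial": "Species B", "legend": "Introduced", "cells": {(1, 1)}},  # illustrative non-resident label
    {"binomial": "Species C", "legend": "Extant (resident)", "cells": {(5, 5)}},  # entirely outside the mask
    {"binomial": "Species D", "legend": "Extant (resident)", "cells": {(2, 2), (3, 3)}},  # partly inside
]

print(filter_native_us_species(records, us_cells))  # ['Species A', 'Species D']
```

Species D is kept even though part of its range falls outside the mask, which matches an "overlaps the map" rule rather than a "lies entirely inside the map" rule.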

I also decided to exclude Alaska and Hawaii and focus this analysis only on the continental United States. There is no great reason other than me wanting to have less to analyze!

This process leaves us with 275 amphibian species, hopefully a good representation of the country’s amphibian biodiversity!

USGS data for Protected areas

The Protected Areas Database is a beast thanks to the sheer quantity of data (over 260,000 polygons!). That makes it an incredible dataset from a data standpoint, but a huge challenge from an analysis standpoint.

We will need to simplify this dataset. If we compared all 275 amphibian species to all ~267,000 protected area polygons, we would make ~73.4 million comparisons (275 × 267,000). Even if each comparison took 1/10 of a second, the analysis would need to run for about 85 days straight!
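The back-of-the-envelope runtime estimate above is easy to check. The 0.1 seconds per comparison is the same assumed cost used in the text, not a measured value.

```python
species = 275
pad_polygons = 267_000        # approximate polygon count used in the estimate
seconds_per_comparison = 0.1  # assumed cost of one species-vs-polygon intersection

comparisons = species * pad_polygons
runtime_days = comparisons * seconds_per_comparison / (60 * 60 * 24)

print(f"{comparisons:,} comparisons")  # 73,425,000 comparisons
print(f"{runtime_days:.1f} days")      # 85.0 days
```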

How can we simplify the PAD data?

Naturally, we could filter out the polygons we do not need. However, with 260,000+ shapes, there’s only so much filtering you can do before you begin to oversimplify your dataset. Additionally, I have no idea which polygons are or are not important for our question at hand.

Given this, I opted to simplify the dataset by manipulating the shapefiles themselves. Instead of comparing each of the 260,000+ polygons, what if we could group related polygons together? For example, instead of having 103 polygons for National Parks, we can instead have just 1 polygon that represents all National Parks.

Example of collapsing the polygons by group. The left image shows uncollapsed shapefiles for the National Forests in the Western US; each color represents a unique shapefile (colors are reused). The image on the right shows the same National Forests collapsed into a single polygon.

This is actually a fairly straightforward process! I put together a simple function that enables you to split simple feature (sf) datasets into groups based on your column of choice, union them together, and then output a new sf with only the categories you want.

For example, if we group by Protected Area Type (National Park, Wilderness, Tribal Lands, etc.), we end up with 45 polygons to compare instead of 260,000+. This greatly reduces the time needed to run our analyses. I can also run this grouping function over our IUCN data to simplify it as well.
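The grouping function described above was written in R for sf objects. As a hedged illustration only, here is a Python analogue that approximates each polygon as a set of grid cells, so "union" becomes a plain set union. The function name `collapse_by_group`, the `designation` column, and the toy polygons are hypothetical. For real geometries, `GeoDataFrame.dissolve(by=...)` in geopandas, or `group_by()` followed by `summarise()` on an sf object in R, performs this kind of group-and-union.

```python
from collections import defaultdict


def collapse_by_group(features, column, keep=None):
    """Union the cells of all features that share a value in `column`.

    features: list of dicts, each with a grouping attribute and a "cells" set.
    keep: optional collection of group values to retain; others are dropped.
    Returns {group_value: set_of_cells}.
    """
    merged = defaultdict(set)
    for feat in features:
        group = feat[column]
        if keep is not None and group not in keep:
            continue
        merged[group] |= feat["cells"]  # set union dissolves internal boundaries and overlaps
    return dict(merged)


# Toy protected areas: two overlapping National Forest polygons, one National Park, one Local Park.
pad = [
    {"designation": "NF", "cells": {(0, 0), (0, 1)}},
    {"designation": "NF", "cells": {(0, 1), (1, 1)}},
    {"designation": "NP", "cells": {(5, 5)}},
    {"designation": "LP", "cells": {(9, 9)}},
]

collapsed = collapse_by_group(pad, "designation", keep={"NF", "NP"})
print({group: len(cells) for group, cells in collapsed.items()})  # {'NF': 3, 'NP': 1}
```

Calling the same function with a different column (for example, a threat-category column on the species data) collapses the other dataset the same way.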

How will we be analyzing our data?

For this project, I opted for a really simple area calculation: measuring how much of the area where amphibians are found overlaps with protected areas.

To accomplish this, I developed another function that takes 2 sf objects in R and calculates the intersections between every group in the amphibian data and every group in the protected area data. The output is a table with the columns protected_area, species, pad_area_m, sp_area_m, and intersected_area_m (areas in square meters). I later converted the results to square kilometers (dividing square meters by 1,000,000) for everything reported here.
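A hedged Python sketch of that all-pairs intersection step follows. It is not the original R function: polygons are approximated as sets of grid cells, each assumed to be exactly 1 km² (1,000,000 m²), so an intersected area is simply the number of shared cells times the cell area. The output columns mirror the ones listed above; the function names and toy groups are hypothetical.

```python
CELL_AREA_M2 = 1_000_000  # assumption: each toy grid cell is 1 km x 1 km
AREA_COLUMNS = ("pad_area_m", "sp_area_m", "intersected_area_m")


def intersect_groups(species_groups, pad_groups):
    """Intersect every species group with every protected-area group.

    Both arguments map a group name to a set of grid cells.
    Returns one row (dict) per pair, with areas in square meters.
    """
    rows = []
    for pad_name, pad_cells in pad_groups.items():
        for sp_name, sp_cells in species_groups.items():
            rows.append({
                "protected_area": pad_name,
                "species": sp_name,
                "pad_area_m": len(pad_cells) * CELL_AREA_M2,
                "sp_area_m": len(sp_cells) * CELL_AREA_M2,
                "intersected_area_m": len(pad_cells & sp_cells) * CELL_AREA_M2,
            })
    return rows


def to_km2(rows):
    """Convert the area columns from square meters to square kilometers."""
    converted = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in AREA_COLUMNS:
                new_row[key[:-2] + "_km2"] = value / 1_000_000  # e.g. pad_area_m -> pad_area_km2
            else:
                new_row[key] = value
        converted.append(new_row)
    return converted


# Toy collapsed groups on a 3 x 3 grid.
grid = {(r, c) for r in range(3) for c in range(3)}
species_groups = {"EN": {(0, 0), (0, 1), (1, 0), (1, 1)}, "LC": grid}
pad_groups = {"NF": {(0, 1), (1, 1), (2, 2)}}

for row in to_km2(intersect_groups(species_groups, pad_groups)):
    print(row)
```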

Example of how we calculated the intersection between all polygons of species ranges and protected areas. The output table notes the Species Group, the Protected Area group, the total area of the Species Group polygon, the total area of the Protected Areas polygon, and the area where they intersect.

What categories are we using?

Great! We have our game plan down. We are going to collapse both the amphibian biodiversity and the protected area datasets and then compare the amount of overlap between the two!

IUCN Categories

For the IUCN data, I collapsed the data by threat category, Family, and Order; however, only threat category is reported here. For the PAD, I condensed it by Management Type, Area Designation, and GAP status; Management Type is excluded from this report.

IUCN threat level

Least Concern (LC)
Extinct in the Wild (EW)
Vulnerable (VU)
Near Threatened (NT)
Endangered (EN)
Critically Endangered (CR)
Data Deficient (DD)

Area Designation (Top 10 by land area):

Public Lands (PUB)
National Forest (NF)
Wilderness Area (WA)
State Resource Management Area (SRMA)
Inventoried Roadless Area (IRA)
State Conservation Area (SCA)
Wilderness Study Area (WSA)
National Park (NP)
Area of Critical Environmental Concern (ACEC)
National Monument (NM)

Gap Analysis Project (GAP):

1 – managed for biodiversity – disturbance events proceed or are mimicked
2 – managed for biodiversity – disturbance events suppressed
3 – managed for multiple uses – subject to extractive (e.g. mining or logging) or OHV use
4 – no known mandate for biodiversity protection

All of these groupings were compared to one another to get our results! However, we should first talk about how to interpret these results.

How should we interpret our results?

In this analysis pipeline, we are collapsing polygons and dissolving internal boundaries. While this simplifies our dataset, it also erases where polygons (i.e., multiple species ranges) overlap: land covered by several ranges is counted only once. As such, we need to be aware of what we are actually interpreting.

I think an example may help out here.

Let’s say endangered species are found across 1,000 square kilometers, and 400 square kilometers of that area overlaps with National Forest.

An incorrect interpretation

At first glance, you may interpret this data as “400 square km of Endangered species ranges are protected by National Forests. Thus, 40% of endangered species habitat is protected.”

While this may seem correct, it unfortunately is not. We only see the area in which endangered species exist. In this analysis, we have to ignore the number of species in a particular region, the summed area of their ranges, and the actual density of amphibian biodiversity.

Different amphibian range patterns result in the same interpretation. Scenario A shows a situation where there is only 1 endangered species; in this case, it is true that 40% of its range is protected by National Forests. Scenario B shows a situation where there are many species and the majority of them exist outside the National Forest. Scenario C shows a situation where the majority of species overlap within the National Forest. All of these scenarios give us the same result because of our data simplification: no matter how amphibians are distributed, 40% of the land where endangered species are found is protected. Note that Scenarios B and C are both converted into Scenario A during our data cleaning stage.

An accurate interpretation

Let us contrast an incorrect interpretation with a correct one: “400 sq km of land where Endangered Species are found is located in National Forests. 40% of habitat with endangered species is protected by National Forests.”

This is a minor but important distinction. We are not analyzing amphibian species ranges, rather, the total physical area in which particular groups of species reside. This is best illustrated by the Least Concern species. Despite existing everywhere in the USA and possessing a combined total area larger than the United States itself, in our dataset their range is simply equal to the total area of the United States.
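The three scenarios from the figure can be reproduced with a toy example. This hypothetical Python sketch treats the landscape as 1,000 cells of 1 km² each, with the first 400 cells in a National Forest. It compares what the collapsed analysis sees (the union, or footprint, of all ranges) with the summed range area that collapsing throws away. The numbers are invented for illustration; scenarios B and C add five extra species placed entirely outside or entirely inside the forest.

```python
def summarize(ranges, forest):
    """Contrast the collapsed footprint with the summed ranges it replaces."""
    footprint = set().union(*ranges)                        # what the collapsed analysis sees
    summed = sum(len(r) for r in ranges)                    # lost when ranges are dissolved
    summed_in_forest = sum(len(r & forest) for r in ranges)
    return {
        "footprint_km2": len(footprint),
        "footprint_pct_in_forest": 100 * len(footprint & forest) / len(footprint),
        "summed_range_km2": summed,
        "summed_pct_in_forest": 100 * summed_in_forest / summed,
    }


landscape = set(range(1000))  # 1,000 one-km² cells where endangered species occur
forest = set(range(400))      # 400 km² of National Forest
outside = landscape - forest

scenarios = {
    "A": [landscape],                  # a single endangered species
    "B": [landscape] + [outside] * 5,  # extra species entirely outside the forest
    "C": [landscape] + [forest] * 5,   # extra species entirely inside the forest
}

for name, ranges in scenarios.items():
    print(name, summarize(ranges, forest))
# Footprint share in the forest is 40.0% in every scenario,
# while the summed-range share is 40.0% (A), 10.0% (B), and 80.0% (C).
```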

Fret not, however! This is actually still really useful!

Results

Finally, some results! Here I am going to abbreviate heavily and summarize the top points and some interesting trends. If you want to see the full dataset, check out this Google Sheet here!

While writing these results I realized that the various percentages can be very easily confused. In order to make these results clearer, I will split the results into 2 sections. I will first report how much area is protected and then report the composition of those protected areas.

Demonstrating the two ways percentages will be reported. The first shows the percentage of area that falls within protected areas. The second shows the percentage breakdown of the protected areas themselves.

Additionally, I at times will refer to “Threatened Species”. Threatened species are those that have an IUCN category of Critically Endangered (CR), Endangered (EN), or Vulnerable (VU).

What percentage of areas where threatened species are found have protection?

To answer this first question, I used the GAP column to find the percent area where threatened species have some form of protection. I used GAP as the comparison because it only has 4 categories with no overlapping polygons.

75% of areas with Critically Endangered (CR) species are protected, 43% of areas with Endangered (EN) species are protected, and 44% of areas with Vulnerable (VU) species are protected. In contrast, areas with Least Concern (LC) species have only 29% of their extent in protected areas. Because Least Concern species are found across essentially the entire continental United States, this number is also roughly the share of the continental US covered by protected areas. These results begin to indicate something important.
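Here is a hedged sketch of that first percentage, and of why the non-overlapping GAP categories matter. Because the GAP groups do not overlap, their intersected areas can simply be added. Area designations can overlap (a Wilderness Area inside a National Forest, for example), so adding them would count the shared land twice. The function names and all numbers below are hypothetical toy values, not results from this analysis.

```python
THREATENED = {"CR", "EN", "VU"}  # IUCN categories treated as "threatened" in this post


def is_threatened(category):
    return category in THREATENED


def percent_protected(gap_km2, footprint_km2):
    """Percent of a category's footprint that falls in any GAP status (1-4).

    Summing is only valid because the collapsed GAP polygons do not overlap.
    """
    return 100 * sum(gap_km2.values()) / footprint_km2


# Toy values: km² of a category's footprint intersecting each GAP status.
toy_gap = {1: 5.0, 2: 10.0, 3: 20.0, 4: 5.0}
print(percent_protected(toy_gap, footprint_km2=100.0))  # 40.0

# Why overlapping designations cannot simply be summed (cells are 1 km² each):
footprint = set(range(100))
designations = {"NF": set(range(0, 30)), "WA": set(range(20, 30))}  # WA nested inside NF
summed = sum(len(footprint & cells) for cells in designations.values())
union = len(footprint & set().union(*designations.values()))
print(summed, union)  # 40 30 (the 10 shared cells are counted twice when summing)

print([c for c in ["LC", "VU", "EN", "NT", "CR"] if is_threatened(c)])  # ['VU', 'EN', 'CR']
```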

Protected Areas tend to be where threatened species are found.

These are super awesome findings! See Figure 3 for an even further breakdown. However, the underlying reasons for this finding still need to be explained. For example, it could be that these protected areas are the last remnants of threatened species ranges, indicating an overall decline of suitable habitat for these species. Alternatively, it could be that where threatened species are found, people tend to offer their habitats some form of protection. It could also very likely be some mixture of the two!

Sounds like an idea for a future community project? 😉

A comparison of Threatened Species and their GAP scores. Least Concern can be used as a proxy for the coverage of GAP scores across the United States. Not in PAD = % of area where species are found that is not protected. GAP 1 = managed for biodiversity, disturbance events proceed or are mimicked. GAP 2 = managed for biodiversity, disturbance events suppressed. GAP 3 = managed for multiple uses, subject to extractive (e.g., mining or logging) or OHV use. GAP 4 = no known mandate for biodiversity protection. Note that GAP 4 areas are in the Protected Areas Database but have no known mandate for biodiversity protection. These include areas classified as one of the “Other” categories (e.g., FOTH), Local Parks (LP), or Local Recreation Areas (LRE). Hover-over values are in square kilometers.

Which protected areas contain threatened amphibian species?

Where Critically Endangered (CR), Endangered (EN), and Vulnerable (VU) species are found in protected areas, National Forests (CR = 61%; EN = 35%; VU = 41%), Wilderness Areas (26%; 23%; 10%), and Public Lands (0%; 11%; 11%) make up the greatest proportion. For National Forests and Wilderness Areas, these shares are much higher than for Least Concern species (26% and 8%, respectively). This could be because threatened species typically have reduced ranges and National Forests have some of the largest total spread across the US. Alternatively, it could indicate that the distribution of these species is more concentrated in these protected areas.
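The second kind of percentage (the composition of protected land) can be sketched as each designation's intersected area divided by a category's total protected area. This is a hypothetical Python illustration with invented numbers. In particular, using the sum over designations as the denominator (so the shares add up to 100%) is an assumption about how such a table could be built, not a confirmed detail of the original code.

```python
def composition(intersected_km2):
    """Percent of a category's protected area contributed by each designation.

    intersected_km2 maps designation code -> km² of overlap with that category.
    Assumption: the denominator is the sum over all designations.
    """
    total = sum(intersected_km2.values())
    return {des: round(100 * area / total, 2) for des, area in intersected_km2.items()}


def difference_from_baseline(shares, baseline):
    """Percentage-point difference from a baseline category (e.g. Least Concern)."""
    return {des: round(shares[des] - baseline.get(des, 0.0), 2) for des in shares}


# Toy overlaps for one IUCN category, plus a toy baseline composition.
toy = {"NF": 30.0, "WA": 15.0, "PUB": 5.0}
shares = composition(toy)
print(shares)  # {'NF': 60.0, 'WA': 30.0, 'PUB': 10.0}

toy_baseline = {"NF": 40.0, "WA": 10.0, "PUB": 50.0}
print(difference_from_baseline(shares, toy_baseline))  # {'NF': 20.0, 'WA': 20.0, 'PUB': -40.0}
```

Comparing each threatened category against the Least Concern column in this way is what separates "a higher share than expected" from "a large share simply because the designation is large".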

I was actually surprised that we don’t see more overlap with Public Lands, which make up about 26% of the protected land where Least Concern species are found (and thus, roughly, of the protected land in the continental United States). Even more surprising was that Critically Endangered species have no overlap with Public Lands.

This pattern could be because areas with threatened species tend to receive stronger protection and thus are typically not designated as Public Lands. Alternatively, public lands tend to be in more arid environments. While there are certainly still amphibians there, there may be less suitable amphibian habitat than in, say, National Forests. However, we still don’t know the “concentration”, so to speak, of amphibian ranges in a particular area. This is another example of where knowing how many threatened species are actually in these areas would help elucidate this issue.

The issue of course could very well be the underlying data, problems with my code, or some other data-related snafu that I have not considered!

protected_area	CR	DD	EN	EW	LC	NT	VU
NF	60.61%	34.03%	34.66%	51.12%	26.15%	39.38%	40.62%
PUB	0.00%	3.74%	10.58%	1.41%	26.25%	3.95%	11.20%
WA	25.92%	16.77%	22.64%	0.00%	8.29%	10.94%	9.64%
SRMA	1.86%	4.37%	0.53%	0.00%	7.31%	8.01%	11.27%
IRA	0.00%	7.67%	7.16%	20.14%	6.55%	5.20%	6.12%
SCA	0.43%	4.93%	1.59%	3.14%	3.54%	8.94%	2.14%
WSA	0.00%	0.78%	0.09%	0.00%	3.11%	0.28%	1.31%
NP	0.00%	13.59%	6.45%	0.00%	2.45%	2.28%	0.88%
ACEC	0.00%	0.30%	2.80%	0.00%	2.49%	0.31%	0.78%
NM	0.00%	7.25%	3.56%	0.00%	1.85%	1.38%	1.73%
NWR	8.15%	0.00%	1.04%	0.00%	1.66%	1.71%	1.90%
REC	0.00%	0.98%	0.43%	0.00%	1.25%	3.56%	0.67%
NCA	1.64%	2.51%	0.74%	24.19%	1.44%	1.46%	0.65%
NRA	0.00%	1.11%	3.86%	0.00%	1.00%	1.25%	0.23%
SP	0.00%	0.10%	0.85%	0.00%	0.79%	1.50%	1.61%

The breakdown of where species are protected, as a percentage of each category’s protected range. The columns correspond to IUCN categories: CR = Critically Endangered; DD = Data Deficient; EN = Endangered; EW = Extinct in the Wild; LC = Least Concern; NT = Near Threatened; VU = Vulnerable. The rows correspond to protected area designations: NF = National Forest; PUB = Public Lands; WA = Wilderness Area; SRMA = State Resource Management Area; IRA = Inventoried Roadless Area; SCA = State Conservation Area; WSA = Wilderness Study Area; NP = National Park; ACEC = Area of Critical Environmental Concern; NM = National Monument; NWR = National Wildlife Refuge; REC = Recreation Management Area; NCA = National Conservation Area; NRA = National Recreation Area; SP = State Park.

Next Steps and major considerations for Amphibian biodiversity

Over the next few blog posts, I plan to explore this dataset in more detail, in particular going over the methodology and results with a finer-tooth comb, so make sure you subscribe if you want more info!

However, I do want to leave you with a few notes and a few ideas for next steps.

These results are really a first step. They don’t leave me with an adequate answer, but they will be a great jumping-off point!

We need to rerun this analysis without collapsing the species ranges together. As I mentioned above, this would allow us to answer our question more thoroughly, and in truth it wouldn’t take too terribly long with some further simplification!

What do you think of this project? Have any ideas for improvement? Feel free to let me know!

References:

U.S. Geological Survey (USGS) Gap Analysis Project (GAP), 2020, Protected Areas Database of the United States (PAD-US) 2.1: U.S. Geological Survey data release, https://doi.org/10.5066/P92QM3NT.