The United States Department of Agriculture PLANTS database provides general information about plant species across the country. For three states, I wanted to visualize which plant families are present in each and which state, or states, holds the most entries in each family. To accomplish this, I used Python’s pandas, matplotlib, and seaborn libraries.
Initial Setup
Before beginning, I import pandas, matplotlib, and seaborn.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
Gathering Data
I pulled data sets from the USDA website for New York, Idaho, and California. The exported text files are encoded in Latin-1, so the encoding must be specified when reading them into pandas. Otherwise pandas assumes UTF-8 and the import can fail with a decoding error.
ny_list = pd.read_csv('ny_list.txt', encoding='latin-1')
ca_list = pd.read_csv('ca_list.txt', encoding='latin-1')
id_list = pd.read_csv('id_list.txt', encoding='latin-1')
I double-check that the files have loaded correctly into dataframes using head().
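As a minimal sketch of that check, the snippet below reads a small in-memory stand-in for an exported file. The two sample rows and the names `sample_text`, `sample_bytes`, and `sample_list` are made up for illustration and are not taken from the USDA export; only the column names match the ones used in this post.

```python
import io
import pandas as pd

# Hypothetical two-row export (made-up plants), encoded as Latin-1
# to mimic the downloaded text files.
sample_text = (
    '"Symbol","Synonym Symbol","Scientific Name with Author",'
    '"National Common Name","Family"\n'
    '"ABCD","","Exampleus plantus L.","example plant","Asteraceae"\n'
    '"EFGH","","Exampleus señor Née","","Poaceae"\n'
)
sample_bytes = sample_text.encode('latin-1')

# Same call as for the real files, but reading from memory instead of disk.
sample_list = pd.read_csv(io.BytesIO(sample_bytes), encoding='latin-1')

# Quick sanity checks: first rows, column names, and overall shape.
print(sample_list.head())
print(sample_list.columns.tolist())
print(sample_list.shape)
```

If the accented characters in the author names display correctly and the expected columns appear, the encoding was read properly.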
Cleaning Data
The major point of interest in the imported dataframe is the ‘Family’ column. I create a new dataframe grouped by this column; count() then returns, for each family, the number of non-null values in every other column.
ny_fam = ny_list.groupby('Family').count()
Next, I remove the unwanted columns. I’ve chosen only to keep the ‘Symbol’ column as a representation of count because this variable is required for every plant instance.
ny_fam_1 = ny_fam.drop(['Synonym Symbol', 'Scientific Name with Author', 'National Common Name'], axis=1)
Then, I rename the ‘Symbol’ column to ‘{State} Count’ so the dataframes can be merged without confusion.
ny_fam_1 = ny_fam_1.rename(columns = {"Symbol":"NY Count"})
I complete the same process for the California and Idaho data.
ca_fam = ca_list.groupby('Family').count()
ca_fam_1 = ca_fam.drop(['Synonym Symbol', 'Scientific Name with Author', 'National Common Name'], axis=1)
ca_fam_1 = ca_fam_1.rename(columns = {"Symbol":"CA Count"})
id_fam = id_list.groupby('Family').count()
id_fam_1 = id_fam.drop(['Synonym Symbol', 'Scientific Name with Author', 'National Common Name'], axis=1)
id_fam_1 = id_fam_1.rename(columns = {"Symbol":"ID Count"})
Next, I reset the index to prepare the data frames for outer merges based on column names. The index is ‘Family’ because groupby() uses the grouping column as the index by default. To guard against unwanted changes, I create a copy of each data frame as I go.
ny_fam_2 = ny_fam_1.copy()
ny_fam_2 = ny_fam_2.reset_index()
ca_fam_2 = ca_fam_1.copy()
ca_fam_2 = ca_fam_2.reset_index()
id_fam_2 = id_fam_1.copy()
id_fam_2 = id_fam_2.reset_index()
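Since the same group, rename, and reset steps are repeated for each state, they could also be wrapped in one helper. The sketch below is only one possible way to do that; `family_counts`, `toy_list`, and `toy_fam` are hypothetical names, and the toy rows are invented.

```python
import pandas as pd

def family_counts(plant_list, state):
    """Count 'Symbol' entries per family and label the column for one state."""
    return (
        plant_list.groupby('Family')['Symbol']
        .count()                    # non-null 'Symbol' values per family
        .rename(f'{state} Count')   # e.g. 'NY Count'
        .reset_index()              # move 'Family' from the index to a column
    )

# Toy stand-in for one state's plant list (invented rows).
toy_list = pd.DataFrame({
    'Symbol': ['A1', 'A2', 'B1'],
    'Family': ['Asteraceae', 'Asteraceae', 'Poaceae'],
})

toy_fam = family_counts(toy_list, 'NY')
print(toy_fam)
```

With the real data, a call such as `family_counts(ny_list, 'NY')` should produce the same shape of table as `ny_fam_2`, assuming the column names shown earlier.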
Merging Data
To preserve every plant family regardless of which states it appears in, I perform outer merges. A family missing from a state ends up with an empty value in that state’s column, which I then fill with zero.
combo1 = pd.merge(ny_fam_2, ca_fam_2, how='outer')
combo2 = pd.merge(combo1, id_fam_2, how='outer')
pd.options.display.float_format = '{:,.0f}'.format
combo2 = combo2.fillna(0)
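To see what the outer merge does, here is a small sketch on toy data. The frames `ny_toy` and `ca_toy` and their counts are invented for illustration. Passing `on='Family'` makes the join key explicit, and the final cast to integers is an optional extra step that is not part of the workflow above.

```python
import pandas as pd

# Toy per-state family counts (invented values).
ny_toy = pd.DataFrame({'Family': ['Aceraceae', 'Agavaceae'], 'NY Count': [46, 4]})
ca_toy = pd.DataFrame({'Family': ['Agavaceae', 'Aloaceae'], 'CA Count': [48, 2]})

# An outer merge keeps families found in either frame.
merged_toy = pd.merge(ny_toy, ca_toy, on='Family', how='outer')

# Families missing from one state come back as NaN; treat them as zero.
merged_toy = merged_toy.fillna(0)

# NaN forces the count columns to float, so cast them back to integers.
merged_toy = merged_toy.astype({'NY Count': int, 'CA Count': int})
print(merged_toy)
```

Casting back to integers is an alternative to changing the float display format, since the counts then print without decimals.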
Creating a New Column
I added a column to aid in visualizations. I created a function to return the state with the highest presence of each plant family based on the existing columns.
def presence(row):
    if row['NY Count'] > row['CA Count'] and row['NY Count'] > row['ID Count']:
        return 'NY'
    elif row['CA Count'] > row['NY Count'] and row['CA Count'] > row['ID Count']:
        return 'CA'
    elif row['ID Count'] > row['NY Count'] and row['ID Count'] > row['CA Count']:
        return 'ID'
    elif row['NY Count'] == row['CA Count'] and row['NY Count'] > row['ID Count']:
        return 'CA/NY'
    elif row['CA Count'] == row['ID Count'] and row['CA Count'] > row['NY Count']:
        return 'CA/ID'
    elif row['ID Count'] == row['NY Count'] and row['ID Count'] > row['CA Count']:
        return 'ID/NY'
    else:
        return 'Same'
combo2['Highest Presence'] = combo2.apply(presence, axis=1)
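The same labeling rule can be written more compactly by finding the maximum count and collecting every state that reaches it. This is an alternative sketch, not the function used to build the table; `presence_compact` and `toy_counts` are hypothetical names, and the five rows are copied from the table in this post.

```python
import pandas as pd

def presence_compact(row, count_cols=('CA Count', 'ID Count', 'NY Count')):
    """Return the state(s) with the highest count, or 'Same' for a three-way tie."""
    top = max(row[col] for col in count_cols)
    # 'CA Count' -> 'CA'; the column order keeps tie labels alphabetical.
    leaders = [col.split()[0] for col in count_cols if row[col] == top]
    if len(leaders) == len(count_cols):
        return 'Same'
    return '/'.join(leaders)

# Five rows taken from the table in this post.
toy_counts = pd.DataFrame({
    'Family': ['Acanthaceae', 'Aceraceae', 'Butomaceae', 'Grimmiaceae', 'Hippuridaceae'],
    'NY Count': [7, 46, 2, 2, 2],
    'CA Count': [7, 29, 0, 3, 2],
    'ID Count': [0, 24, 2, 3, 2],
})

toy_counts['Highest Presence'] = toy_counts.apply(presence_compact, axis=1)
print(toy_counts)
```

Listing the columns in alphabetical order by state is what reproduces the labels 'CA/NY', 'CA/ID', and 'ID/NY' used above.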
Below is the full table of all plant families in the dataframe.
Index | Family | NY Count | CA Count | ID Count | Highest Presence |
---|---|---|---|---|---|
0 | Acanthaceae | 7 | 7 | 0 | CA/NY |
1 | Acarosporaceae | 1 | 1 | 18 | ID |
2 | Aceraceae | 46 | 29 | 24 | NY |
3 | Acoraceae | 5 | 2 | 4 | NY |
4 | Actinidiaceae | 3 | 0 | 0 | NY |
5 | Adoxaceae | 2 | 0 | 0 | NY |
6 | Agavaceae | 4 | 48 | 0 | CA |
7 | Aizoaceae | 5 | 42 | 0 | CA |
8 | Alismataceae | 64 | 53 | 33 | NY |
9 | Amaranthaceae | 78 | 82 | 38 | CA |
10 | Amblystegiaceae | 64 | 0 | 0 | NY |
11 | Anacardiaceae | 46 | 47 | 22 | CA |
12 | Andreaeaceae | 3 | 0 | 0 | NY |
13 | Annonaceae | 3 | 0 | 0 | NY |
14 | Anomodontaceae | 8 | 0 | 0 | NY |
15 | Apiaceae | 190 | 372 | 257 | CA |
16 | Apocynaceae | 48 | 60 | 42 | CA |
17 | Aquifoliaceae | 21 | 3 | 0 | NY |
18 | Araceae | 26 | 18 | 3 | NY |
19 | Araliaceae | 26 | 6 | 8 | NY |
20 | Aristolochiaceae | 28 | 9 | 3 | NY |
21 | Asclepiadaceae | 50 | 67 | 20 | CA |
22 | Aspleniaceae | 23 | 7 | 6 | NY |
23 | Asteraceae | 2,057 | 3,858 | 2,260 | CA |
24 | Aulacomniaceae | 7 | 0 | 0 | NY |
25 | Azollaceae | 4 | 4 | 0 | CA/NY |
26 | Bacidiaceae | 1 | 1 | 0 | CA/NY |
27 | Balsaminaceae | 10 | 6 | 11 | ID |
28 | Bartramiaceae | 11 | 0 | 0 | NY |
29 | Berberidaceae | 21 | 59 | 25 | CA |
30 | Betulaceae | 86 | 43 | 53 | NY |
31 | Bignoniaceae | 9 | 11 | 0 | CA |
32 | Blechnaceae | 5 | 11 | 7 | CA |
33 | Boraginaceae | 127 | 478 | 263 | CA |
34 | Brachytheciaceae | 59 | 0 | 0 | NY |
35 | Brassicaceae | 468 | 1,123 | 774 | CA |
36 | Bruchiaceae | 4 | 0 | 0 | NY |
37 | Bryaceae | 22 | 2 | 0 | NY |
38 | Buddlejaceae | 4 | 5 | 0 | CA |
39 | Butomaceae | 2 | 0 | 2 | ID/NY |
40 | Buxaceae | 5 | 0 | 0 | NY |
41 | Buxbaumiaceae | 5 | 0 | 0 | NY |
42 | Cabombaceae | 6 | 6 | 3 | CA/NY |
43 | Cactaceae | 10 | 238 | 78 | CA |
44 | Callitrichaceae | 16 | 18 | 11 | CA |
45 | Calycanthaceae | 7 | 2 | 0 | NY |
46 | Campanulaceae | 76 | 146 | 57 | CA |
47 | Cannabaceae | 13 | 8 | 10 | NY |
48 | Cannaceae | 4 | 0 | 0 | NY |
49 | Capparaceae | 18 | 49 | 26 | CA |
50 | Caprifoliaceae | 182 | 107 | 81 | NY |
51 | Caryophyllaceae | 338 | 506 | 414 | CA |
52 | Celastraceae | 24 | 18 | 4 | NY |
53 | Ceratophyllaceae | 8 | 5 | 5 | NY |
54 | Cercidiphyllaceae | 2 | 0 | 0 | NY |
55 | Chenopodiaceae | 245 | 404 | 245 | CA |
56 | Cistaceae | 44 | 27 | 0 | NY |
57 | Cladoniaceae | 5 | 2 | 2 | NY |
58 | Clethraceae | 4 | 0 | 0 | NY |
59 | Climaciaceae | 5 | 0 | 0 | NY |
60 | Clusiaceae | 52 | 15 | 12 | NY |
61 | Commelinaceae | 37 | 14 | 0 | NY |
62 | Convolvulaceae | 68 | 130 | 17 | CA |
63 | Cornaceae | 55 | 34 | 36 | NY |
64 | Crassulaceae | 62 | 230 | 57 | CA |
65 | Cucurbitaceae | 41 | 49 | 6 | CA |
66 | Cupressaceae | 39 | 130 | 30 | CA |
67 | Cuscutaceae | 31 | 56 | 31 | CA |
68 | Cyperaceae | 1,016 | 733 | 663 | NY |
69 | Dennstaedtiaceae | 8 | 5 | 5 | NY |
70 | Diapensiaceae | 8 | 0 | 0 | NY |
71 | Dicranaceae | 20 | 0 | 0 | NY |
72 | Dioscoreaceae | 6 | 0 | 0 | NY |
73 | Dipsacaceae | 17 | 10 | 8 | NY |
74 | Ditrichaceae | 9 | 2 | 0 | NY |
75 | Droseraceae | 8 | 11 | 6 | CA |
76 | Dryopteridaceae | 121 | 65 | 71 | NY |
77 | Ebenaceae | 7 | 7 | 0 | CA/NY |
78 | Elaeagnaceae | 16 | 10 | 12 | NY |
79 | Elatinaceae | 6 | 14 | 4 | CA |
80 | Empetraceae | 13 | 6 | 0 | NY |
81 | Entodontaceae | 8 | 0 | 0 | NY |
82 | Ephemeraceae | 5 | 0 | 0 | NY |
83 | Equisetaceae | 48 | 36 | 51 | ID |
84 | Ericaceae | 236 | 310 | 110 | CA |
85 | Eriocaulaceae | 7 | 3 | 0 | NY |
86 | Euphorbiaceae | 111 | 233 | 66 | CA |
87 | Fabaceae | 604 | 1,855 | 871 | CA |
88 | Fagaceae | 96 | 91 | 0 | NY |
89 | Fissidentaceae | 22 | 1 | 1 | NY |
90 | Flacourtiaceae | 2 | 0 | 0 | NY |
91 | Fontinalaceae | 9 | 0 | 0 | NY |
92 | Fumariaceae | 31 | 28 | 20 | NY |
93 | Funariaceae | 21 | 0 | 0 | NY |
94 | Gentianaceae | 72 | 162 | 122 | CA |
95 | Geraniaceae | 34 | 80 | 25 | CA |
96 | Ginkgoaceae | 2 | 0 | 0 | NY |
97 | Grimmiaceae | 2 | 3 | 3 | CA/ID |
98 | Grossulariaceae | 46 | 150 | 72 | CA |
99 | Haemodoraceae | 7 | 0 | 0 | NY |
100 | Haloragaceae | 37 | 28 | 20 | NY |
101 | Hamamelidaceae | 8 | 2 | 0 | NY |
102 | Hippocastanaceae | 14 | 2 | 0 | NY |
103 | Hippuridaceae | 2 | 2 | 2 | Same |
104 | Hydrangeaceae | 24 | 53 | 14 | CA |
105 | Hydrocharitaceae | 45 | 44 | 30 | NY |
106 | Hydrophyllaceae | 15 | 336 | 89 | CA |
107 | Hylocomiaceae | 8 | 0 | 0 | NY |
108 | Hymeneliaceae | 1 | 1 | 4 | ID |
109 | Hymenophyllaceae | 2 | 0 | 0 | NY |
110 | Hypnaceae | 20 | 0 | 0 | NY |
111 | Iridaceae | 46 | 114 | 26 | CA |
112 | Isoetaceae | 39 | 30 | 28 | NY |
113 | Juglandaceae | 52 | 10 | 2 | NY |
114 | Juncaceae | 162 | 177 | 143 | CA |
115 | Juncaginaceae | 9 | 22 | 13 | CA |
116 | Lamiaceae | 399 | 413 | 182 | CA |
117 | Lardizabalaceae | 2 | 0 | 0 | NY |
118 | Lauraceae | 12 | 11 | 0 | NY |
119 | Lecanoraceae | 3 | 1 | 1 | NY |
120 | Lemnaceae | 27 | 45 | 17 | CA |
121 | Lentibulariaceae | 37 | 23 | 17 | NY |
122 | Leucobryaceae | 2 | 0 | 0 | NY |
123 | Liliaceae | 263 | 741 | 243 | CA |
124 | Limnanthaceae | 3 | 36 | 3 | CA |
125 | Linaceae | 36 | 52 | 20 | CA |
126 | Lycopodiaceae | 114 | 9 | 47 | NY |
127 | Lygodiaceae | 2 | 0 | 0 | NY |
128 | Lythraceae | 25 | 26 | 16 | CA |
129 | Magnoliaceae | 20 | 0 | 0 | NY |
130 | Malvaceae | 66 | 284 | 62 | CA |
131 | Marsileaceae | 2 | 15 | 12 | CA |
132 | Melastomataceae | 12 | 0 | 0 | NY |
133 | Meliaceae | 3 | 3 | 0 | CA/NY |
134 | Menispermaceae | 2 | 0 | 0 | NY |
135 | Menyanthaceae | 10 | 7 | 3 | NY |
136 | Mniaceae | 10 | 0 | 0 | NY |
137 | Molluginaceae | 4 | 9 | 3 | CA |
138 | Monotropaceae | 19 | 31 | 21 | CA |
139 | Moraceae | 21 | 16 | 5 | NY |
140 | Myricaceae | 22 | 5 | 0 | NY |
141 | Najadaceae | 25 | 21 | 10 | NY |
142 | Nelumbonaceae | 9 | 5 | 0 | NY |
143 | Nyctaginaceae | 30 | 157 | 8 | CA |
144 | Nymphaeaceae | 50 | 23 | 30 | NY |
145 | Oleaceae | 39 | 49 | 0 | CA |
146 | Onagraceae | 237 | 661 | 314 | CA |
147 | Ophioglossaceae | 60 | 53 | 48 | NY |
148 | Orchidaceae | 285 | 175 | 173 | NY |
149 | Orobanchaceae | 22 | 74 | 51 | CA |
150 | Orthotrichaceae | 11 | 0 | 0 | NY |
151 | Osmundaceae | 10 | 0 | 0 | NY |
152 | Oxalidaceae | 70 | 58 | 47 | NY |
153 | Paeoniaceae | 2 | 4 | 2 | CA |
154 | Papaveraceae | 38 | 98 | 26 | CA |
155 | Parmeliaceae | 1 | 1 | 1 | Same |
156 | Pedaliaceae | 11 | 25 | 8 | CA |
157 | Phytolaccaceae | 4 | 7 | 0 | CA |
158 | Pinaceae | 52 | 113 | 65 | CA |
159 | Plantaginaceae | 64 | 78 | 33 | CA |
160 | Platanaceae | 9 | 8 | 0 | NY |
161 | Plumbaginaceae | 16 | 22 | 0 | CA |
162 | Poaceae | 1,927 | 2,347 | 1,507 | CA |
163 | Podostemaceae | 3 | 0 | 0 | NY |
164 | Polemoniaceae | 35 | 637 | 279 | CA |
165 | Polygalaceae | 29 | 17 | 0 | NY |
166 | Polygonaceae | 331 | 917 | 435 | CA |
167 | Polypodiaceae | 9 | 14 | 9 | CA |
168 | Polytrichaceae | 23 | 0 | 0 | NY |
169 | Pontederiaceae | 17 | 17 | 6 | CA/NY |
170 | Portulacaceae | 24 | 203 | 120 | CA |
171 | Potamogetonaceae | 115 | 83 | 103 | NY |
172 | Pottiaceae | 34 | 2 | 1 | NY |
173 | Primulaceae | 67 | 104 | 102 | CA |
174 | Pteridaceae | 16 | 111 | 39 | CA |
175 | Pyrolaceae | 54 | 65 | 67 | ID |
176 | Ranunculaceae | 288 | 434 | 412 | CA |
177 | Resedaceae | 7 | 8 | 0 | CA |
178 | Rhamnaceae | 18 | 202 | 20 | CA |
179 | Rosaceae | 1,305 | 803 | 609 | NY |
180 | Rubiaceae | 157 | 230 | 72 | CA |
181 | Ruppiaceae | 11 | 17 | 7 | CA |
182 | Rutaceae | 18 | 12 | 0 | NY |
183 | Salicaceae | 284 | 352 | 383 | ID |
184 | Salviniaceae | 5 | 3 | 0 | NY |
185 | Santalaceae | 9 | 5 | 9 | ID/NY |
186 | Sapindaceae | 5 | 37 | 3 | CA |
187 | Sarraceniaceae | 11 | 16 | 0 | CA |
188 | Saururaceae | 2 | 3 | 0 | CA |
189 | Saxifragaceae | 55 | 219 | 234 | ID |
190 | Scheuchzeriaceae | 5 | 5 | 5 | Same |
191 | Schistostegaceae | 2 | 0 | 2 | ID/NY |
192 | Schizaeaceae | 2 | 0 | 0 | NY |
193 | Scrophulariaceae | 430 | 1,146 | 556 | CA |
194 | Selaginellaceae | 6 | 15 | 14 | CA |
195 | Sematophyllaceae | 6 | 0 | 0 | NY |
196 | Simaroubaceae | 4 | 7 | 4 | CA |
197 | Smilacaceae | 22 | 3 | 0 | NY |
198 | Solanaceae | 118 | 244 | 63 | CA |
199 | Sparganiaceae | 24 | 22 | 24 | ID/NY |
200 | Sphagnaceae | 42 | 3 | 1 | NY |
201 | Staphyleaceae | 2 | 2 | 0 | CA/NY |
202 | Sterculiaceae | 2 | 30 | 0 | CA |
203 | Styracaceae | 7 | 9 | 0 | CA |
204 | Symplocaceae | 5 | 0 | 0 | NY |
205 | Taxaceae | 10 | 4 | 2 | NY |
206 | Teloschistaceae | 1 | 1 | 1 | Same |
207 | Tetraphidaceae | 2 | 0 | 0 | NY |
208 | Theliaceae | 3 | 0 | 0 | NY |
209 | Thelypteridaceae | 26 | 9 | 13 | NY |
210 | Thuidiaceae | 3 | 0 | 0 | NY |
211 | Thymelaeaceae | 6 | 2 | 0 | NY |
212 | Tiliaceae | 22 | 0 | 0 | NY |
213 | Trapaceae | 5 | 0 | 0 | NY |
214 | Tropaeolaceae | 2 | 2 | 0 | CA/NY |
215 | Typhaceae | 6 | 10 | 5 | CA |
216 | Ulmaceae | 23 | 19 | 10 | NY |
217 | Urticaceae | 52 | 61 | 32 | CA |
218 | Valerianaceae | 13 | 59 | 27 | CA |
219 | Verbenaceae | 53 | 126 | 7 | CA |
220 | Violaceae | 176 | 120 | 79 | NY |
221 | Viscaceae | 12 | 44 | 6 | CA |
222 | Vitaceae | 55 | 18 | 4 | NY |
223 | Vittariaceae | 2 | 0 | 0 | NY |
224 | Xyridaceae | 15 | 0 | 0 | NY |
225 | Zannichelliaceae | 5 | 5 | 5 | Same |
226 | Zosteraceae | 5 | 8 | 0 | CA |
227 | Zygophyllaceae | 5 | 31 | 5 | CA |
228 | Aloaceae | 0 | 2 | 0 | CA |
229 | Aponogetonaceae | 0 | 2 | 0 | CA |
230 | Arecaceae | 0 | 15 | 0 | CA |
231 | Basellaceae | 0 | 4 | 0 | CA |
232 | Bataceae | 0 | 2 | 0 | CA |
233 | Burseraceae | 0 | 2 | 0 | CA |
234 | Caulerpaceae | 0 | 2 | 0 | CA |
235 | Crossosomataceae | 0 | 18 | 9 | CA |
236 | Cyatheaceae | 0 | 3 | 0 | CA |
237 | Cymodoceaceae | 0 | 5 | 0 | CA |
238 | Datiscaceae | 0 | 2 | 0 | CA |
239 | Elaeocarpaceae | 0 | 4 | 0 | CA |
240 | Ephedraceae | 0 | 16 | 0 | CA |
241 | Fouquieriaceae | 0 | 3 | 0 | CA |
242 | Frankeniaceae | 0 | 6 | 0 | CA |
243 | Garryaceae | 0 | 11 | 0 | CA |
244 | Gracilariaceae | 0 | 2 | 0 | CA |
245 | Gunneraceae | 0 | 3 | 0 | CA |
246 | Halymeniaceae | 0 | 2 | 0 | CA |
247 | Krameriaceae | 0 | 8 | 0 | CA |
248 | Lennoaceae | 0 | 6 | 0 | CA |
249 | Loasaceae | 0 | 84 | 29 | CA |
250 | Melianthaceae | 0 | 2 | 0 | CA |
251 | Myoporaceae | 0 | 2 | 0 | CA |
252 | Myrtaceae | 0 | 32 | 0 | CA |
253 | Parkeriaceae | 0 | 4 | 0 | CA |
254 | Passifloraceae | 0 | 6 | 0 | CA |
255 | Pittosporaceae | 0 | 8 | 0 | CA |
256 | Punicaceae | 0 | 2 | 0 | CA |
257 | Rafflesiaceae | 0 | 2 | 0 | CA |
258 | Scouleriaceae | 0 | 3 | 3 | CA/ID |
259 | Simmondsiaceae | 0 | 4 | 0 | CA |
260 | Stereocaulaceae | 0 | 2 | 0 | CA |
261 | Tamaricaceae | 0 | 12 | 3 | CA |
262 | Ulvaceae | 0 | 3 | 0 | CA |
263 | Verrucariaceae | 0 | 0 | 2 | ID |
Visualizing the Data
I created a count plot using seaborn to show which states, or state combinations, have the highest variety within each plant family.
# combo2 is the merged dataframe built above
# Order the bars from the most to the least common category
order = combo2['Highest Presence'].value_counts().index
sb.countplot(data=combo2, x='Highest Presence', color="#B6D1BE", order=order)
# Number of families in each category, used for the bar labels
cat_counts = combo2['Highest Presence'].value_counts()
locs, labels = plt.xticks()
# Write each count just above its bar
for loc, label in zip(locs, labels):
    count = cat_counts[label.get_text()]
    plt.text(loc, count+5, str(count), ha='center', color='black', fontsize=12)
plt.xticks(rotation=25)
plt.xlabel('')
plt.ylabel('')
plt.title('Highest Concentration of Plant Families by State', fontsize=14, y=1.05)
plt.ylim(0, 140)
sb.despine();
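The manual text loop above could also be replaced by matplotlib's `bar_label`, which is available in matplotlib 3.4 and later. The sketch below uses plain matplotlib bars on an invented list of categories, so `toy_labels` and its values are assumptions and do not reflect the real counts.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented 'Highest Presence' values standing in for the real column.
toy_labels = pd.Series(['NY', 'NY', 'CA', 'CA', 'CA', 'ID', 'CA/NY'])
cat_counts = toy_labels.value_counts()  # sorted from most to least common

fig, ax = plt.subplots()
bars = ax.bar(cat_counts.index, cat_counts.values, color='#B6D1BE')

# Annotate each bar with its height instead of placing text by hand.
ax.bar_label(bars, padding=3, fontsize=12)

ax.tick_params(axis='x', labelrotation=25)
ax.set_title('Highest Concentration of Plant Families by State', fontsize=14, y=1.05)
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)  # same effect as sb.despine()
```

Because `bar_label` reads the bar heights directly, the labels stay correct if the category order or the data changes.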
Further Considerations
There are many factors that play into plant family diversity. The comparison of plant families in New York, California, and Idaho was purely out of curiosity. Further investigations should take into account each state’s ecosystem types, land use, and land ownership, all of which may influence species diversity.