In this tutorial, I review ways to take raw categorical survey data and create new variables for analysis and visualization with Python, using pandas and GeoPy. I’ll show how to make new pandas columns by encoding complex responses, geocoding locations, and measuring distances.
Here’s the associated GitHub repository for this workshop, which includes the data set and a Jupyter Notebook for the code.
Thanks to the St. Lawrence Eastern Lake Ontario Partnership for Regional Invasive Species Management (SLELO PRISM), I was able to use boat launch steward data from 2016 for this virtual workshop. The survey data was collected by boat launch stewards around Lake Ontario in upstate New York. Boaters were asked a series of survey questions, and their watercraft were inspected for aquatic invasive species.
This tutorial was originally designed for the Syracuse Women in Machine Learning and Data Science (Syracuse WiMLDS) Meetup group.
Here’s an overview of how to map the coordinates of cities mentioned in song lyrics using Python. In this example, I used Lana Del Rey’s lyrics for my data and focused on United States cities. The full code for this is in a Jupyter Notebook on my GitHub under the lyrics_map repository.
Gather Bulk Song Lyrics Data
First, create an account with Genius to obtain an API key. The key is used to make the requests that scrape song lyrics for a chosen artist. Store the key in a text file. Then follow the steps in this blog post by Nick Pai, referencing the API key text file in the code.
You can customize the code to cater to a certain artist and number of songs. To be safe, I put in a request for lyrics from 300 songs.
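As a rough sketch of the key-handling step, the snippet below reads the key from a text file and prepares (without sending) a search request for an artist. The helper names and the use of the standard library's `urllib` are assumptions for illustration; the tutorial referenced above may structure its requests differently.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request

GENIUS_SEARCH_URL = "https://api.genius.com/search"


def load_api_key(path):
    """Read the API key from a text file, ignoring surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_search_request(api_key, artist):
    """Prepare (but do not send) a Genius search request for an artist."""
    query = urlencode({"q": artist})
    return Request(
        f"{GENIUS_SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Keeping the key in its own file means it never has to appear in the notebook itself, which matters if the notebook is pushed to a public repository.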
Find Cities and Countries in the Data
After getting the song lyrics in a text file, open the file and use geotext to grab city names. Append these to a new pandas DataFrame.
Then convert the coordinates column (raw_data2) to strings, split each value on the comma, and strip the parentheses.
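A minimal sketch of the city-extraction step: `GeoText(text).cities` returns the city names that geotext recognizes, and a `Counter` tallies how often each one appears. The function names are hypothetical, and geotext is imported inside its function so the counting helper can be used on its own.

```python
from collections import Counter


def extract_cities(text):
    """Return every city name geotext finds in the text, in order."""
    from geotext import GeoText  # third-party: pip install geotext
    return GeoText(text).cities


def count_mentions(cities):
    """Turn a list of city names into rows with 'city' and 'mentions' keys."""
    counts = Counter(cities)
    return [{"city": city, "mentions": n} for city, n in counts.most_common()]
```

Passing the rows from `count_mentions(extract_cities(lyrics_text))` to `pd.DataFrame` yields a table with `city` and `mentions` columns, where `lyrics_text` stands for the contents of the lyrics file.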
# change the coordinates to a string
city_data['raw_data2'] = city_data['raw_data2'].astype(str)
# split the coordinates using the comma as the delimiter
city_data[['lat', 'lon']] = city_data['raw_data2'].str.split(',', expand=True)
# remove the parentheses
city_data['lat'] = city_data['lat'].map(lambda x: x.lstrip('()'))
city_data['lon'] = city_data['lon'].map(lambda x: x.rstrip('()'))
Convert the latitude and longitude columns back to floats, because Plotly needs numeric coordinates to place the points.
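The conversion can be sketched as follows, here on a two-row stand-in for `city_data` (the sample cities and coordinate values are made up for illustration). Stripping spaces along with the parentheses removes the leading space that the comma split leaves on the longitude.

```python
import pandas as pd

# Stand-in for city_data: raw_data2 holds (latitude, longitude) tuples.
city_data = pd.DataFrame({
    "city": ["Los Angeles", "New York"],
    "raw_data2": [(34.05, -118.24), (40.71, -74.01)],
})

# Stringify, split on the comma, then strip parentheses and spaces.
coords = city_data["raw_data2"].astype(str).str.split(",", expand=True)
city_data["lat"] = coords[0].str.strip("() ")
city_data["lon"] = coords[1].str.strip("() ")

# Convert the cleaned strings back to floats for plotting.
for col in ("lat", "lon"):
    city_data[col] = city_data[col].astype(float)
```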
Create an account with Mapbox to obtain an API key, which is used to plot the song lyric locations in a Plotly Express bubble map. Alternatively, it is possible to generate the map without an API key if you have Dash installed. Customize the map for visibility by adjusting settings such as the color scale, the zoom level, and the data that appears when hovering over a point.
import plotly.express as px
fig = px.scatter_mapbox(merged, lat='lat', lon='lon', color='mentions', size='mentions',
                        color_continuous_scale=px.colors.sequential.Agsunset, size_max=40, zoom=3)
fig.update_layout(title={'text': 'US Cities Mentioned in Lana Del Rey Songs'})
#save graph as html
with open('plotly_graph.html', 'w') as f:
    f.write(fig.to_html(include_plotlyjs='cdn'))