The Role of Open Access

Open access research gives the public a chance to learn from and use data for free, but it is far from the norm. For researchers outside of academia, pulling together useful data can be difficult given these accessibility barriers.

About two months ago, I began looking for data to create a model of biological inputs and energy requirements in the United States food system. Open data resources such as FAOSTAT, the Economic Research Service, and the Bureau of Transportation Statistics provided helpful figures on land use, food imports, and food transportation. Aside from these resources, much of the information I wanted to reference in building a model came from scientific papers that require journal subscriptions or charge a per-article fee.

Three articles that may have been helpful in my research illustrate the cost of access:

Upon closer investigation, Appetite claims it ‘supports open access’ but, according to publisher Elsevier, charges authors $3000 to make an article available to everyone. Providing affordable open access options clearly isn’t a priority for publishers.

There may have been useful data in the articles mentioned above. However, I won’t find out because I’m sticking with open access resources for my food systems project.

Public government databases are great, but specific scientific studies may hold more value for independent researchers. Journals like PLOS ONE lead the way in open access articles for those looking for specific research to complement information from public databases. A 2016 article by Paul Basken in The Chronicle of Higher Education called ‘As an Open-Access Megajournal Cedes Some Ground, a Movement Gathers Steam’ shows a rise in open access papers, but I had to get the figures through Boston College because the article itself is ‘premium content for subscribers.’

Rise in published open access articles between 2008 and 2015. Data from: Basken, P. 2016. As an open-access megajournal cedes some ground, a movement gathers steam. The Chronicle of Higher Education, 62(19), 5-5.

Charging fees for access creates an elitist barrier between academia and those who want to learn more about certain topics. I’m not proposing that everyone would take advantage of open access research articles if there were cheaper publishing options or no access fees. Still, if more studies were open access, members of the public would have more opportunities to digest scientific studies on their own terms.

There’s immense value in the open-source, collaborative culture of the tech community that I hope spills over into academia. I’m optimistic about a continued increase in open access publications in the science community. For now, I’m looking forward to creating open source projects that take advantage of public data.

Data Analysis and UFO Reports

Data analysis and unidentified flying object (UFO) reports go hand-in-hand. I attended a talk by author Cheryl Costa who analyzes records of UFO sightings and explores their patterns. Cheryl and her wife Linda Miller Costa co-authored a book that compiles UFO reports called UFO Sightings Desk Reference: United States of America 2001-2015.

Records of UFO sightings are considered citizen science because people voluntarily report their experiences. This is similar to wildlife sightings recorded on websites like eBird that help illustrate bird distributions across the world. People report information about UFO sighting events including date, time, and location.

A dark night sky with the moon barely visible and trees below.
Night sky along the roadside outside Wayquecha Biological Field Station in Peru, taken April 2015.

Cheryl spoke about gathering data from two main online databases, MUFON (the Mutual UFO Network) and NUFORC (the National UFO Reporting Center). NUFORC’s database is public, and reports can be sorted by date, UFO shape, and state. MUFON’s database requires a paid membership to access the majority of its data. This talk was not a session to discuss conspiracy theories, but a chance to look at trends in citizen science reports.

The use of data analysis on UFO reports requires careful consideration of potential bias and reasonable explanations for numbers in question. For example, a high volume of reports in the summer could be because more people are spending time outside and would be more likely to notice something strange in the sky.
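To probe for that kind of seasonal bias, a first pass might look something like this short pandas sketch (the file name and column name here are assumptions, not the actual NUFORC export format):

```python
import pandas as pd

# Hypothetical export of sighting reports; the file name and the
# 'date' column are assumptions, not the actual NUFORC format.
reports = pd.read_csv("ufo_reports.csv", parse_dates=["date"])

# Count reports by month to see whether summer months really dominate.
monthly_counts = reports["date"].dt.month.value_counts().sort_index()
print(monthly_counts)
```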

This talk showed me that conclusions may be temptingly easy to draw when looking at UFO data as a whole, but speculations should be met with careful criticism. The use of the scientific method when approaching ufology, or the study of UFO sightings, seems key for a field often met with overwhelming skepticism.

I have yet to work with any open-source data on UFO reports, but this talk reminded me of the importance of a methodical approach to data analysis. Data visualization for any field of study starts with asking questions, being mindful of outside factors, and being able to communicate messages within large data sets to any audience.

Why I Started Reading More Often

This year, I began to read more books thanks to a social media hiatus between January and March. I logged out of Facebook, Instagram, Twitter, and Snapchat and deleted the apps on my phone. I knew I spent way too many hours scrolling mindlessly through photos and status updates.

My intention was to renegotiate my use of free time to focus solely on learning and self-improvement. I decided to find books at the library on data science, coding, and other topics of general curiosity.

An assortment of books on a shelf.
A section of my personal library including books that I have not yet finished.

Self education through reading has helped me confront some of my general anxieties about topics I find challenging. In 2018, I have read books focusing on mindfulness, coding, personal finance, business management, and behavioral psychology. I still enjoy reading books in my comfort zone of science and conservation, but I think it’s helpful to understand other fields of interest.

Instead of basking in my blatant ignorance about retirement plans and investments, I’ve been trying to read more books about business and finance. Learning about topics I find totally foreign has forced me to realize it’s simple and rewarding to address ignorance head-on.

Additionally, a few of the books I have read this year are just for pure fun. Authors like Reshma Saujani (founder of Girls Who Code) and Tim Ferriss (lifestyle coach extraordinaire) inspire me endlessly so I chose to read books by each of them. I appreciate how books can provide a platform to connect readers with mentors from any field.

I hope to keep the momentum going and maintain my current reading habits for the rest of the year. In that spirit, I have designated a new page for my reading list and I include a few notes on each book.

 

Reducing Plastic Use

Various pieces of plastic trash debris are strewn alongside seaweed and rocks on a beach.
Assorted plastic trash on the beach at Pelican Cove Park in Rancho Palos Verdes, CA, 2017.

In the spirit of this year’s Earth Day theme (‘End Plastic Pollution’), I researched the fate of plastic. The Environmental Protection Agency (EPA) published a report on 2014 municipal waste stream data for the United States. Plastic products were either recycled, burned for energy production, or sent to landfills. I used pandas to look at the data and Matplotlib to create a graph. I included percentages for each fate and compared the categories of total plastics, containers and packaging, durable goods, and nondurable goods.
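For anyone curious what that workflow looks like, here’s a minimal sketch of the pandas and Matplotlib approach. The percentages below are rough placeholders rather than the exact EPA figures; the full code is in the repository linked at the end of this post.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder percentages for each fate; the exact values come from
# the EPA's 2014 municipal waste report.
data = pd.DataFrame(
    {
        "Recycled": [9.5, 14.0, 6.7, 5.9],
        "Burned for Energy": [15.0, 16.5, 14.0, 14.1],
        "Landfilled": [75.5, 69.5, 79.3, 80.0],
    },
    index=[
        "Total Plastics",
        "Containers and Packaging",
        "Durable Goods",
        "Nondurable Goods",
    ],
)

# Grouped bar chart comparing the fate of each plastic category.
ax = data.plot(kind="bar", figsize=(10, 6))
ax.set_ylabel("Percent of plastic waste")
ax.set_title("Fate of Plastics in the 2014 U.S. Municipal Waste Stream")
plt.tight_layout()
plt.show()
```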

A graph compares different types of plastic products and their fate in the municipal waste stream.
Percentages of total plastics and plastic types that get recycled, burned for energy, or sent to a landfill, according to the EPA.

The EPA data shows a majority of plastic products reported in the waste stream were sent to landfills. Obviously, not all plastic waste actually reaches a recycling facility or landfill. Roadsides, waterways, and beaches are all subject to plastic pollution. Decreasing personal use of plastic products can help reduce the overall production of waste.

Here are some ideas for cutting back on plastic use:

  • Bring reusable shopping bags to every store.
    • Utilize cloth bags for all purchases.
    • Opt for reusable produce bags for fresh fruit and vegetables instead of store-provided plastic ones.
  • Ditch party plasticware.
    • Buy an assortment of silverware from a thrift store for party use.
    • Snag a set of used glassware for drinks instead of buying single-use plastic cups.
  • Use Bee’s Wrap instead of plastic wrap.
    • Bee’s Wrap is beeswax-covered cloth for food storage. It works much like plastic wrap, but it can be used over and over.
  • Choose glassware instead of plastic zip-locked bags for storing food.
    • Glass containers like Pyrex can be used in place of single-use plastic storage bags.
  • Say ‘no’ to plastic straws.
    • Get in the habit of refusing a straw at restaurants when you go out.
    • Bring a reusable straw made out of bamboo, stainless steel, or glass to your favorite drink spot.

 

To check out the code for the figure above, here’s the repository.

5 Tips for Debugging Your Life

A collection of beetles of various sizes, shapes, and colors sits in a glass container at a museum.
Beetle collection at Buffalo Museum of Science in Buffalo, New York, 2017

One of my first lessons as a new programmer was learning how to debug code. Debugging means reviewing your code to find errors and correcting them. Entire programs can be thrown off by one stray keystroke. This made me think about which minor (or major) habits and mindsets were holding me back from reaching my full potential. Here’s some advice based on how I edited my personal life to learn to code on my own terms.

1. Be patient with yourself and embrace failure.

Messing up code is inevitable, so don’t take it too personally when it happens. This took me a few weeks to learn because my entire life I’ve always thought failure was unacceptable. I recommend being overwhelmingly patient with yourself because every mistake presents an opportunity to learn a valuable lesson. When you get to the point where your code is clean and runs properly, the feeling of accomplishment will overshadow struggles along the way.

2. Learn to say ‘no’ more often to things that no longer serve you.

Get comfortable with saying ‘no’ because it can keep you from wasting time doing things you don’t wholeheartedly want to do. If you’ve got too much on your plate and want more time to code, take a moment to see exactly where your time is going. I started learning to code last year while I was unemployed, without another job lined up. I was dedicating at least 40 hours a week to learning, but some of my peers still saw this as a vacation. I began to say ‘no’ to certain activities in order to spend more time improving my programming skills and less time with people who made me feel awful. It may take some careful revision to eliminate excess drains on your time, but a more refined schedule will ultimately benefit your mental state.

3. Don’t be afraid to ask for help online or attend local events.

Programming communities are abundant both online and in person, depending on your location. There are online resources like Stack Overflow for asking questions. I’ve also used the live chat assistance feature on Codecademy multiple times when I had a problem that forums could not answer. Don’t be afraid to turn to virtual support networks when you need help, because someone will probably be able to help you.

For in-person events, Meetup is an awesome way to find interesting talks and events for local developers. I was nervous before attending a coding meetup in my area for the first time. Ultimately, I was grateful I worked up the courage to attend because I got to meet some wonderful mentors. I also use Women Who Code to keep an eye out for chapter events and conferences in major cities. Depending on your specific interests, there are a number of organizations that can help and encourage you as a programmer.

4. Create a work environment that encourages productivity.

Your work space can easily influence your level of productivity. Recognize your habits and common sources of distraction, then tailor your work area to these considerations. For me, this means having a clutter-free work desk, a comfortable chair, and a room free of outside noises. I also get easily distracted by my phone, so I try to keep it on silent and out of reach. Additionally, I think it’s helpful to have some sort of physical notebook or online system for random notes and ideas you think about while programming. This can keep ideas organized but separate from specific class or project notes. I use Google Keep to jot down quick ideas, but there are other similar alternatives like Evernote.

5. Remember that there are many potential routes to reach the same destination.

There are multiple ways to program. People can write code differently and still yield the same end result. Similarly, there is not one singular route to success and personal fulfillment. Take pride in your ability to creatively problem solve and celebrate and respect diverse ideas when collaborating with others. There are many ways to learn and grow in programming but you ultimately get to decide what methods work best for you.

Highlights from Data Science Day 2018

Columbia University hosted Data Science Day 2018 on March 28th at their campus in Manhattan. I traveled to New York to attend the event and learn more about how data science plays a role in health, climate, and finance research. A few of the presentations stood out, including the environmental talks and a keynote address from Diane Greene, the CEO of Google Cloud.

View of Grand Army Plaza
Grand Army Plaza in Manhattan, New York, 2013

I was extremely excited when I first saw the program for Data Science Day because I noticed a series of lightning talks on climate change. The session entitled ‘Climate + Finance: Use of Environmental Data to Measure and Anticipate Financial Risk’ brought together Columbia staff who specialize in economics, climate research, and environmental policy.

Geoffrey Heal gave a talk called ‘Rising Waters: The Economic Impact of Sea Level Rise’ that addressed financial models associated with sea level rise projections. Heal presented major cities and associated data for property values, historic flooding, and flood maps to illustrate the overall financial impact of sea level rise. This talk highlighted the importance of interdisciplinary data science work when addressing complex issues like climate change. Collaboration between academic researchers and national groups like NOAA and FEMA provides a platform for data science work that can inform professionals across career fields.

Lisa Goddard spoke about ‘Data & Finance in the Developing World’. The main topics of her talk were food security and drought impacts in developing countries. Goddard’s research included rain gauge measurements, satellite imagery, soil moisture levels, and crop yield records. She addressed the use of various climate data to advise appropriate resilience tactics, such as crop insurance for financial security. Overall, dealing with food security will be essential when handling the impacts of climate change on small scale farms across the world. Data science can help the agricultural sector by providing farmers with more information to consider when planning for effects of climate change.

Wolfram Schlenker gave a talk called ‘Agricultural Yields and Prices in a Warming World’. He addressed the impact of weather shocks to common crops, such as unanticipated exposure to hot temperatures. Corn, a tropical plant, can potentially see higher yields when there are sudden, extreme instances of warm weather. Schlenker presented a fresh perspective on how climate change can impact crop yields differently according to species. A combination of climate models, market conditions, and yield data can provide a foundation for better understanding climate change’s impacts on agricultural commodities on a case-by-case basis.

Diane Greene’s keynote session for Data Science Day 2018 raised important considerations for navigating the world of data science. Greene mentioned that Google Cloud’s main goal is to deliver intuitive technological capabilities. Google Cloud offers a wide range of APIs that make the flow of information across the world easier. For example, Google Cloud’s Translation API makes it possible for online articles to be translated into different languages, making them accessible to more readers. Diane Greene’s talk inspired me to be creative with innovation in data science and to consider usability and collaboration on all fronts.
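As a rough sketch of what that capability looks like from a developer’s side (assuming the Python client for the v2 Translation API and credentials already configured), translating a sentence can be as short as:

```python
from google.cloud import translate_v2 as translate

# Assumes Google Cloud credentials are already set up, e.g. via the
# GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = translate.Client()

# Translate a snippet of article text into Spanish (example values).
result = client.translate(
    "Open data makes research easier to share.",
    target_language="es",
)

print(result["translatedText"])
```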

This event was a great opportunity to learn from leaders in the field of data science. Communication and collaboration were major themes of these talks and I left Data Science Day 2018 feeling empowered to address challenges like climate change.

Creating a Data Science Resume for Career Switchers

As a career switcher, I had no idea where to begin in creating a data science resume or how to incorporate my background in ecological field work. Thus far in my career, I’ve only ever needed a resume for one specific field. Thankfully, Kaggle hosted a virtual CareerCon last week, and it helped me develop new strategies for tweaking my work experience to target data science.

William Chen, a Data Science Manager at Quora, led a session called How to Build a Compelling Data Science Portfolio & Resume that included tips for formatting a data science resume. William spoke directly to his experience reviewing data science portfolios. Major advice from his talk included:

  • Keep it concise. A one-page resume with simple readability is recommended.
  • Include relevant coursework, ordered from most to least relevant.
  • Mention your technical skills, especially those included in the posting for the desired position.
  • Highlight projects and include results and references, like web links.
  • Avoid including impersonal projects such as homework assignments.
  • Tailor your experience toward the job and include relevant capstone projects and independent research if you don’t have direct data science work experience to mention.

Below, I’ve included some of my own changes that take my existing project experience in data analysis and tweak it to fit a data science resume.

First, here’s a visual overview comparing a recent version of my resume, tailored for a job in land management, with my edited resume for data science. The text quality isn’t amazing, but this is mainly to show the increased readability and concise, relevant content.

Comparison of environmental science resume (left) and newly edited data science resume (right).

William Chen’s advice led me to get to the point about why I would be a good candidate for an opportunity in data science. This meant I had to get my message across quickly. Previously, my resume was a wall of text divided into education, work experience, and relevant community service. That format is dense and confusing, and it would not be appropriate to send to a hiring official in response to a data science posting.

I broke down my data science resume into the categories of experience, education, projects, skills, and relevant coursework. In experience, I highlighted potentially relevant duties such as data collection, analysis, and visualization that show my personal connection to data science. Next, I cut down the text in my education section from my previous resume to reveal only my school, its location, my degree earned, and my enrollment dates. The projects section includes three research projects I worked on in my undergraduate career that involved data collection, analysis, and synthesis. Lastly, I included a section for skills and a section for relevant coursework.

No matter what your academic or work background, you can find ways to make a data science resume. William Chen’s advice brought me to the realization that I had relevant technical skills and project experience in environmental science that I could translate into a purposeful foundation for a job in data science. When you think about your qualifications outside the context of a specific career field, creating a data science resume becomes a simple task.

How to Choose an Online Data Science Course

Multiple factors can play a role in your decision when selecting an online data science course. It is important to remember that no two educational resources are exactly the same. I recommend carefully considering your needs and learning goals, and trying out multiple websites before making a decision.

Here’s a quick overview of the major components of three educational resources I have been using to learn data science. This is based on my experiences with Codecademy, DataCamp, and Udacity. There are plenty of other educational websites to choose from, including Coursera and Udemy.

| | Codecademy | DataCamp | Udacity |
| --- | --- | --- | --- |
| Languages for Data Science | Python and SQL | Python, R, and SQL | Python, R, and SQL |
| Format | Interactive Lessons and Exercises | Interactive Lessons, Exercises, and Videos | Videos and Exercises |
| Unique Features | No Videos | Available via Mobile App | Videos Feature Industry Professionals |
| Helpful Resources | Hints, 'Get Help' Live Chat for Pro Users, and Community Forum | Hints, Show Answer Options, and Community Forum | Community Forum |
| Free Content | Free Courses | Free Courses and Access to First Section of All Premium Lessons | Free Courses |
| Premium Program Costs | Codecademy Pro: $15.99 - $19.99 per month | DataCamp Membership: $25 - $29 per month | Data Analyst Nanodegree: $200 per month |
| Features of Premium Courses | Quizzes, Advisers, and Projects | Projects and Course Completion Certificates | Project Review and Career Services |

Pick a Language

Two of the most popular languages for data science are Python and R. Another language called SQL (Structured Query Language) is also helpful to know because you can use it to work with specific data in a database. Python and R are both widely used, so I recommend trying out each language if you’re aiming to focus on just one. Depending on your preferences, the offerings of Codecademy, DataCamp, and Udacity may play a role in your decision. Codecademy offers Python and SQL. DataCamp has lessons in Python, SQL, and R, with career tracks for data scientists with Python and R. Udacity has a selection of courses that cater to all three languages. At the end of the day, choosing a language depends on how you seek to use your data science skills.

Learning Style

Test out different websites and make sure you enjoy the format of lessons before committing to one, and especially before paying for a subscription or program. If video lessons play to your strengths, I recommend using Udacity. Course videos are instructed by a wide range of data science industry professionals. This offers a unique perspective as to how people use data science in specific career areas.

Websites like Codecademy and DataCamp are designed for hands-on, visual learners. Both websites offer a console with instant feedback when you run lines of code. Codecademy, unlike DataCamp and Udacity, does not include video lessons in the curriculum. If you prefer reading at your own pace and executing lines of code without trying to absorb a video lecture, Codecademy might be right for you. DataCamp provides video introductions before coding lessons and tasks. Also, DataCamp offers an app for on-the-go coding lessons. However, the preferred format for learning with DataCamp is on the computer.

Helpful Resources 

There are tools in all three websites that help you if you get stuck on a problem. Codecademy and DataCamp offer hints specific to assigned tasks, as well as access to community forums where users can post questions for others to answer. Codecademy also offers live chat assistance for Pro members, where a tutor will review code in real time. DataCamp features an option to show the answer code for an assigned task, if you are still having trouble after reviewing a hint. The format of Udacity does not involve an interactive console, so when your code is incorrect, the best place to find help is on their community forums.

Free Content

Codecademy, DataCamp, and Udacity all offer free courses that can cater to your interests in data science. Free lessons on each website are self-paced and designed to adapt to your schedule and lifestyle.

Premium Programs

Each website offers the option to pay for access to additional content and benefits.

  • Codecademy Pro offers three levels of subscription: one month ($19.99), six months ($17.99 per month), and a year ($15.99 per month). There’s also an option for Pro Intensive courses, such as Intro to Data Analysis, that cost $199 each. Membership benefits include quizzes and projects.
  • DataCamp membership comes as a monthly plan ($29 per month) or a yearly plan ($25 per month). Members gain unlimited access to all programs.
  • Udacity offers a Data Analyst Nanodegree program made up of two three-month terms. Term 1 ($499) and Term 2 ($699) work out to roughly $200 per month over six months. Benefits of this program include project feedback and exclusive career services.

DataCamp’s membership offers the most flexibility of the three platforms because premium lessons are entirely self-paced. For Codecademy members, the Intro to Data Analysis Pro Intensive has an outlined time frame of four months, though you can work ahead as much as you would like depending on your schedule. Udacity’s Data Analyst Nanodegree program is made up of two three-month terms, for an estimated six months to complete the program.

 

How Does Environmental Science Relate to Computer Programming?

This is a question I have received quite frequently in recent weeks. Computer programming languages can make scientific analysis much easier. This applies directly to environmental science because there is a wealth of data within the world of ecological studies. Coding offers a way for scientists to automate repetitive tasks, freeing up time for other work. It can also result in new software that scientists across disciplines can use.

Statistics and environmental science go hand in hand. Science experiments involve a natural order of determining a hypothesis, establishing test methods, collecting data, analyzing data, and drawing conclusions.

The data analysis portion is where statistical models are important. Oftentimes, scientists want to know whether the results of their experiments are statistically significant. This means showing that the trends observed in the data are unlikely to be the product of chance or of flaws in the experiment’s design or execution. Programming languages such as Python help scientists carry out this statistical analysis in code. Python packages like NumPy provide a basis for computational analysis, and libraries like SciPy include modules such as scipy.stats for running hypothesis tests. These include t-tests and analysis of variance (ANOVA) tests on numerical data and the chi-square test for categorical data. Packages in R such as car offer functions for ANOVA tables, while base R itself includes functions such as t.test for analyzing data. Both languages also offer packages for creating graphs and visuals to display the results of these tests, such as Matplotlib in Python and ggplot2 in R.
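As a small illustration with made-up sample data, here’s roughly how a t-test and a chi-square test look in scipy.stats:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two field sites (made-up values).
site_a = np.array([4.1, 3.8, 4.5, 4.0, 4.3])
site_b = np.array([3.2, 3.6, 3.1, 3.4, 3.0])

# Independent two-sample t-test: do the site means differ?
t_stat, p_value = stats.ttest_ind(site_a, site_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Chi-square test on categorical counts, e.g. observed vs. expected
# individuals across three habitat types (totals must match).
observed = np.array([18, 25, 12])
expected = np.array([20, 20, 15])
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```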

Branches of environmental science, such as conservation biology, can benefit from programming through different computer models. As a college student, the first software I was introduced to that was created specifically for conservation science was a population viability analysis (PVA) program called Vortex. Population viability analysis estimates the likelihood that a population of organisms will persist or decline under a certain set of circumstances. The Vortex software allows users to adjust those circumstances, such as genetic diversity, number of organisms, and mortality rate. I used the software in a classroom setting while studying in Peru, and I performed various tests to see which factors would be detrimental to a theoretical species’ population. This tool is one of many that can assist environmental science professionals, who can use PVA to inform management decisions for threatened species.

A treeline in the lowland Amazon rain forest along the Madre de Dios River in Peru, taken February 2015. Advancements in computer programming can provide tools to help increase scientific understanding of biologically rich areas, such as the Amazon.

Within the field of environmental science, computer programming can be a great advantage because it allows scientists to analyze data in efficient ways that can make everyday tasks easier. The utilization of programming languages and modeling software offers opportunities to put computers to use where humans would have otherwise performed repetitive tasks. This can provide scientists with more time to make discoveries and inform decisions to make the world a better place.

A Fresh Start

You might be wondering how I got here. Last December, I decided to take a break from my career in order to transition to a new professional path I found more fulfilling. I was sucked into the world of seasonal field technician gigs and I wanted out. I worked in three seasonal positions before moving back home to Syracuse, New York. I panicked and took a job at a construction company in Syracuse, and then quit after realizing that it wasn’t at all where I wanted my life to go. And this is where data science comes in.

I started looking up positions that I wanted and graduate programs that I dreamed of getting into. A common thread was that experience with programming languages like Python and R was a desired qualification for many job postings. This was where I could start changing my life: moving from collecting data to performing data analysis with programming.

Data science is a field where people use existing data to make predictions, create visualizations, and derive information from analysis to create a narrative for the data. A career in data science seemed like the natural next step for me.

I started off using Udacity. This website offers a huge range of online courses that combine video lessons with assignments in various programs, such as R. The Udacity course that I took was Data Analysis with R. It’s a free course that gives a great introduction to using R.

Currently, I am also taking courses through Codecademy. I took two free courses, Learn Python and Learn the Command Line, and am enrolled in the Introduction to Data Analysis Pro course that costs about $200. Codecademy course length can range anywhere from 10 hours to 10 weeks, depending on subject.

I started this journey about two months ago and hope to keep making progress every day. Besides Python and R, I also plan on learning SQL through Codecademy.

See my About Me page for more information.