Highlights from Data Science Day 2018


Columbia University hosted Data Science Day 2018 on March 28th at its campus in Manhattan. I traveled to New York to attend the event and learn more about the role data science plays in health, climate, and finance research. A few of the presentations stood out, including the environmental talks and a keynote address from Diane Greene, the CEO of Google Cloud.

View of Grand Army Plaza
Grand Army Plaza in Manhattan, New York, 2013

I was extremely excited when I first saw the program for Data Science Day because I noticed a series of lightning talks on climate change. The session entitled ‘Climate + Finance: Use of Environmental Data to Measure and Anticipate Financial Risk’ brought together Columbia staff who specialize in economics, climate research, and environmental policy.

Geoffrey Heal gave a talk called ‘Rising Waters: The Economic Impact of Sea Level Rise’ that addressed financial models associated with sea level rise projections. Heal presented major cities and associated data for property values, historic flooding, and flood maps to illustrate the overall financial impact of sea level rise. This talk highlighted the importance of interdisciplinary data science work when addressing complex issues like climate change. Collaboration between academic researchers and national groups like NOAA and FEMA provides a platform for data science work that can inform professionals across career fields.

Lisa Goddard spoke about ‘Data & Finance in the Developing World’. The main topics of her talk were food security and drought impacts in developing countries. Goddard’s research included rain gauge measurements, satellite imagery, soil moisture levels, and crop yield records. She addressed the use of various climate data to advise appropriate resilience tactics, such as crop insurance for financial security. Overall, dealing with food security will be essential when handling the impacts of climate change on small-scale farms across the world. Data science can help the agricultural sector by giving farmers more information to consider when planning for the effects of climate change.

Wolfram Schlenker gave a talk called ‘Agricultural Yields and Prices in a Warming World’. He addressed the impact of weather shocks to common crops, such as unanticipated exposure to hot temperatures. Corn, a tropical plant, can potentially see higher yields when there are sudden, extreme instances of warm weather. Schlenker presented a fresh perspective on how climate change can impact crop yields differently according to species. A combination of climate models, market conditions, and yield data can provide a foundation for better understanding climate change’s impacts on agricultural commodities on a case-by-case basis.

Diane Greene’s keynote session for Data Science Day 2018 raised important considerations for navigating the world of data science. Greene mentioned that Google Cloud’s main goal is to deliver intuitive technological capabilities. Google Cloud offers a wide range of APIs that make the flow of information across the world easier. For example, Google Cloud’s Translation API makes it possible for online articles to be translated into different languages to increase readability. Diane Greene’s talk inspired me to be creative with innovation in data science and to consider usability and collaboration on all fronts.

This event was a great opportunity to learn from leaders in the field of data science. Communication and collaboration were major themes of these talks and I left Data Science Day 2018 feeling empowered to address challenges like climate change.

Creating a Data Science Resume for Career Switchers


As a career switcher, I had no idea where to begin in my efforts to create a data science resume or how to incorporate my background in ecological field work. So far in my career, I’ve only ever needed a resume for one specific field. Thankfully, Kaggle hosted a virtual CareerCon last week and it helped me develop new strategies for tweaking my work experience to target data science.

William Chen, a Data Science Manager at Quora, led a session called How to Build a Compelling Data Science Portfolio & Resume that included tips for formatting a data science resume. William spoke directly to his experience reviewing data science portfolios. Major advice from his talk included:

  • Keep it concise. A one page resume with simple readability is recommended.
  • Include relevant coursework and order it accordingly, from most to least relevant.
  • Mention your technical skills, and especially those included in the posting for a desired position.
  • Highlight projects and include results and references, like web links.
  • Avoid including impersonal projects such as homework assignments.
  • Tailor your experience toward the job and include relevant capstone projects and independent research if you don’t have direct data science work experience to mention.

Below are some of the changes I made to my own resume to reframe my existing data analysis project experience for a data science role.

First of all, here’s an overview visual of the format of a recent version of my resume tailored for a job in land management and my edited resume for data science. The quality of the text isn’t amazing, but this is mainly to show increased readability and concise, relevant content.

Comparison of environmental science resume (left) and newly edited data science resume (right).

William Chen’s advice led me to get to the point about why I would be a good candidate for an opportunity in data science. This meant I had to get my message across quickly. Previously, my resume was a wall of text divided by education, work experience, and relevant community service. This format is dense and confusing and would be improper to send to a hiring official in response to a data science posting.

I broke down my data science resume into the categories of experience, education, projects, skills, and relevant coursework. In experience, I highlighted potentially relevant duties such as data collection, analysis, and visualization that show my personal connection to data science. Next, I cut down the text in my education section from my previous resume to reveal only my school, its location, my degree earned, and my enrollment dates. The projects section includes three research projects I worked on in my undergraduate career that involved data collection, analysis, and synthesis. Lastly, I included a section for skills and a section for relevant coursework.

No matter what your academic or work background, you can find ways to make a data science resume. William Chen’s advice brought me to the realization that I had relevant technical skills and project experience in environmental science that I could translate into a purposeful foundation for a job in data science. When you think about your qualifications outside the context of a specific career field, creating a data science resume becomes a simple task.

How to Choose an Online Data Science Course

Multiple factors can play a role in your decision process when selecting an online data science course. It is important to remember that no two educational resources are exactly the same. I recommend carefully considering your needs and learning goals, and feel free to give multiple websites a try before making a decision.

Here’s a quick overview of the major components of three educational resources I have been using to learn data science. This is based on my experiences with Codecademy, DataCamp, and Udacity. There are plenty of other educational websites to choose from, including Coursera and Udemy.

|  | Codecademy | DataCamp | Udacity |
| --- | --- | --- | --- |
| Languages for Data Science | Python and SQL | Python, R, and SQL | Python, R, and SQL |
| Format | Interactive Lessons and Exercises | Interactive Lessons, Exercises, and Videos | Videos and Exercises |
| Unique Features | No Videos | Available via Mobile App | Videos Feature Industry Professionals |
| Helpful Resources | Hints, 'Get Help' Live Chat for Pro Users, and Community Forum | Hints, Show Answer Options, and Community Forum | Community Forum |
| Free Content | Free Courses | Free Courses and Access to First Section of All Premium Lessons | Free Courses |
| Premium Program Costs | Codecademy Pro: $15.99 - $19.99 per month | DataCamp Membership: $25 - $29 per month | Data Analyst Nanodegree: $200 per month |
| Features of Premium Courses | Quizzes, Advisers, and Projects | Projects and Course Completion Certificates | Project Review and Career Services |

Pick a Language

Two of the most popular languages for data science are Python and R. Another language called SQL (Structured Query Language) is also helpful to know because you can use it to work with specific data in a database. Python and R are both widely used, so I recommend trying out each language if you’re aiming to focus on just one. Depending on your preferences, the offerings of Codecademy, DataCamp, and Udacity may play a role in your decision. Codecademy offers Python and SQL. DataCamp has lessons in Python, SQL, and R, with career tracks for data scientists with Python and R. Udacity has a selection of courses that cater to all three languages. At the end of the day, choosing a language depends on how you seek to use your data science skills.
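To give a feel for what working with specific data in a database looks like, here is a minimal sketch that runs a SQL query from Python using the standard library's sqlite3 module. The `courses` table and the `platforms_teaching` helper are made up for illustration; the rows simply restate the language offerings described above.

```python
import sqlite3

# Hypothetical in-memory database; the "courses" table is invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (platform TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [
        ("Codecademy", "Python"), ("Codecademy", "SQL"),
        ("DataCamp", "Python"), ("DataCamp", "R"), ("DataCamp", "SQL"),
        ("Udacity", "Python"), ("Udacity", "R"), ("Udacity", "SQL"),
    ],
)


def platforms_teaching(language):
    """Return the platforms offering the given language, sorted by name."""
    rows = conn.execute(
        "SELECT platform FROM courses WHERE language = ? ORDER BY platform",
        (language,),
    ).fetchall()
    return [row[0] for row in rows]


print(platforms_teaching("R"))
```

The SELECT ... WHERE statement is the SQL part: it pulls out only the rows that match a condition, which is the kind of task SQL is used for alongside Python or R.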

Learning Style

Test out different websites and make sure you enjoy the format of lessons before committing to one, and especially before paying for a subscription or program. If video lessons play to your strengths, I recommend using Udacity. Course videos are taught by a wide range of data science industry professionals. This offers a unique perspective on how people use data science in specific career areas.

Websites like Codecademy and DataCamp are designed for hands-on, visual learners. Both websites offer a console with instant feedback when you run lines of code. Codecademy, unlike DataCamp and Udacity, does not include video lessons in the curriculum. If you prefer reading at your own pace and executing lines of code without trying to absorb a video lecture, Codecademy might be right for you. DataCamp provides video introductions before coding lessons and tasks. Also, DataCamp offers an app for on-the-go coding lessons. However, the preferred format for learning with DataCamp is on the computer.

Helpful Resources 

There are tools in all three websites that help you if you get stuck on a problem. Codecademy and DataCamp offer hints specific to assigned tasks, as well as access to community forums where users can post questions for others to answer. Codecademy also offers live chat assistance for Pro members, where a tutor will review code in real time. DataCamp features an option to show the answer code for an assigned task, if you are still having trouble after reviewing a hint. The format of Udacity does not involve an interactive console, so when your code is incorrect, the best place to find help is on their community forums.

Free Content

Codecademy, DataCamp, and Udacity all offer free courses that can cater to your interests in data science. Free lessons on each website are self-paced and designed to adapt to your schedule and lifestyle.

Premium Programs

Each website offers the option to pay for access to additional content and benefits.

  • Codecademy Pro offers three levels of subscription: one month ($19.99), six months ($17.99 per month), and a year ($15.99 per month). There’s also an option for Pro Intensive courses, such as Intro to Data Analysis, that cost $199 each. Membership benefits include quizzes and projects.
  • DataCamp membership is in the form of a monthly plan ($29 per month) or a yearly plan ($25 per month). Members gain unlimited access to all programs.
  • Udacity offers a Data Analyst Nanodegree program with two three-month terms. Term 1 ($499) and term 2 ($699) result in a cost of about $200 per month for six months. Benefits of this program include project feedback and exclusive career services.
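To compare these prices on equal footing, the arithmetic can be sketched in a few lines of Python. This is only a rough comparison using the prices quoted above, which may have changed since; it assumes a six-month window, the Codecademy six-month plan, and DataCamp's monthly plan. The constant and function names are my own.

```python
# Prices as quoted in this post (USD); treat them as a snapshot, not current rates.
CODECADEMY_SIX_MONTH_RATE = 17.99  # per month on the six-month plan
DATACAMP_MONTHLY_RATE = 29.00      # per month on the monthly plan
UDACITY_TERM_PRICES = [499.00, 699.00]  # term 1 and term 2, three months each
MONTHS = 6


def six_month_totals():
    """Total cost of each option over the same six-month window."""
    return {
        "Codecademy Pro": round(CODECADEMY_SIX_MONTH_RATE * MONTHS, 2),
        "DataCamp": round(DATACAMP_MONTHLY_RATE * MONTHS, 2),
        "Udacity Nanodegree": round(sum(UDACITY_TERM_PRICES), 2),
    }


def per_month(total, months=MONTHS):
    """Average monthly cost for a given total."""
    return round(total / months, 2)


totals = six_month_totals()
for name, total in totals.items():
    print(f"{name}: ${total:.2f} total, ${per_month(total):.2f} per month")
```

The Udacity line works out to $1,198 in total, or roughly $199.67 per month, which is where the "about $200 per month" figure comes from.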

DataCamp’s membership offers the most flexibility out of these three platforms because premium lessons are at your own pace. For Codecademy members, the Intro to Data Analysis Pro Intensive has an outlined course time frame of 4 months. You can work ahead as much as you would like depending on your schedule. Udacity’s Data Analyst Nanodegree program is made up of two 3-month terms, for a total of 6 months estimated to complete the program.