Sonya Smirnova


Data Analyst - Python Developer - Project Manager

Portfolio


Main Projects

WEB visualizations

Pandas, Matplotlib, Seaborn, Python APIs

Social Analytics

About


Ambitious Data Analyst, Pythonista, novice web developer and Data Analytics Bootcamp graduate with honors.
I am passionate about data mining and parsing, and about building full-stack applications with Python and JavaScript to visualize the findings.
My previous experience managing finance software development helps me bridge the gap between engineers and users. It has also made me a reliable, hardworking and diligent teammate, as well as an independent performer who loves solving coding challenges and is always eager to learn.
I am also a huge fan of hiking, snowboarding, horseback riding, bike riding and many other outdoor activities.

Contact Me



Washington DC Metro Area Transportation Analysis



What
The Washington Metropolitan Area Transit Authority (WMATA) provides bus and rail transit to the Washington region.

Why
Transporting over 300,000,000 passengers a year, WMATA faces many challenges concerning safety, ridership and funding.
Bottom line
Why is Metro ridership declining? What factors affect Metro performance, and does Metro affect other transportation resources?

Is Metro getting better, or are other modes of transportation taking over its customers?

Data
Use DC-area transportation data from WMATA, taxis and Uber to understand the relationship between Metro performance and the use of other modes of transportation. Collect monthly ridership data to determine time-of-year effects.

Sources: Metro and US government data from various reports for 2012-2017: Vital Signs, Metro’s Key Performance Indicators (KPIs), annual and Metro Performance Reports, Federal Transit Administration Monthly Module Raw Data Release, Bureau of Economic Analysis, Census Bureau, DC Government Department of For-Hire Vehicles, and Uber.

Questions

Q1: Does time of year impact demand for transportation resources?

The ridership graph shows the characteristic components of a time series:
1. Long-Term Trend or Movement
2. Seasonal Movement
3. Long-Term Cyclical Movement
4. Irregular Movement

Seasonal variation was found with the average percentage method:
1. Each month's ridership is expressed as a percentage of that year's total ridership.
2. The percentages for the same month in different years are averaged.
3. The resulting twelve percentages form the seasonal index.
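A minimal pandas sketch of the three steps above, assuming a monthly ridership Series with a DatetimeIndex that covers only complete calendar years. The function name `seasonal_index` is illustrative, not taken from the project repository. Because each year's percentages sum to 100, the twelve index values also sum to 100, so a month above 100/12 ≈ 8.33 is busier than an average month.

```python
import pandas as pd


def seasonal_index(monthly: pd.Series) -> pd.Series:
    """Seasonal index by the average percentage method.

    Assumes `monthly` holds one ridership value per month, indexed by a
    DatetimeIndex, and covers only complete calendar years.
    """
    df = monthly.to_frame("riders")
    df["year"] = df.index.year
    df["month"] = df.index.month

    # Step 1: express each month as a percentage of that year's total ridership
    yearly_total = df.groupby("year")["riders"].transform("sum")
    df["pct_of_year"] = 100 * df["riders"] / yearly_total

    # Steps 2-3: average the same month across years; the twelve means are the index
    return df.groupby("month")["pct_of_year"].mean()
```

Usage: `seasonal_index(metro_monthly)`, where `metro_monthly` is a hypothetical Series of WMATA monthly ridership indexed by month-start dates.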

Q2: Does Metro performance impact the demand for other transportation resources?

Conclusion: the analysis indicates that Metro is losing passengers to Uber and to some mode of transportation other than taxi or bus. Perhaps the number of people driving private cars is increasing.

Q3: Can the availability of transportation resources be predicted from Metro's past performance?

There is no strong relationship between Metro's internal KPIs and ridership. The strongest correlation is between Metro and taxi ridership. There is no significant dependency between Metro KPIs and other transportation types.
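The project used SciPy's `linregress` and Statsmodels; as a simpler sketch of the same idea, the correlation matrix between transportation modes can be computed with pandas `DataFrame.corr`. The column names ("metro", "taxi", "uber") and both helper functions are hypothetical, and each row is assumed to be one month.

```python
import pandas as pd


def ridership_correlations(ridership: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix between monthly ridership series.

    Assumes one column per mode (hypothetical names such as "metro",
    "taxi", "uber") and one row per month.
    """
    # Keep only months where every mode has data, so all pairs use the same rows
    return ridership.dropna().corr(method="pearson")


def strongest_pair(corr: pd.DataFrame) -> tuple[str, str, float]:
    """Return the pair of distinct columns with the largest absolute correlation."""
    best = ("", "", 0.0)
    columns = list(corr.columns)
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) > abs(best[2]):
                best = (a, b, r)
    return best
```

A correlation close to ±1 only shows that two series move together; on its own it does not show that one mode's performance drives demand for another.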

Outcome & Recommendations

* Time of year affects Metro demand.
* Metro is losing customers to other means of transportation, even though the economy is strong and government employees get 100% refunds for rides.
* Metro and bus lost 600K customers, while Uber gained 1.4M customers. Where did the additional customers come from? Is the Uber data correct?
* Current internal Metro KPIs are not significant factors in ridership numbers. Even though the Rail On Time Performance KPI has the strongest impact on Metro ridership, it still does not explain the decrease in ridership.

What now?

Metro needs to share its data better, and if it wants more customers, it should probably measure a different set of KPIs.

Tools and techniques: Python, Pandas, SciPy (linregress), Statsmodels, NumPy, Matplotlib, Seaborn.
Link to a repository: GitHub
Download the full presentation (PPTX): Download Presentation


Washington DC Life: Web Application


A web application that tells a bit more about life in Washington DC.
It includes a backend built with Python Flask and MongoDB, D3.js and Plotly for dynamic plots, Leaflet and Mapbox for maps, and Tweepy for Twitter scraping.
It can be used as a dashboard template.

Tools and techniques: Flask, Python 3.6, MongoDB, JavaScript, D3.js, Plotly, Tweepy, Leaflet, Heroku deployment.
Link to a repository: GitHub
Link to a website: Washington DC


Skin Cancer Treatment Research


The purpose of the project is to compare four skin cancer treatments.
In a recent animal study, 250 mice were treated with a variety of drug regimens over the course of 45 days, and their physiological responses were monitored throughout that period. The goal is to analyze the data and create visualizations that compare all the treatments.
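One simple way to compare regimens is the percent change in mean tumor volume between the first and last timepoints (for example, the start of treatment and day 45). This pandas sketch assumes a long-format table with hypothetical columns `drug`, `timepoint` and `tumor_volume`, and that every regimen has measurements at both the first and last timepoints; the real dataset's columns may differ.

```python
import pandas as pd


def tumor_volume_change(measurements: pd.DataFrame) -> pd.Series:
    """Percent change in mean tumor volume from first to last timepoint, per drug.

    Assumes hypothetical columns "drug", "timepoint" and "tumor_volume",
    one row per mouse per measurement.
    """
    # Mean tumor volume for every drug at every timepoint (rows: drug, columns: timepoint)
    mean_volume = (
        measurements.groupby(["drug", "timepoint"])["tumor_volume"]
        .mean()
        .unstack("timepoint")
        .sort_index(axis=1)
    )
    first = mean_volume.iloc[:, 0]
    last = mean_volume.iloc[:, -1]
    # Negative values mean the tumors shrank on average
    return ((last - first) / first * 100).sort_values()
```

The resulting Series, sorted from most to least effective, plots directly as a bar chart with Matplotlib or Seaborn.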

Tools and techniques: Python, Pandas, Matplotlib and Seaborn.
Link to a repository: GitHub


Ride Sharing Data Analysis


What can you do when you have access to a complete record set of rides from a company like Lyft or Uber?
The answer is: a lot!
The goal of this project is to offer data-backed guidance on new ride-sharing opportunities for market differentiation.
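A typical first step is a per-market summary of ride counts and fares. This pandas sketch assumes one row per ride with hypothetical `city_type` and `fare` columns; the project's actual schema may differ.

```python
import pandas as pd


def summarize_by_city_type(rides: pd.DataFrame) -> pd.DataFrame:
    """Ride count, average fare and share of total fares for each city type.

    Assumes one row per ride with hypothetical "city_type" and "fare" columns.
    """
    summary = rides.groupby("city_type").agg(
        total_rides=("fare", "count"),
        average_fare=("fare", "mean"),
        total_fares=("fare", "sum"),
    )
    # Percentage of all fare revenue earned in each city type
    summary["fare_share_pct"] = 100 * summary["total_fares"] / summary["total_fares"].sum()
    return summary.sort_values("total_fares", ascending=False)
```

Comparing `average_fare` with `total_rides` across markets is one way to spot where demand is underserved.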

Tools and techniques: Python, Pandas, Matplotlib and Seaborn.
Link to a repository: GitHub


OpenWeatherMap API Requests


Everybody knows that the closer a place is to the Equator, the hotter the weather. But how can it be proved?
The goal is to visualize temperature and other weather conditions in different cities across the globe.

Tools and techniques: Python, Pandas, Matplotlib, Openweathermapy, Requests.
Link to a repository: GitHub
Link to a website: Weather visualization


News Sentiment Analysis


News can be good, bad, funny or scary. Meanwhile, the same news can be interpreted and presented in contrasting ways by different news sources.
In this project, a Python script performs sentiment analysis of the Twitter activity of various news outlets.
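VADER scores each text with a `compound` value between -1 (most negative) and +1 (most positive). A minimal sketch of averaging that score per outlet, assuming tweets have already been collected (for example with Tweepy) into a hypothetical `{outlet: [tweet_text, ...]}` dict. The scorer is passed in as a function so the sketch runs without the library; with the `vaderSentiment` package it would be `SentimentIntensityAnalyzer().polarity_scores`.

```python
from statistics import mean
from typing import Callable


def average_compound(
    tweets_by_outlet: dict[str, list[str]],
    polarity_scores: Callable[[str], dict[str, float]],
) -> dict[str, float]:
    """Mean VADER compound score (-1 to +1) for each news outlet.

    `polarity_scores` should behave like VADER's
    SentimentIntensityAnalyzer().polarity_scores, returning a dict that
    includes a "compound" key.
    """
    return {
        outlet: mean(polarity_scores(text)["compound"] for text in tweets)
        for outlet, tweets in tweets_by_outlet.items()
        if tweets  # skip outlets with no collected tweets (mean() of nothing fails)
    }
```

Usage: `average_compound(tweets, SentimentIntensityAnalyzer().polarity_scores)` after `from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer`.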

Tools and techniques: Python, Tweepy, VADER, Matplotlib, Seaborn.
Link to a repository: GitHub


Twitter Bot


A Twitter bot that sends out a visualized sentiment analysis of a requested Twitter account's 100 most recent tweets. To activate the bot, tweet '@Sonik_Belka analyze @any_twitter_account'. The bot checks for new tweets twice a day.
*Because the app is deployed on a free Heroku plan, its performance can be interrupted by dynos expiring.
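The bot has to pick the requested account out of each mention. A minimal sketch, assuming mentions arrive as hypothetical `(tweet_id, text)` pairs and that processed IDs are kept in a set between the twice-daily checks. The handle pattern follows Twitter's username rules (letters, digits and underscores, at most 15 characters); the project's actual parsing logic may differ.

```python
import re
from typing import Optional

# Trigger phrase "@Sonik_Belka analyze @handle", matched case-insensitively
COMMAND = re.compile(r"@Sonik_Belka\s+analyze\s+@([A-Za-z0-9_]{1,15})\b", re.IGNORECASE)


def requested_account(tweet_text: str) -> Optional[str]:
    """Return the handle the bot was asked to analyze, or None."""
    match = COMMAND.search(tweet_text)
    return match.group(1) if match else None


def pending_requests(mentions: list[tuple[int, str]], seen_ids: set[int]) -> list[str]:
    """Handles requested in mentions not processed before; marks every mention as seen."""
    handles = []
    for tweet_id, text in mentions:
        if tweet_id in seen_ids:
            continue  # already handled during an earlier check
        seen_ids.add(tweet_id)
        handle = requested_account(text)
        if handle:
            handles.append(handle)
    return handles
```

Tracking seen IDs keeps a mention from being analyzed twice when consecutive checks return overlapping results.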
Tools and techniques: Python, Tweepy, TextBlob, Heroku deployment, Matplotlib, Seaborn.
Link to a repository: GitHub
Link to Twitter


Mars Data Online Scraping


A Flask application that scrapes various websites for data related to the Mission to Mars and displays the information on a single HTML page.
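The scraping step can be sketched with BeautifulSoup: fetch each page (Selenium's `driver.page_source` works for pages rendered by JavaScript), then pull out the fields to display. The CSS class names and the function name below are hypothetical placeholders, not necessarily what the real sites or the project use.

```python
from typing import Optional

from bs4 import BeautifulSoup


def latest_headline(html: str) -> dict[str, Optional[str]]:
    """Return the first news title and teaser found in a page's HTML.

    The class names "content_title" and "article_teaser_body" are
    hypothetical; inspect the target page to find the real ones.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("div", class_="content_title")
    teaser = soup.find("div", class_="article_teaser_body")
    # Return None for any field the page does not contain
    return {
        "title": title.get_text(strip=True) if title else None,
        "teaser": teaser.get_text(strip=True) if teaser else None,
    }
```

The returned dict could then be stored with PyMongo, for example `collection.update_one({}, {"$set": data}, upsert=True)`, so the Flask route always renders the most recent scrape.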

Tools and techniques: Python, Flask, PyMongo, BeautifulSoup, Selenium, Chromedriver, Heroku.
Link to a repository: GitHub
Link to a website: Mission to Mars


UFO: JavaScript and DOM Manipulation


The purpose of the project is to create a dynamically rendered table of eyewitness UFO reports using vanilla JavaScript, with the ability to perform and visualize searches.

Tools and techniques: JavaScript, HTML, CSS, Plotly.
Link to a repository: GitHub
Link to a website: UFO


Interactive Dashboard (Flask, JavaScript)


An interactive dashboard created with the Flask framework, Plotly and an SQLite database.
Tools and techniques: JavaScript, Plotly, Python, Flask, SQLAlchemy.
Link to a repository: GitHub
Link to a website: Belly Button


D3.js visualization


Visualization and analysis of Census demographic data with D3.js.

Tools and techniques: JavaScript, D3, HTML, CSS.
Link to a repository: GitHub
Link to a website: D3.js


Visualizing Data with Leaflet


Map visualization with GeoJSON and Leaflet.
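Leaflet's `L.geoJSON` layer can render a GeoJSON FeatureCollection directly. A minimal Python sketch of producing one, assuming input points are dicts with hypothetical `lat` and `lon` keys plus any extra properties. GeoJSON (RFC 7946) orders coordinates as [longitude, latitude], the reverse of Leaflet's `L.latLng(lat, lng)`, which is a common reason markers land in the wrong place.

```python
import json


def to_feature_collection(points: list[dict]) -> dict:
    """Wrap points as a GeoJSON FeatureCollection of Point features.

    Each point needs hypothetical "lat" and "lon" keys; any other keys
    become the feature's properties (e.g. for Leaflet popups).
    """
    features = []
    for point in points:
        properties = {k: v for k, v in point.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude], unlike Leaflet's L.latLng(lat, lng)
            "geometry": {"type": "Point", "coordinates": [point["lon"], point["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


def dump_geojson(points: list[dict]) -> str:
    """Serialize the collection so a page can fetch it and pass it to L.geoJSON."""
    return json.dumps(to_feature_collection(points))
```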
Tools and techniques: JavaScript, Leaflet, GeoJSON, Mapbox, D3.
Link to a repository: GitHub
Link to a website: Leaflet
*Cross-origin resource sharing (CORS) needs to be enabled; the page takes about 30 seconds to load.
