Completed tasks and plans for learning Software Engineering/Data Science
September
Week 3: Bash practice
- [ ] Bash data analytics - Data36
- [ ] Learn
- [ ] GCP
- [ ] Azure
- [ ] AWS
- [ ] Spark
- [ ] Python gaps
- [ ] DataCamp - Intro to Python fundamentals
- [ ] classes
- [ ] Thredup planning
- [ ] What data am I extracting?
- [ ] How do I want to build a recommendation model?
Week 2: SQL + job applications
- [x] apply to jobs + internships
- [x] SQL refresher - Data36
- [x] LeetCode practice - Python
- [x] ODSC scholarship application
- [x] ODSC registration for all events
Week 1: Thredup database
- [x] organize all files according to How to Organize Your Data Science Project
- [x] basic webscrape COMPLETED!
- [x] scrape ~1000 items
- [x] Clean up all code
Plan
next month
- [ ] classification models (most popular):
- [ ] support vector machine (SVM)
- [ ] logistic regression
- [ ] decision trees
- [ ] random forest
- [ ] XGBoost
- [ ] convolutional neural network
- [ ] recurrent neural network
- [ ] Python functions, classes, data structures
- [ ] Python Data Science Handbook - understand + memorize
- [ ] copy links to "Migrating to Linux" page
- [ ] scan notes + fill in mistakes with audio?
- [ ] DS assignment - now with linear regression
- [ ] Titanic project
- [ ] Lambda School - follow curriculum
- [ ] ODSC mini-bootcamp - follow curriculum
- [ ] Hack Reactor student projects
- [ ] program to manipulate MIDI files with Python to produce a new sound
- [ ] take a picture of a bomber jacket in a store and have Thredup find an alternative on its website. Maybe Poshmark as well?
- [ ] Migrate to a networked-thought tool before the mid-year review?
- [ ] JSON - for debugging
- [ ] Possible to contribute to Athens?
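For the "JSON - for debugging" item, a minimal sketch of pretty-printing a record with Python's standard `json` module. The record and its field names are made up for illustration; they are not Thredup's actual data format.

```python
import json

# Hypothetical scraped record; field names are illustrative only
item = {
    "title": "Linen shirt",
    "price": 12.99,
    "sizes": ["XS", "S"],
    "url": "https://www.thredup.com/product/example",
}

# indent + sort_keys make nested data easy to scan while debugging
pretty = json.dumps(item, indent=2, sort_keys=True)
print(pretty)

# Round-trip check: the JSON text parses back to an equal dict
assert json.loads(pretty) == item
```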
Short term - end of 2020
- [ ] Daniel Bourke’s YouTube video - Titanic Kaggle project
- [ ] Complete Codebasics playlist
- [ ] Different models + data cleaning/manipulating
- [x] How to push to and pull from GitHub using the command line?
- [x] Resume
- [x] Cover letters
- [ ] Fix all code in Jupyter + Colab notebooks and add more comments
- [ ] run additional ML models, improve score
- [ ] Start a side project
- [ ] Airbnb app from Kickstarter?
- [ ] Python library for Thredup and Poshmark?
- [ ] Markdown guidelines for GitHub + Jupyter for aesthetics
- [ ] Digitize ML mind map
- [ ] visual
- [ ] detailed explanations for overview
- [ ] CS or data structures class
- [ ] Statistics - StatQuest
- [ ] Linear Algebra - 3blue1brown
- [ ] Sklearn + Tensorflow - Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
- [ ] Command Line - Data36
- [ ] look at the profiles in "So Good They Can't Ignore You"
- [ ] Python basics - go through the whole list
- [ ] David Cournapeau - created scikit-learn as a Google Summer of Code project
- [ ] Archaeopteryx - dance AI software but for a different genre?
- [ ] using Cakewalk's software to alter music?
- [ ] software that uses midi files as input and outputs an altered file?
- [ ] Mix music genres together to generate a song?
- [ ] Turns a picture into a paint by numbers, colored pencils, or just for sketching
- [ ] Algorithms:
- [ ] recommend books you normally don't read, then provide a sample. History + politics = science fiction?
- [ ] Carbon footprint comparison: shipping a garment vs. buying a new one from a store. How many times can it be shipped before buying new has the smaller footprint?
- [ ] grant finder for research universities? How much time could it save them? Could it fill out most of the forms as well?
- [ ] ML for networked thought
- [ ] Roam - reverse engineer
- [ ] Obsidian - Roam alternative
- [ ] ClickUp - use API to test
- [ ] Notion - no API but use Lotion's API?
- [ ] ML to delete duplicates on Google Photos
Source: Jason Benn's article "Everything you need to become a self-taught Machine Learning Engineer"
Basic programming - make sure foundations are covered
- Make something similar to The Odin Project, but for Data Science
Learn:
- Data Visualization: Tableau, Apache Spark
- Hadoop
- Databases: SQL
- Cloud environments: AWS, Azure, Kubernetes
- web service technologies: JSON
- Thredup scraper - remove favorites
Long term - 2021
Teach Yourself Computer Science - 2-3 years
Bradfield CS courses ~ 5 weeks long per course - paid version of TYCS
Dive into Machine Learning
ML curriculum (designed by Jason Benn)
fast.ai - part 1 and 2 (60 hours total)
Internship - Ideally 3 months long
Job - Mid to late 2021
- [ ] Should I learn Clojure?
Ideal Career
What field do I want to work in?
- sustainability
- efficiency
- community engagement
- local government
- education
- health
- food
- games?
- art
- music
What skills do I want?
- use various types of models and software
- lots of Python + an additional language
What am I looking for in a company?
- small size (less than 100 people?) - has a startup feel
- casual setting - not meeting clients all the time, not formal wear
- focus on work/life balance
- in a cool location - city or near lots of nature (mountains or beach)
- pro-bono work - on the side or as main work
- accelerated learning especially from peers
Done
August
Week 2-4: Thredup Database
- created a kanban board - all to-dos have moved to the GitHub project page
- Updated the README on GitHub with the latest to-do
Week 1: Thredup Database
- Thredup project
- clean-up Obsidian notes
- create a copy of a GitHub folder that syncs the md file with Obsidian's file (don't want to keep copies in two locations)
- push to GitHub (code + md notes)
- Anki
- Git cards - push, pull, etc.
- Website
- fix site
- ideas: menu, posts, layout
- Org-Roam
- Test out 2nd Org-Roam installation on separate conda environment
- Guide to Installation: complete
- Linux
- Anki
- conda env for thredup project
- update migration to Linux page
July
Week 5: Thredup Database
- Thredup project
- finish all functions
- rest of the functions to be used later - also worked on them
- Dual Boot Laptop - oh boy!
- Figure out environments: conda and Homebrew
- Experiment with installing via Homebrew, then conda, on the other Linux distro - didn't use conda
Week 4: Thredup database
- Thredup Webscrape:
- links
- combine relative hrefs with the base URL: thredup.com
- split urls for different clothing categories
- realized: cannot scrape for categories within "all petite" items
- multiple functions
- links
- DS setups + those who do both DS + side projects
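A minimal sketch of the "combine hrefs with thredup.com" step, using only the standard library (`html.parser` and `urllib.parse.urljoin`) on a hard-coded HTML snippet rather than a live page. The `https://www.thredup.com` base and the `/women/...` paths are assumptions for illustration, not the site's real URL structure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.thredup.com"  # assumed base; the notes only say "thredup.com"


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_links(html, base=BASE_URL):
    parser = HrefCollector()
    parser.feed(html)
    # urljoin resolves relative hrefs against base and leaves absolute URLs alone;
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(urljoin(base, h) for h in parser.hrefs))


# Hypothetical page fragment for illustration
sample = (
    '<a href="/women/dresses">Dresses</a>'
    '<a href="/women/tops">Tops</a>'
    '<a href="/women/dresses">Dresses again</a>'
    '<a href="https://example.com/x">External</a>'
)
links = absolute_links(sample)
print(links)
```

Deduplicating here matters because listing pages often link to the same category more than once.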
Week 3: Debug + Environments
- Work on Thredup Page
- ideas: entire python library + build database using all petite items
- web scraper for pulling non-polyester items stopped working
- web scraper for pulling 100% linen items - use database instead
- web scraper for building a database - start
- pull links for the main pages - won't work with bs4 due to the website layout. Not all hrefs can be pulled, and random product hrefs get pulled. Need to use XML formatting
- yank all hrefs/page - not anymore
- selenium to move to the next pages - not anymore
- Get Doom Emacs
- Purchase computer accessories for working at home
- Acer One laptop:
- attempt clean-up
- wipe out system
Week 2: Anki + Feynman technique
- Fix thredup code
- move over any "Spotify Project" related files from host to guest OS
- move over simplified version of Digital Ocean files
- debugging
- Digital Ocean:
- sort through all files
- organize
- download
- delete - will cancel credit card
- LinkedIn - update all work descriptions
- Feynman | DS concepts for Classification:
- AUC (area under the ROC curve)
- Feynman | DS concepts for Regression
- create a map for all concepts so far + tangents to see what directions I've been going in
- start Thredup ML project - 1 hour
- start community involvement project
- Anki: map, filter, reduce and lambda:
- definition
- syntax
- example
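An example for the map, filter, reduce and lambda cards, as a short Python sketch with made-up numbers.

```python
from functools import reduce  # in Python 3, reduce lives in functools

prices = [12, 45, 8, 30]

# lambda: an anonymous, single-expression function
# map: apply a function to every element (returns a lazy iterator)
doubled = list(map(lambda p: p * 2, prices))

# filter: keep only the elements for which the function returns True
cheap = list(filter(lambda p: p < 20, prices))

# reduce: fold the sequence into one value, left to right, starting from 0
total = reduce(lambda acc, p: acc + p, prices, 0)

print(doubled, cheap, total)

# The same results with comprehensions / builtins, often preferred in idiomatic Python
assert doubled == [p * 2 for p in prices]
assert cheap == [p for p in prices if p < 20]
assert total == sum(prices)
```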
Week 1 (3 days): Anki + Feynman
- Anki | Python code in DS | 20 cards
- Feynman | DS concepts for Classification:
- Confusion Matrix
- Accuracy
- Precision & Recall
- F1 Score
- Harmonic Mean (from F1 score)
- F$_\beta$-score (from F1 score)
- Sensitivity & Specificity
- ROC Curve
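A from-scratch sketch of the classification concepts above in plain Python, assuming binary labels with 1 as the positive class; the toy labels and scores are made up. AUC is computed with the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs the scores rank correctly, with ties counted as half, which equals the area under the ROC curve. scikit-learn's `sklearn.metrics` has library versions (e.g. `confusion_matrix`, `precision_score`, `fbeta_score`, `roc_auc_score`).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f_beta(precision, recall, beta=1.0):
    """F-beta = (1 + b^2) P R / (b^2 P + R); beta = 1 gives F1, the harmonic mean of P and R."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def metrics(y_true, y_pred, beta=1.0):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity / TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0     # a.k.a. TNR
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f_beta(precision, recall, 1.0),
        "f_beta": f_beta(precision, recall, beta),       # beta > 1 weights recall more
    }


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.3, 0.2, 0.7]

m = metrics(y_true, y_pred, beta=2.0)
auc = roc_auc(y_true, scores)  # 13 of 16 pairs ranked correctly -> 0.8125
print(m)
print(auc)
```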
June
Week 5 - 2 days:
- Ultralearning review + update (mid year evaluation)
- Notes | scikit-learn
Week 4:
- download RESULTER extension for google search shortcuts
- increase storage again - new way for Ubuntu 20.04
- reinstall Ubuntu after expanding hard drive mistake
- VS Code: basic debugging
- VS Code won't run Python - how to enable?
- access shared folder between host and guest systems
- Idea vomit for open source projects: education (The Prize), government, businesses (Amazon → buying locally) - see Open Source Projects
- Will Athens be using Emacs? - nope
- change default environment to DJ-set
- don't think you can
Week 3: Set up local machine - Ubuntu setup + Remixatron
- delete Xubuntu
- Install Ubuntu
- Install VirtualBox Guest Additions - automatically adjust resolution (to match host) + shared clipboard and drag and drop between host and guest systems
- Slow Ubuntu - wrong ISO mounted, reinstalled just in case
- Install Miniconda + conda onto VirtualBox
- create environment
- Organize environments + GitHub folders: the problem arose because VS Code couldn't find pydub (conda had installed it in a different location from where pip was installing)
- no longer use Spotify's API
- run Remixatron on command line - won't run because of pygame
- embed client ID into jupyter notebook
- audio analysis (audio from YouTube) using Spotify's API
- identify beat breakdown for Daechwita + DNA. Didn't work since I was downloading music from YouTube and using Spotify's analysis on it
- switch from JupyterLab to VS Code - VS Code can run notebooks!
- widget for Jupyter notebook to play 2 audio samples (downloaded from YouTube?)
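A quick check for the pydub problem above: a minimal sketch that prints which interpreter is running and where, if anywhere, it would import a module from. `where_is` is a hypothetical helper name, not part of any library.

```python
import importlib.util
import sys

# Which Python is running this code? In VS Code, the selected interpreter decides this.
print("interpreter:", sys.executable)
print("environment prefix:", sys.prefix)


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


# None means pydub is not installed for *this* interpreter (e.g. pip and conda
# installed into different environments)
print("pydub:", where_is("pydub"))
```

If `pydub` shows `None`, running `python -m pip install pydub` with that same interpreter installs the package where it will be found.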
Week 2: Research into 'Daechwita —> DNA' project + VMs (using Linux fully)
- created separate page for the 'Daechwita —> DNA' project
- beats and tempos match! extracted from dict in analysis
- most code uses JavaScript with HTML + CSS (makes sense) with the Spotify API
- discovered lots of resources, for python as well
- The Autocanonizer source: YouTube
- Spotify audio analysis source: spotify track plays locally on desktop while song bar progresses on the web browser
- EternalJukebox - better README notes + is more updated than autocanonizer
- Youtube-dl - command-line program to download videos from YouTube.com and a few more sites
- Discovered Remixatron on AlternativeTo
- forced myself to learn git to download and play around on the command line
- VirtualBox vs. SSH?
- Remixatron is giving array allocation errors. Code it myself!
Week 1: Led Zeppelin Project
- Led Zeppelin project - documentation
- Led Zeppelin project - define objective
- pull + sort albums from given artist
May
Week 4: Blog posts + interview questions
- Interview questions - 1 hour per day
- Data Science assignment
- How to solidify Python?
Week 3: Job applications
- redirect URL to Notion or redirect from the WordPress site itself
- Create word/pdf resume
- search for jobs
- cover letters
- job applications
Week 2: Online presence
- Website
- Resume
- Portfolio
- Completed Complete Machine Learning and Data Science: Zero to Mastery on Udemy.
- Time: 1 month to complete
- Also understood Python fundamentals better
- organize DS/ML documentation - majority
Week 1: Finish Python section of course
- Python videos - basically finishes up the whole course
April - back home in Quarantine
- Worked on Machine Learning and Data Science: Zero to Mastery on Udemy.
- Learned GitHub fundamentals - create repositories, push, pull
- Edited projects + updated profile
- Updated LinkedIn
- started working on website and resume
- organized all notes
March - Sri Lanka
- Discovered new resources on 3/25/20:
- Titanic data - completed seaborn (graphing + plotting) + scikit-learn (machine learning model, random forests)
- Looked at “titanic-start-here-a-gentle-introduction” on kaggle and didn’t understand the statistics (Logistic Regression)
- Discovered on YouTube: Daniel Bourke’s code with me and codebasics’ machine learning playlist (learned about logistic regression)
February - Bangalore
- Statistics for Data Science | Probability and Statistics - for overview/introduction
- Completed Probability and Statistics for Business and Data Science on Udemy.
- Time: 1 week to complete
- Pull Spotify features + analysis from Spotify's API using spotipy
- Thredup web scraping Python code
- Real estate web scraping Python code
2019
- Python - Automate the boring stuff (first half of Udemy course)
- The Odin Project - started it off
- Data36 - SQL practice
- Codewars - python + SQL practice
- SQL - interview questions
NOTES
Original Plan in Google Drive: