Do as much side-study of mathematical basics as possible
This aspect of learning cannot be over-emphasized, especially for non-CS graduates and IT engineers who have not been in touch with rigorous mathematics for some years into their professional lives. I even wrote a Medium article on what mathematics knowledge is necessary for machine learning and data science.
Mathematics necessary to learn/refresh for gaining foothold in data science/machine learning
For this I chose a few courses from Coursera and edX. A few of them stand out for their depth and rigor. Those are:
- Statistical Thinking for Data Science and Analytics (Columbia Univ.): Foundational statistics course from Columbia University, part of their Data Science Executive certificate program on edX. Rigorous, but it drills down into the concepts very well in a structured manner.
- Computational Probability and Inference (MIT): This is a hard one from MIT, be warned! It covers advanced topics like Bayesian models and graphical models in unparalleled depth.
- Statistics with R Specialization (Duke Univ.): This is a 5-course specialization (the last one is a capstone project, which you can ignore) from Duke University to strengthen your statistics foundation along with hands-on programming exercises. Recommended for its balanced difficulty and rigor.
- LAFF: Linear Algebra — Foundations to Frontiers (UT Austin): This is an amazing course on linear algebra foundations (along with a deep discussion of high-performance computing of linear algebra routines) that you must give a try. Offered by the University of Texas at Austin on the edX platform. Trust me when I say that after taking this course, you will never want to invert a matrix to solve a linear system of equations, tempting and easy to understand as that is; instead, you will reach for a QR factorization or Cholesky decomposition to reduce the computational complexity.
- Optimization Methods in Business Analytics (MIT): This is a course on optimization/operations research methods for business analytics from MIT. I signed up because this was the only highly-rated course on a good platform (edX) that I could find about linear and dynamic programming techniques. I believed that learning those techniques could be immensely helpful, as optimization problems turn up in almost every machine learning algorithm.
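To make the linear-algebra point above concrete, here is a minimal NumPy sketch (using a made-up 4×4 symmetric positive-definite system) showing that a Cholesky factorization followed by two triangular solves gives the same answer as an explicit inverse, without ever forming one:

```python
import numpy as np

# Build a toy symmetric positive-definite system A x = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)   # symmetric positive-definite by construction
b = rng.standard_normal(4)

# Tempting route: explicit inversion (more flops, worse conditioning).
x_inv = np.linalg.inv(A) @ b

# Better route: factor A = L L^T, then solve two triangular systems.
L = np.linalg.cholesky(A)       # L is lower-triangular
y = np.linalg.solve(L, b)       # solve L y = b
x_chol = np.linalg.solve(L.T, y)  # solve L^T x = y

assert np.allclose(x_inv, x_chol)  # same answer, no inverse formed
```

In production code you would use a dedicated triangular solver (e.g. `scipy.linalg.solve_triangular`) so the factor's structure is actually exploited; `np.linalg.solve` is used here only to keep the sketch dependency-free.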
Please note that I did not search for and sign up for any calculus course, as I was comfortable with what I remembered from college and with what I expected to be useful for machine learning or data science study and practice. If you are rusty in that area, please search for a good one.
Machine Learning — various personalities make it a colorful affair
Somewhere among all these side-studies, I managed to complete the course that is considered one of the pioneers of all MOOCs: Andrew Ng’s machine learning course on Coursera. I guess there are plenty of articles written about it already, and therefore I will not waste any more of your time describing this course. Just take it, do all the homework and programming assignments, learn to think in terms of vectorized code for all the major machine learning algorithms that you know of, and save the notes for ready reference in your future work.
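To illustrate the vectorized-code habit the course builds, here is a small sketch (with synthetic data) computing one linear-regression gradient two ways: first with an explicit loop, then as a single matrix expression:

```python
import numpy as np

# Synthetic data: 100 samples, 3 features, known weights.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.0, 0.5])
theta = np.zeros(3)

# Loop version: accumulate the gradient sample by sample.
grad_loop = np.zeros(3)
for i in range(len(y)):
    grad_loop += (X[i] @ theta - y[i]) * X[i]
grad_loop /= len(y)

# Vectorized version: one matrix expression, same result, far faster.
grad_vec = X.T @ (X @ theta - y) / len(y)

assert np.allclose(grad_loop, grad_vec)
```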
Oh, by the way, if you want to brush up on MATLAB or learn it from scratch (you will need to write MATLAB code for this course, not R or Python), then you can check out this course: Introduction to Programming with MATLAB.
Now, I want to talk about personalities.
I took multiple machine learning courses and the aspect I enjoyed most was realizing how the treatment of the same fundamental subject becomes a function of the personality and worldview of different instructors :) This was a fascinating experience.
I am listing the various machine learning MOOCs I signed up for and covered:
- Machine Learning (Stanford Univ.): Andrew Ng’s widely known course. I talked about it in the paragraphs above.
- Machine Learning Specialization (Univ. of Washington): This comes with a different flavor than Ng’s. Emily Fox and Carlos Guestrin present the concepts from a statistician’s and a practitioner’s perspective, respectively. I could not install the Python package that Carlos’ company offers under a free license, but this specialization is worth completing for its theory lectures alone. The proofs and discussions of some fundamental concepts, like the bias-variance trade-off, cost computation, and the comparison of analytic vs. numerical approaches for cost function minimization, are presented even more intuitively and carefully than in Prof. Ng’s course (and that’s saying something, given the superb quality of Prof. Ng’s teaching).
- Machine Learning for Data Science and Analytics (Columbia Univ.): This course had a slightly unusual syllabus for a general machine learning course, devoting the entire first half to lectures on conventional algorithms. It covered essential sorting, searching, graph-traversal, and scheduling algorithms. There is not much one-to-one discussion of how these algorithms are used in machine learning problems, but studying them gives you an idea of the traditional computer science knowledge necessary to appreciate how large-scale data science problems are tackled. Think O(n^3) whenever you are about to multiply two matrices, or O(n log n) whenever you are sorting a list. You may not use this knowledge explicitly in your day-to-day job, but knowing these nuts and bolts of the computational process certainly broadens your worldview about the problem at hand.
- Data Science: Data to Insights (MIT xPro 6-week online course): This one is among the very few paid courses I have taken (I generally go the audit route for MOOCs). It is not available on the public edX website, although it uses the edX platform for delivering content. The 6-week course is well-structured and full of interesting content that opens up the wide world of data science and machine learning to the uninitiated. The case studies are very interesting but reasonably hard and time-consuming to codify. Lectures are very engaging, with illustrations drawn from those case studies. My particular favorite module was the one about recommendation systems. I literally started viewing the Netflix screen on my laptop as an adjacency matrix after taking this class!
- Neural Networks for Machine Learning (Univ. of Toronto): This is a somewhat underrated course on Coursera, even with the neural network pioneer Geoff Hinton as the instructor. I realize that Andrew Ng’s new Deep Learning specialization will directly compete with this course, and I would not be surprised if Coursera removes it in the near future. However, while it is there, a deep learning enthusiast should sit through this one, even if just to gauge the pattern of the historical development of deep networks.
- Deep Learning Specialization (deeplearning.ai): This is the newest kid on the block, but it stands on the very broad shoulders of Andrew Ng, and therefore boasts very strong legs :) I have finished the 2nd course and am on to the 3rd now. The jury is still out, but you should definitely consider completing this series if you want to catch up on the latest trends in deep learning. Even if the programming assignments look hard and you want to stay away from programming a deep network by hand (you can argue there are always excellent open-source packages like TensorFlow, Keras, and Theano out there to take care of the nuts and bolts under the hood), it is imperative to have a deep understanding of essential concepts such as regularization, exploding gradients, hyperparameter tuning, and batch normalization to effectively use those high-level deep learning frameworks.
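The adjacency-matrix view of recommendations mentioned above can be sketched in a few lines of NumPy (the user-item matrix here is made up purely for illustration):

```python
import numpy as np

# Hypothetical binary "watched" matrix: 4 users (rows) x 5 items (columns).
R = np.array([[1, 1, 0, 0, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 1, 1]], dtype=float)

# Item-item cosine similarity computed straight from the adjacency matrix.
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)

# Score unseen items for user 0 by similarity to what they have watched.
user = R[0]
scores = S @ user
scores[user > 0] = -np.inf      # mask items the user has already seen
recommendation = int(np.argmax(scores))
```

Real systems add normalization, implicit-feedback weighting, and sparse storage on top of this, but the core computation really is this matrix product.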
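As a taste of one of those essential concepts, here is a minimal sketch of gradient clipping by global norm, a standard remedy for exploding gradients (the gradient values and the 5.0 threshold are arbitrary choices for illustration):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    is at most max_norm; leave them untouched if already small."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total <= max_norm or total == 0.0:
        return grads
    return [g * (max_norm / total) for g in grads]

grads = [np.full(3, 10.0), np.full(2, -10.0)]   # deliberately huge gradients
clipped = clip_by_global_norm(grads)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```

Frameworks like TensorFlow ship this as a built-in, but twenty lines of NumPy make the mechanism obvious.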
Two umbrella data science MOOCs with R and Python
As we draw closer to the end of this long article, I want to list two multi-course MOOCs I found interesting and useful to go along with the specific subject areas mentioned above.
- Data Science Specialization (Johns Hopkins Univ.): This one is a well-known 10-course specialization offered on Coursera. Not every course will appeal to every learner; I personally completed only 5 of the 10. The key thing is the timing, i.e., when to start this specialization. It often comes up at the top of the Google results when one researches MOOCs for data science, and therefore it becomes the first MOOC for many new learners. Personally, I would have had trouble getting the full value from it if I had done that. The introductory Microsoft and Udemy courses on R, and a few statistics and linear algebra courses before this, helped me immensely to extract the full benefit from this set of courses. As the specialization is taught by professors from the biostatistics department of JHU, one gets an excellent treatment of two aspects of data science which are often under-represented in many curricula: research studies and design of experiments.
- Data Science MicroMasters certificate program (UC San Diego): I have just enrolled in and started the 1st of the 4 courses in this series/certificate program. I like the fact that it is similar in breadth and goals to the Johns Hopkins specialization, except that it chooses Python as the working language for the hands-on portion. The structure and content seem well thought out, covering the basics of Python, Git, and Jupyter all the way up to big data processing with the Apache Spark framework (with statistics and machine learning courses in the middle). The case studies and hands-on examples are drawn from real-world applications of data science such as wildfire modeling, cholera outbreaks, or world development indicator analysis. One of the lead instructors is Ilkay Altintas, who has created an amazing platform for wildfire dynamics prediction and is putting the fruits of data science research to work for societal good. I am sure my journey with this specialization will be an exciting and rewarding one. You are welcome to join the party!
Learning is pretty democratized — take advantage of it
With the advent of MOOCs, open-source programming platforms, collaboration tools, and virtually unlimited free cloud-based storage, learning is as democratized, ubiquitous, and universally accessible as it can get. If you are not a specialist in data science/machine learning but want to learn the subject, write some code for higher productivity at work, strive for a career enhancement, or just have some fun, now is the time to start learning. A few parting comments:
- You are a data scientist: Do not let any so-called expert demoralize you by saying something like “MOOCs are for kids, you won’t learn real data science like that”. The very fact that you are trying to learn data science by enrolling in a MOOC means two things: (a) you already deal with data in your professional life, and (b) you want to learn a scientific, structured manner of extracting maximum value from your data and generating intelligent questions around it. That means you, my friend, are already a data scientist. If you are still not convinced, read this blog by Brandon Rohrer, one of the most admired and inspirational data scientists that I know of.
- You don’t have to spend a large sum for this learning: I know that I listed a lot of courses and they may look expensive to you. But, fortunately, most (if not all) can be enrolled in free of cost. edX courses are always free to enroll in, and they generally don’t have any restrictions on course content, i.e., you can view, execute, and submit all the graded assignments (unlike Coursera, which lets you watch all the videos but hides the graded material). If you think some certificate is worth showcasing on your resume, you can always pay for it in the middle of the course, after you have completed some videos and judged its merit and utility.
- Practice, code, and build things to supplement your online learning: There is a real technique called ‘online learning’ in machine learning. In it, instead of processing a full matrix of millions of data points, the algorithm works with the latest few data points and updates its prediction. You can work in this mode too. The optimal stopping (or ‘parking’) problem is always a fascinating one, and it applies to learning too: we always wonder how much to study and assimilate before building things, i.e., where to halt the learning and start implementing. Don’t hesitate, don’t procrastinate. Learn a concept and test it with simple code. Work with the latest trick or technique you watched a video about; don’t wait to achieve mastery over the entire topic. You will be amazed by how 20 simple lines of code can give you solid practice (and make you sweat enough) on the most complex concept you learned from that video.
- There is plenty of data out there: You will also be amazed by how many rich sources of free data are out there on the web. Don’t just go to Kaggle; try something different for fun. Try data.gov or the United Nations data portal. Go to the UCI machine learning repository. Feeling more adventurous? What about downloading data about various countries from the CIA and trying all the cool visualizations that you learned in the latest Matplotlib or ggplot2 lecture? If nothing else, download your own electricity usage data from your energy provider and analyze whether you could save a few bucks by turning on the AC or dishwasher at a different time.
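The ‘online learning’ mode described above really is about twenty lines of code. A minimal sketch, with synthetic streaming data and an arbitrary learning rate, where each arriving point triggers one SGD update and is then discarded:

```python
import numpy as np

# Online learning sketch: fit a 2-weight linear model one point at a time.
rng = np.random.default_rng(42)
true_w = np.array([3.0, -2.0])   # target weights we hope to recover
w = np.zeros(2)
lr = 0.05                        # arbitrary learning rate for this sketch

for _ in range(2000):
    x = rng.standard_normal(2)                     # a fresh data point arrives
    y = x @ true_w + 0.01 * rng.standard_normal()  # noisy label
    w += lr * (y - x @ w) * x                      # one SGD step, then discard
```

After a couple of thousand points, `w` lands close to `true_w` without the model ever holding more than one observation in memory.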
The opinions expressed in this article about various courses/instructors are entirely the author’s own. If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com. You can also check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources, and follow me on LinkedIn.
Bio: Tirthajyoti Sarkar is a semiconductor technologist, machine learning/data science zealot, Ph.D. in EE, blogger and writer.
Original. Reposted with permission.
CMU 10-806 Foundations of Machine Learning and Data Science, Fall 2015
Instructors: Nina Balcan and Avrim Blum
Mon/Wed 4:30-5:50, GHC 4303
Course description: This course will cover fundamental topics in Machine Learning and Data Science, including powerful algorithms with provable guarantees for making sense of and generalizing from large amounts of data. The course will start by providing a basic arsenal of useful statistical and computational tools, including generalization guarantees, core algorithmic methods, and fundamental analysis models. We will examine questions such as: Under what conditions can we hope to meaningfully generalize from limited data? How can we best combine different kinds of information such as labeled and unlabeled data, leverage multiple related learning tasks, or leverage multiple types of features? What can we prove about methods for summarizing and making sense of massive datasets, especially under limited memory? We will also examine other important constraints and resources in data science including privacy, communication, and taking advantage of limited interaction. In addressing these and related questions we will make connections to statistics, algorithms, linear algebra, complexity theory, information theory, optimization, game theory, and empirical machine learning research. [More info] [People and office hours]
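As a taste of the “provable guarantees” the description promises, here is a quick sketch of the classic sample-complexity bound for a finite hypothesis class and a consistent learner (the class size, epsilon, and delta below are arbitrary example values):

```python
import math

# For a finite hypothesis class H and a learner that outputs a hypothesis
# consistent with the training sample, m >= (ln|H| + ln(1/delta)) / eps
# examples suffice for true error <= eps with probability >= 1 - delta.
def pac_sample_size(h_size, eps, delta):
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Example: 2^20 hypotheses, 5% error tolerance, 99% confidence.
m = pac_sample_size(h_size=2 ** 20, eps=0.05, delta=0.01)
```

Note the logarithmic dependence on both |H| and 1/delta: even a million-hypothesis class needs only a few hundred examples at this tolerance.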
Take-home final
You can take the test in any 24-hour period you want up until Fri Dec 18 (i.e., midnight Dec 18 is the latest hand-in date).
Here is the take-home final.
Projects [Project ideas]
Project poster presentations will be Thursday December 17, 4-6pm in the GHC 7th floor Atrium. If you cannot make that time, please come talk with us.
Writeups due by Sunday December 20.
Lecture Notes & Handouts
- 09/09: Introduction. The consistency model.
See also Chapter 1 in the Kearns-Vazirani book.
- 09/14: The PAC model for passive supervised learning.
See also Chapter 1 in the Kearns-Vazirani book.
- 09/16: Effective number of hypotheses, VC-dimension, and Sauer's lemma.
See also Chapter 3 in the Kearns-Vazirani book.
- 09/21: Sample complexity results for infinite hypothesis spaces.
See also Chapter 3 in the Kearns-Vazirani book.
- 09/23: Sample complexity results for infinite hypothesis spaces (cont'd).
See also Chapter 3 in the Kearns-Vazirani book.
- 09/28: Sample complexity lower bounds for passive supervised learning.
See also Chapter 3.6 in the Kearns-Vazirani book.
- 09/30: Sample Complexity results for the agnostic case.
- 10/05: Generalization bounds based on Rademacher complexity.
See also Chapter 3 in the Mohri, Rostamizadeh, and Talwalkar book.
See also the survey Introduction to Statistical Learning Theory by O. Bousquet, S. Boucheron, and G. Lugosi. See also the survey Theory of Classification: A Survey of some recent advances by O. Bousquet, S. Boucheron, and G. Lugosi.
- 10/07: Computational hardness results.
See also Ch. 1.4, 1.5, and 6.1 in the Kearns-Vazirani book. More resources on NP-hardness: 1, 2.
- 10/12: Online learning and optimization I: mistake-bounds and combining expert advice.
Further readings: book chapter
- 10/14: Online learning and optimization II: ERM and Follow the Regularized Leader.
See also Shalev-Shwartz monograph
- 10/19: Online learning and optimization III: FTRL contd, and Follow the Perturbed Leader.
- 10/21: Online learning and optimization IV: FPL contd, and the multi-armed bandit setting.
- 10/26: Boosting: weak-learning, strong-learning, and adaboost.
See also Chapter 4 in the Kearns-Vazirani book and Chapter 6 in the Mohri-Rostamizadeh-Talwalkar book.
See also Rob Schapire's notes.
- 10/28: Learning and game theory.
- 11/02: Learning and game theory.
See also this book chapter
- 11/04: Streaming Algorithms: estimating frequency counts, the count-min sketch, begin distinctness counting.
See also Muthukrishnan lecture notes, Chakrabarti lecture notes
- 11/09: Streaming Algorithms II: distinctness counting, frequency moments.
- 11/11: The Johnson Lindenstrauss Lemma and tensor methods.
See also Moitra notes, Dasgupta notes
- 11/16: Foundations of Active Learning Intro slides and Lecture Notes.
See also the survey Two Faces of Active Learning by S. Dasgupta.
See also the survey Active Learning Survey by Balcan and Urner.
- 11/18: Disagreement Based Active learning.
See also the survey Theory of Disagreement-Based Active Learning by S. Hanneke.
- 11/23: Active Learning of Linear Separators and Slides.
- 11/30: Semi-Supervised Learning.
See also this survey article
- 12/02: Semi-Supervised Learning and brief discussion of multilayer networks.
- 12/07: Differential privacy and Statistical Query learning and Slides.
- 12/09: Distributed learning
See also: Jordan and Mitchell, Machine learning: Trends, Perspectives, and Prospects.