Course Review: Probability - The Science of Uncertainty and Data MITx (6.431x)

After finishing Probability and Statistics by Stanford Online, I wanted to improve my knowledge of statistics further. The logical choice was to take Probability - The Science of Uncertainty and Data MITx (6.431x), which was starting at the time. While there is an MIT OCW version of the course, I decided to use the edX version: the videos are shorter, it is more structured, there are tight deadlines instead of self-study, and, as the course was just starting, many other students were participating at the same time. The deadlines especially helped to motivate me, as there was graded coursework due every week that had to be completed on time to count towards the overall score.

Overview

This course builds foundational knowledge of data science through an introduction to probabilistic models, including random processes and the basic elements of statistical inference. It is an introductory course in probability theory. The following is copied from the course description:

What you’ll learn

The world is full of uncertainty: accidents, storms, unruly financial markets, noisy communications. The world is also full of data. Probabilistic modeling and the related field of statistical inference are the keys to analyzing data and making scientifically sound predictions.

Probabilistic models use the language of mathematics. But instead of relying on the traditional “theorem-proof” format, we develop the material in an intuitive – but still rigorous and mathematically-precise – manner. Furthermore, while the applications are multiple and evident, we emphasize the basic concepts and methodologies that are universally applicable.

The course covers all of the basic probability concepts, including:

  • multiple discrete or continuous random variables, expectations, and conditional distributions
  • laws of large numbers
  • the main tools of Bayesian inference methods
  • an introduction to random processes (Poisson processes and Markov chains)

Syllabus

  • Unit 1: Probability models and axioms
    • Probability models and axioms
    • Mathematical background: Sets; sequences, limits, and series; (un)countable sets.
  • Unit 2: Conditioning and independence
    • Conditioning and Bayes’ rule
    • Independence
  • Unit 3: Counting
    • Counting
  • Unit 4: Discrete random variables
    • Probability mass functions and expectations
    • Variance; Conditioning on an event; Multiple random variables
    • Conditioning on a random variable; Independence of random variables
  • Unit 5: Continuous random variables
    • Probability density functions
    • Conditioning on an event; Multiple random variables
    • Conditioning on a random variable; Independence; Bayes’ rule
  • Unit 6: Further topics on random variables
    • Derived distributions
    • Sums of independent random variables; Covariance and correlation
    • Conditional expectation and variance revisited; Sum of a random number of independent random variables
  • Unit 7: Bayesian inference
    • Introduction to Bayesian inference
    • Linear models with normal noise
    • Least mean squares (LMS) estimation
    • Linear least mean squares (LLMS) estimation
  • Unit 8: Limit theorems and classical statistics
    • Inequalities, convergence, and the Weak Law of Large Numbers
    • The Central Limit Theorem (CLT)
    • An introduction to classical statistics
  • Unit 9: Bernoulli and Poisson processes
    • The Bernoulli process
    • The Poisson process
    • More on the Poisson process
  • Unit 10 (Optional): Markov chains
    • Finite-state Markov chains
    • Steady-state behavior of Markov chains
    • Absorption probabilities and expected time to absorption

Format

The course is targeted at students who are encountering probability for the first time. It assumes next to no background in probability theory, but it requires a background in algebra and analysis. The schedule and setup are similar to a college course. The course material is organized into units, each containing between one and three lecture sequences. One unit is due every two weeks. Each lecture sequence covers a single topic in several videos of around 10 minutes each. Between videos, there are graded quizzes that range from very easy to quite hard. These should be completed before moving on to the next video. After the lessons for a week are completed, there are videos with solved exercises or additional theoretical background. Finally, there is graded homework of around 6 exercises. These are quite difficult and require a substantial time investment to solve.

You are given two weeks to complete each week’s exercises. The videos amount to around 200 minutes per week. MIT students who take the corresponding residential class typically report an average of 11-12 hours spent each week, including lectures, recitations, readings, homework, and exams. I watched the videos at 1.5x speed and think I needed around 5-10 hours per week to watch the videos and do the quizzes and exercises. There are two exams, a midterm and a final. The final exam is only available if you pay. I took the midterm exam and it was really difficult; while I reached a “passing” score, I did not score well overall. Paying students can earn a certificate if they reach a certain score, but I took the free course and just did the exercises to test myself.

Conclusion

I had encountered the basics of probability theory several times during school and university, but this course went much more in depth. It started out familiar and got difficult pretty quickly. Especially enjoyable were the proofs of almost everything, the graphical explanations, and the bite-sized videos, which made it easy to follow along and to find the time. Solutions are provided for everything, in case you do not manage to solve a quiz or exercise.

Unit 7 on Bayesian inference was really hard, and many students complained about it. It was said that one needs to study the accompanying textbook before watching the videos, as they do not explain the material very well. Even though I had taken courses on Statistical Machine Learning before, I really struggled with the quizzes and exercises of that unit. In the future, I will revisit it and see whether the problem was the explanation in the course or me.

Units 9 and 10 were mostly new to me and very interesting. I had always wondered about stochastic processes, and after finishing the course I now have a good basic understanding of them. I liked the Markov chains, especially as they are somewhat related to classical Reinforcement Learning. I really enjoyed the course and already feel its effects, especially when reading papers that use probability theory.

Overall, I am very happy to have taken the course; I find it a good foundation and I will continue to build on it. The next course I want to take is Fundamentals of Statistics (MITx - 18.6501x), which is scheduled to start on Jan 24, 2022. It was offered at the same time as this course, but taking two courses at once was not recommended by the lecturers and was not possible for me, as it would have taken too much time per week. So I look forward to taking it in January! The next run of Probability - The Science of Uncertainty and Data MITx (6.431x) also seems to be due in January, around the same time.

PhD Student Natural Language Processing

I am a computer science PhD student and Chinese learner.