My team spoke very highly of this blog (and they’re also wondering if self-supervised learning could eliminate the need for labeling entirely), so I gave it a read. It is a very well-written, thorough overview of self-supervised learning. What stood out the most was that it was written by Dr. LeCun, one of the people I respect the most in this field. I imagine his schedule is brutal, but I appreciate that he still finds time to write. …

Camera calibration, or camera resectioning, **estimates the parameters of a pinhole camera model** given a photograph. Usually, the pinhole camera parameters are represented in a 3 × 4 matrix called the camera matrix. We use these parameters to **estimate the actual size of an object** or **determine the location of the camera in the world**.

Before we talk about camera calibration, first you need to understand how the pinhole camera works.

Why do I need to know about the pinhole camera?

Because it is the essence of how any camera works. The pinhole camera model explains the relationship between a point…
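As a minimal sketch of what a 3 × 4 camera matrix does, the snippet below projects one 3D point to pixel coordinates. The focal length, principal point, and the point itself are hypothetical values chosen for illustration, and the camera is assumed to sit at the world origin with no rotation.

```python
def project(P, X):
    """Project a 3D world point X = (x, y, z) with a 3x4 camera matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates of the 3D point
    u, v, w = (sum(p * x for p, x in zip(row, xh)) for row in P)
    return (u / w, v / w)  # divide by depth to get pixel coordinates


# Hypothetical camera: focal length 800 px, principal point (320, 240),
# no rotation or translation (assumed values, not from a real calibration).
f, cx, cy = 800.0, 320.0, 240.0
P = [
    [f, 0.0, cx, 0.0],
    [0.0, f, cy, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

pixel = project(P, (0.5, 0.25, 2.0))
print(pixel)
```

Calibration goes the other way: given several known 3D points and where they land in the image, it estimates the entries of `P`.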

*The views expressed on this post are mine alone and do not reflect the views of my employer, Microsoft.*

**Text-to-SQL** is a task to translate a user’s query spoken in natural language into SQL automatically. It is the project that I’m working on at Microsoft.

If this problem is solved, it’s going to be widely useful because the vast majority of data in our lives is stored in relational databases. In fact, the **healthcare**, **financial services**, and **sales** industries exclusively use the relational database. This means the industries that can’t afford to lose transactions solely use the relational database. (You can…

If you read scientific papers (medical, artificial intelligence, climate, political, and so on) or any poll result, there is a term that almost always appears: the p-value.

But what exactly is a p-value? Why does it show up in all these contexts?

This table lists the symptoms of infection with the novel coronavirus (COVID-19) and their p-values.

The only remarks about this table from the author were *“Proportions for categorical variables were compared using the χ2 test. P values indicate differences between ICU (Intensive Care Unit) and non-ICU patients.”*
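To make that remark concrete, here is a rough sketch of a χ2 test on a single symptom, comparing ICU and non-ICU patients. The counts are invented for illustration and are not taken from the table; the sketch assumes a plain Pearson test on a 2 × 2 table without a continuity correction.

```python
import math


def chi2_test_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected if the groups do not differ
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows are ICU / non-ICU, columns are symptom present / absent.
chi2, p = chi2_test_2x2(15, 5, 10, 30)
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
```

A small p-value here says that a gap this large between the two groups would be unlikely if the symptom were equally common in both.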

Let’s say all doctors in the hospital fell…

The (somewhat vague) term “Operations Research” was coined during World War II. The British military brought together a group of scientists to allocate insufficient resources (for example, food, medics, weapons, and troops) in the most effective way possible to different military **operations**. So the term “*operations*” is from “*military operations*”. Successfully conducting military operations was a huge deal, and Operations Research (OR) became its own academic discipline in universities in the 1940s.

When you google “Operations Research”, you get a very long Wikipedia article. However, the explanation is a little bit all over the place and to be…

The Beta distribution is **a probability distribution on probabilities**. For example, we can use it to model probabilities such as the click-through rate of your advertisement, the conversion rate of customers actually purchasing on your website, how likely readers are to clap for your blog, how likely it is that Trump will win a second term, the 5-year survival chance for women with breast cancer, and so on.

Because the Beta distribution models a probability, its domain is bounded between **0** and **1**.
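A small sketch of the density helps here. It uses the standard form x^(α-1) * (1-x)^(β-1) / B(α,β), with B(α,β) computed from the Gamma function; the function name `beta_pdf` and the parameter values are only illustrative.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    # B(alpha, beta) is the normalizing constant that makes the area equal 1.
    B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B


# Illustrative parameters: Beta(2, 2) is symmetric around 0.5.
for x in (0.1, 0.5, 0.9):
    print(x, beta_pdf(x, 2, 2))
```

The input `x` is itself a probability, which is why the curve only lives on the interval from 0 to 1.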

**Let’s ignore the coefficient 1/B(α,β)** for a moment and only look at the numerator **x^(α-1) * (1-x)^(β-1)**, because **1/B(α,β)**…

Prior probability is **the probability of an event before we see the data**.

In Bayesian Inference, the prior is our guess about the probability based on what we know now, before new data becomes available.

A conjugate prior simply cannot be understood without knowing Bayesian inference.

For the rest of the blog, I’ll assume you know the concepts of prior, sampling and posterior.

**For some likelihood functions, if you choose a certain prior,** the posterior ends up being in the same distribution family as the prior. Such a prior is called a conjugate prior.

It is always best understood through…
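One standard instance of the definition above, sketched under the assumption of a Beta prior with a binomial likelihood: the posterior is again a Beta distribution, and the update is just adding counts. The function name and the numbers are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior parameters for a Beta(alpha, beta) prior after binomial data."""
    # Conjugacy: Beta prior + binomial likelihood gives a Beta posterior.
    return alpha + successes, beta + failures


# Hypothetical example: prior Beta(2, 2), then 7 successes and 3 failures observed.
post_alpha, post_beta = update_beta(2, 2, 7, 3)
posterior_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, posterior_mean)
```

No integration is needed, which is the practical appeal of picking a conjugate prior.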

In one sentence: to **update the probability as we gather more data.**

The core of Bayesian Inference is to combine two different distributions (likelihood and prior) into one “smarter” distribution (posterior). The posterior is **“smarter” in the sense that the classic maximum likelihood estimation (MLE) doesn’t take into account a prior.** Once we calculate the posterior, we use it to find the “best” parameters, where **“best” is in terms of maximizing the posterior probability**, given the data. This process is called Maximum A Posteriori (MAP) estimation. …
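A rough sketch of the difference, assuming a coin-flip setting with a Beta(a, b) prior on the heads probability (the setting, function names, and numbers are illustrative assumptions, not from the post). The MLE uses only the data; the MAP estimate is the mode of the posterior.

```python
def mle_estimate(heads, flips):
    """Maximum likelihood estimate of the heads probability."""
    return heads / flips


def map_estimate(heads, flips, a, b):
    """Mode of the Beta(a + heads, b + flips - heads) posterior (needs a, b > 1)."""
    return (heads + a - 1) / (flips + a + b - 2)


# Hypothetical data: 3 heads in 3 flips, with a mild Beta(2, 2) prior.
print(mle_estimate(3, 3))        # data alone says the coin always lands heads
print(map_estimate(3, 3, 2, 2))  # the prior pulls the estimate back toward 0.5
```

With only three flips the prior matters a lot; as the number of flips grows, the two estimates get closer.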

**Why should I care?**

**Many probability distributions are defined by using the gamma function**, such as the Gamma distribution, Beta distribution, Dirichlet distribution, Chi-squared distribution, and Student’s t-distribution. For data scientists, machine learning engineers, and researchers, the Gamma function is probably

Before setting Gamma’s two parameters *α, β* and plugging them into the formula, let’s pause for a moment and ask a few questions…

Why did we have to invent the Gamma distribution? (i.e., why does this distribution exist?)

When should the Gamma distribution be used for modeling?

**Answer: To predict the wait time until future events.**

Hmmm ok, but I thought that’s what the exponential distribution is for.

Then, what’s the difference between the exponential distribution and the gamma distribution?

The exponential distribution predicts the wait time until the ***very first*** event. …
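A quick simulation sketch of the contrast, assuming events arrive as a Poisson process with a fixed rate: the wait for the first event is exponential, and the wait for the k-th event is a sum of k such waits, which follows a Gamma distribution with mean k / rate. The rate, k, and sample size below are arbitrary choices.

```python
import random


def wait_until_kth_event(k, rate, rng):
    """Total wait until the k-th event: a sum of k exponential waits."""
    return sum(rng.expovariate(rate) for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
k, rate = 3, 2.0        # assumed values: third event, 2 events per unit time
samples = [wait_until_kth_event(k, rate, rng) for _ in range(20000)]
mean_wait = sum(samples) / len(samples)

print(f"simulated mean wait: {mean_wait:.3f}, theoretical k / rate: {k / rate}")
```

Setting `k = 1` recovers the exponential case, which is one way to see the exponential distribution as a special case of the gamma.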

I’m an Engineering Manager at Scale AI and this is my notepad for Applied Math / CS / Deep Learning topics. Follow me on Twitter for more!