Watching the Tokyo Olympics this summer, especially Anna Kiesenhofer, the Austrian mathematician who won the gold medal in cycling, made me think that the concept of training is not unique to pro athletes but applies just as well to other practitioners, such as engineers, musicians, and dancers.

I’m a software engineer, and I don’t think we consciously “train” ourselves. However, if you think about it, the way we work as engineers is similar to the way athletes train. To build something great, something like Python, Docker, Linux, etc. or to invent Hadoop, CNN, GAN, etc., …

My team spoke very highly of this blog post (and they’re also wondering whether self-supervised learning could eliminate the need for labeling entirely), so I gave it a read. It was a very well-written, thorough overview of self-supervised learning. What stood out the most was that it was written by Dr. LeCun, one of the people I respect the most in this field. I imagine his schedule must be brutal, but I appreciate that he still finds time to write. …

Camera Geometry and The Pinhole Model

Camera calibration, or camera resectioning, estimates the parameters of a pinhole camera model given a photograph. Usually, the pinhole camera parameters are represented in a 3 × 4 matrix called the camera matrix. We use these parameters to estimate the actual size of an object or to determine the location of the camera in the world.
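To get a feel for what the 3 × 4 camera matrix does, here is a minimal sketch that builds P = K [R | t] from an assumed intrinsic matrix K (focal length f in pixels and principal point (cx, cy)) and an assumed pose (rotation R, translation t), then projects a 3D world point to pixel coordinates: multiply in homogeneous coordinates and divide by the last component. The helper names and numbers are hypothetical illustrations; calibration is the inverse problem of recovering these parameters from known point correspondences.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def camera_matrix(f: float, cx: float, cy: float, R: Matrix, t: Sequence[float]) -> Matrix:
    """Hypothetical helper: build the 3x4 camera matrix P = K [R | t]."""
    K = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 extrinsics [R | t]
    return matmul(K, Rt)


def project(P: Matrix, point: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates (u, v)."""
    X = [[point[0]], [point[1]], [point[2]], [1.0]]  # homogeneous 4x1 column
    x = matmul(P, X)                                 # homogeneous 3x1 image point
    w = x[2][0]
    return x[0][0] / w, x[1][0] / w                  # perspective divide


# Assumed setup: camera at the world origin looking down +Z (R = identity, t = 0).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = camera_matrix(f=800.0, cx=320.0, cy=240.0, R=identity, t=[0.0, 0.0, 0.0])
print(project(P, [0.5, -0.25, 2.0]))  # (520.0, 140.0)
```

Note that the point (1.0, -0.5, 4.0), which lies on the same ray through the pinhole, lands on the same pixel: the division by depth is exactly why a single photograph loses the absolute size of an object.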


Before we talk about camera calibration, you first need to understand how the pinhole camera works.

Why do I need to know about the pinhole camera?

Because it is the essence of how any camera works. The pinhole camera model explains the relationship between a point…

How to model a natural language interface for relational databases

The views expressed in this post are mine alone and do not reflect the views of my employer, Microsoft.

Text-to-SQL is the task of automatically translating a user’s question, posed in natural language, into SQL. It is the project I’m working on at Microsoft.

If this problem is solved, it will be widely useful, because the vast majority of the data in our lives is stored in relational databases. In fact, the healthcare, financial services, and sales industries use relational databases almost exclusively. In other words, industries that can’t afford to lose transactions rely on relational databases. (You can…
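To make the input and output of the task concrete, here is a hand-written (question, SQL) pair run against a tiny in-memory SQLite database. The `patients` table, its rows, and the question are all hypothetical; a Text-to-SQL model would have to generate the SQL string on its own from the question and the schema.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO patients (name, age, city) VALUES (?, ?, ?)",
    [("Ana", 67, "Seattle"), ("Ben", 45, "Seattle"), ("Chloe", 72, "Boston")],
)

# One (question, SQL) pair. A Text-to-SQL system must produce `sql`
# from `question` plus the table schema; here it is written by hand.
question = "How many patients older than 60 live in Seattle?"
sql = "SELECT COUNT(*) FROM patients WHERE age > 60 AND city = 'Seattle'"

(answer,) = conn.execute(sql).fetchone()
print(answer)  # 1
```

Even this toy pair shows why the task is hard: the model has to map “older than 60” to a comparison on the right column and “live in Seattle” to an equality filter, using only the schema to know that `age` and `city` exist.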

[No more confusion] How to find a p-value and ultimately reject the null hypothesis

If you read scientific papers (medical, artificial intelligence, climate, political, etc.) or poll results, there is one term that almost always appears: the p-value.

But what exactly is a p-value? Why does it show up in all these contexts?

This table lists the symptoms of patients with COVID-19 (the disease caused by the novel coronavirus), along with their p-values.

From one of the most cited COVID-19 papers — Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus Infected Pneumonia in Wuhan, China

The authors’ only remarks about this table were: “Proportions for categorical variables were compared using the χ² test. P values indicate differences between ICU (Intensive Care Unit) and non-ICU patients.”
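To see where a number like the table’s p-values comes from, here is a minimal sketch of Pearson’s χ² test on a 2 × 2 table (symptom present or absent, ICU or non-ICU). The counts below are hypothetical, not taken from the paper. With one degree of freedom the p-value has a closed form, P(χ² ≥ x) = erfc(√(x/2)), so the standard library is enough. This sketch omits Yates’ continuity correction, which some tools (for example SciPy’s `chi2_contingency`) apply to 2 × 2 tables by default, so their p-values can differ slightly.

```python
import math
from typing import List, Tuple


def chi2_test_2x2(table: List[List[float]]) -> Tuple[float, float]:
    """Pearson chi-squared test of independence for a 2x2 table (no Yates correction).

    Returns (chi2_statistic, p_value). With 1 degree of freedom,
    P(chi2 >= x) = erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if the symptom were independent of ICU status (H0).
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = ICU / non-ICU, columns = symptom present / absent.
observed = [[20, 16],
            [30, 72]]
stat, p = chi2_test_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A small p-value here says: if the symptom really had nothing to do with ending up in the ICU, a difference in proportions at least this large would be rare.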

Let’s say all doctors in the hospital fell…

Scope, Examples, and Careers

The (somewhat vague) term “Operations Research” was coined in Britain in the late 1930s, just before World War II. The British military brought together a group of scientists to allocate scarce resources (for example, food, medics, weapons, and troops) to different military operations in the most effective way possible. So the “operations” in the name comes from “military operations”. Successfully conducting military operations was a huge deal, and after the war Operations Research (OR) became its own academic discipline in universities.

Wikipedia page of Operations Research

When you google “Operations Research”, you get a very long Wikipedia article; however, the explanation is a bit all over the place, and to be…

When to use Beta distribution

The Beta distribution is a probability distribution on probabilities. For example, we can use it to model probabilities such as the click-through rate of your advertisement, the conversion rate of customers actually purchasing on your website, how likely readers are to clap for your blog, how likely it is that Trump will win a second term, the 5-year survival chance for women with breast cancer, and so on.

Because the Beta distribution models a probability, its domain is bounded between 0 and 1.
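As a minimal sketch, the Beta PDF f(x; α, β) = x^(α-1) (1-x)^(β-1) / B(α, β), with B(α, β) = Γ(α)Γ(β)/Γ(α+β), can be written with only the standard library. Because it is a density on probabilities, it is zero outside [0, 1] and integrates to 1 over that interval. The parameter values below are arbitrary examples, and the midpoint-rule integral is just a numerical sanity check.

```python
import math


def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x. For simplicity the endpoints 0 and 1 are
    treated as 0 (a single point carries no probability)."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def integrate(f, lo: float = 0.0, hi: float = 1.0, n: int = 100_000) -> float:
    """Midpoint-rule numerical integral, good enough for a sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Example: Beta(2, 5), a belief that a click-through rate is probably small.
a, b = 2.0, 5.0
print(integrate(lambda x: beta_pdf(x, a, b)))  # very close to 1.0
print(a / (a + b))                             # mean of Beta(a, b) = a / (a + b) = 2/7
```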

1. Why does the PDF of the Beta distribution look the way it does?

An excerpt from Wikipedia

What’s the intuition?

Let’s ignore the coefficient 1/B(α,β) for a moment and look only at the numerator x^(α-1) * (1-x)^(β-1), because 1/B(α,β)…

With examples & proofs

1. What is a Prior?

Prior probability is the probability of an event before we see the data.
In Bayesian Inference, the prior is our guess about the probability based on what we know now, before new data becomes available.

2. What is a Conjugate Prior?

A conjugate prior simply cannot be understood without knowing Bayesian inference.

For the rest of this post, I’ll assume you know the concepts of prior, sampling, and posterior.

Conjugate prior in essence

For some likelihood functions, if you choose a certain prior, the posterior ends up in the same family of distributions as the prior. Such a prior is then called a Conjugate Prior.
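The textbook example is a Beta prior with a Binomial likelihood: a Beta(α, β) prior on a success probability θ, updated with k successes in n trials, gives a Beta(α + k, β + n - k) posterior. The sketch below uses arbitrary example values for the prior and the counts, and checks this numerically by normalizing prior × likelihood on a grid and comparing the result with the closed-form Beta posterior.

```python
import math
from typing import Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density for 0 < x < 1."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b)


def binomial_likelihood(theta: float, k: int, n: int) -> float:
    """P(k successes in n trials | theta)."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def conjugate_update(a: float, b: float, k: int, n: int) -> Tuple[float, float]:
    """Beta prior + Binomial likelihood -> Beta posterior, in closed form."""
    return a + k, b + (n - k)


# Arbitrary example: prior Beta(2, 2), then 7 successes out of 10 trials.
a, b, k, n = 2.0, 2.0, 7, 10
post_a, post_b = conjugate_update(a, b, k, n)  # (9.0, 5.0)

# Numerical check: normalize prior * likelihood with a midpoint-rule integral.
m = 100_000
grid = [(i + 0.5) / m for i in range(m)]
evidence = sum(beta_pdf(t, a, b) * binomial_likelihood(t, k, n) for t in grid) / m  # ~ P(data)
theta = 0.6
numeric = beta_pdf(theta, a, b) * binomial_likelihood(theta, k, n) / evidence
print(numeric, beta_pdf(theta, post_a, post_b))  # the two values agree closely
```

The payoff of conjugacy is that the update is just addition on the parameters: no integral has to be computed, which is why the grid integral above is only there as a check.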

It is always best understood through…

With Python Code

Why did someone have to invent Bayesian Inference?

In one sentence: to update the probability as we gather more data.

The core of Bayesian Inference is to combine two different distributions (likelihood and prior) into one “smarter” distribution (posterior). The posterior is “smarter” in the sense that classic maximum likelihood estimation (MLE) doesn’t take a prior into account. Once we calculate the posterior, we use it to find the “best” parameters, where “best” means maximizing the posterior probability given the data. This process is called Maximum A Posteriori (MAP) estimation. …
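Here is a minimal sketch of the MLE vs. MAP difference, using a coin-flip example assumed for illustration: with k heads in n flips, the MLE of the heads probability θ is k/n, while with a Beta(α, β) prior the MAP estimate (the mode of the Beta(α + k, β + n - k) posterior) is (k + α - 1)/(n + α + β - 2), valid when both posterior parameters exceed 1. The code finds both by brute-force grid search, so the closed forms are checked rather than assumed.

```python
import math


def log_likelihood(theta: float, k: int, n: int) -> float:
    """Log-probability of k heads in n flips (constant binomial coefficient dropped)."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)


def log_prior(theta: float, a: float, b: float) -> float:
    """Unnormalized log-density of a Beta(a, b) prior."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)


def argmax_on_grid(f, m: int = 10_000) -> float:
    """Return the grid point in the open interval (0, 1) that maximizes f."""
    grid = [i / m for i in range(1, m)]
    return max(grid, key=f)


k, n = 3, 3      # three heads in three flips
a, b = 2.0, 2.0  # mild prior belief that the coin is roughly fair

mle = argmax_on_grid(lambda t: log_likelihood(t, k, n))
map_est = argmax_on_grid(lambda t: log_likelihood(t, k, n) + log_prior(t, a, b))

print(mle)      # 0.9999 (the grid's edge; the exact MLE is k/n = 1.0)
print(map_est)  # 0.8, matching (k + a - 1) / (n + a + b - 2)
```

After three heads in a row, MLE concludes the coin always lands heads, while the prior pulls MAP back toward a more reasonable 0.8.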

Its properties, proofs & graphs

Why should I care?

Many probability distributions are defined using the Gamma function, such as the Gamma, Beta, Dirichlet, Chi-squared, and Student’s t-distributions.
For data scientists, machine learning engineers, and researchers, the Gamma function is probably one of the most widely used functions because it is employed in so many distributions. These distributions are then used for Bayesian inference, stochastic processes (such as queueing models), generative statistical models (such as Latent Dirichlet Allocation), and variational inference. …
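A few of the Gamma function’s best-known properties can be checked directly with Python’s `math.gamma`: Γ(n) = (n - 1)! for positive integers, the recurrence Γ(z + 1) = zΓ(z), and Γ(1/2) = √π. The sketch also shows how the Beta function behind the Beta distribution is built from Γ. The sample arguments are arbitrary.

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Recurrence Gamma(z + 1) = z * Gamma(z), which also holds for non-integer z.
for z in (0.3, 1.7, 4.25):
    assert math.isclose(math.gamma(z + 1), z * math.gamma(z))

# Gamma(1/2) = sqrt(pi).
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))


def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), the Beta PDF's normalizer."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


print(math.gamma(5))  # 24.0, i.e. 4!
print(beta_fn(2, 5))  # 1/30, the normalizer of Beta(2, 5)
```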

Aerin Kim

I’m an Engineering Manager at Scale AI and this is my notepad for Applied Math / CS / Deep Learning topics. Follow me on Twitter for more!
