[Tensorflow 101] What does it mean to reduce axis?

i.e. reduce_sum, reduce_max, reduce_mean, etc.

In Tensorflow code, you may have seen “reduce_*” many times. When I first used tf.reduce_sum, I thought, if it’s a sum, just say sum! Why do you have to put the prefix “reduce_” in front of every command?

Soon I realized it was because every time you do a sum, max, mean, etc., it inherently reduces the number of dimensions of the tensor. For example, if you sum the 5 numbers [1,2,3,4,5], you get a single number, 15: a vector has been reduced to a scalar.
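Here is a minimal sketch of that vector-to-scalar reduction. It uses NumPy rather than Tensorflow, on the assumption that the two treat axes the same way; the variable names are only for illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])  # 1 dimension, shape (5,)
total = np.sum(numbers)              # 0 dimensions, shape ()

print(numbers.ndim, numbers.shape)   # 1 (5,)
print(int(total), np.ndim(total))    # 15 0
```

The sum has one dimension fewer than its input, which is the "reduce" in reduce_sum.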

Let’s look at a real-life example from the Tensorflow repository.

def _scale_l2(x, norm_length):
    alpha = tf.reduce_max(tf.abs(x), (1, 2), keep_dims=True) + 1e-12
    l2_norm = alpha * tf.sqrt(
        tf.reduce_sum(tf.pow(x / alpha, 2), (1, 2), keep_dims=True) + 1e-6)
    x_unit = x / l2_norm
    return norm_length * x_unit
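If you want to poke at this function without a Tensorflow session, one option is a NumPy port. The sketch below is my own translation, not code from the repository: scale_l2_np is a hypothetical name, and it assumes np.max and np.sum with axis=(1, 2) and keepdims=True behave like the reduce_* calls above.

```python
import numpy as np

def scale_l2_np(x, norm_length):
    """Hypothetical NumPy port of _scale_l2, for experimenting with shapes."""
    # Largest absolute value per example; shape (batch_size, 1, 1).
    alpha = np.max(np.abs(x), axis=(1, 2), keepdims=True) + 1e-12
    # L2 norm per example, computed on the rescaled values x / alpha.
    l2_norm = alpha * np.sqrt(
        np.sum((x / alpha) ** 2, axis=(1, 2), keepdims=True) + 1e-6)
    x_unit = x / l2_norm  # (3, 2, 5) divided by (3, 1, 1)
    return norm_length * x_unit

# Toy input of shape (batch_size, time_steps, dimensions) = (3, 2, 5).
x = np.arange(1, 31, dtype=np.float64).reshape(3, 2, 5)
scaled = scale_l2_np(x, norm_length=5.0)

# One L2 norm per example, each very close to norm_length.
norms = np.sqrt(np.sum(scaled ** 2, axis=(1, 2)))
print(scaled.shape)  # (3, 2, 5)
print(norms.shape)   # (3,)
```

The output keeps the input shape, and each of the three examples ends up with an L2 norm of roughly norm_length.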

It took some time before I could quickly visualize which dimensions are getting reduced and see exactly how the dimensions behave in reduce_max and reduce_sum. I decided to write this post because many people struggle with juggling dimensions/axes, but the Tensorflow tutorial assumes you are already familiar with this way of thinking. Having a good command of tensor shapes/dimensions/indexing will save you a lot of debugging later.

So, what does (1,2) mean in the above lines?

To understand how Tensorflow treats dimensions, first read my blog post “Numpy Sum Axis Intuition”, because axes in numpy, Tensorflow, and pytorch all behave the same way.

TLDR: The way to understand the “axis” of numpy/Tensorflow is: it collapses the specified axis.
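To make "collapses the specified axis" concrete, here is a small NumPy sketch that reduces each axis of a tensor in turn and records the resulting shape. The shape (2, 3, 4) is an arbitrary one chosen for illustration.

```python
import numpy as np

x = np.zeros((2, 3, 4))  # arbitrary example shape

# Reduce over each axis in turn and record the shape that is left.
shapes = {axis: np.sum(x, axis=axis).shape for axis in range(x.ndim)}

for axis, shape in shapes.items():
    print(axis, shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)
```

In each case the entry for the reduced axis disappears from the shape and the other entries are untouched.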

In deep learning models that work on sequences, the shape of the tensor is usually (batch_size, time_steps, dimensions).

Let’s say we have a tensor of shape (3,2,5).

# Let's initialize the tensor.
In [3]: x = tf.constant([[[1,2,3,4,5], [4,5,6,7,8]],
                         [[2,4,6,8,10], [3,6,9,12,15]],
                         [[8,8,8,8,8], [9,9,9,9,9]]])
In [9]: sess = tf.InteractiveSession()
# Let's see how it looks.
In [10]: x.eval()
array([[[ 1,  2,  3,  4,  5],
        [ 4,  5,  6,  7,  8]],
       [[ 2,  4,  6,  8, 10],
        [ 3,  6,  9, 12, 15]],
       [[ 8,  8,  8,  8,  8],
        [ 9,  9,  9,  9,  9]]], dtype=int32)

[Warm up Q.] What would be the output of tf.reduce_max(x, 0)?

(First write it down on paper, then check the result by running the command.)

tf.reduce_max(x, 0) kills the 0-th dimension, so the ‘3’ in (3,2,5) will be gone and the result has shape (2,5).

In [11]: tf.reduce_max(x,0)
Out[11]: <tf.Tensor 'Max:0' shape=(2, 5) dtype=int32>
In [12]: tf.reduce_max(x,0).eval()
array([[ 8,  8,  8,  8, 10],
       [ 9,  9,  9, 12, 15]], dtype=int32)
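One way to check the warm-up answer by hand: collapsing axis 0 is the same as taking the element-wise max across the three (2,5) slices x[0], x[1], and x[2]. The sketch below does that in NumPy, assuming it matches tf.reduce_max here.

```python
import numpy as np

x = np.array([[[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]],
              [[2, 4, 6, 8, 10], [3, 6, 9, 12, 15]],
              [[8, 8, 8, 8, 8], [9, 9, 9, 9, 9]]])

# Compare the three (2, 5) slices position by position.
by_hand = np.maximum(np.maximum(x[0], x[1]), x[2])
reduced = np.max(x, axis=0)

print(reduced.tolist())
# [[8, 8, 8, 8, 10], [9, 9, 9, 12, 15]]
print(np.array_equal(by_hand, reduced))  # True
```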

The same goes for the 1st dimension. Now tf.reduce_max(x,1) collapses the 1st dimension (the ‘2’), so the shape becomes (3,5).

In [14]: tf.reduce_max(x,1)
Out[14]: <tf.Tensor 'Max_3:0' shape=(3, 5) dtype=int32>
In [15]: tf.reduce_max(x,1).eval()
array([[ 4,  5,  6,  7,  8],
       [ 3,  6,  9, 12, 15],
       [ 9,  9,  9,  9,  9]], dtype=int32)

In real life, you will probably see a tuple of dimensions like (1,2) instead of just a single dimension. For example, tf.reduce_max(x,(1,2)) means you want to get the max for each example in the batch (the batch is usually dimension 0). Again, the same rule applies here: we collapse dimensions 1 and 2.

In [16]: tf.reduce_max(x,(1,2)).eval()
Out[16]: array([ 8, 15, 9], dtype=int32)
In [17]: tf.reduce_max(x,(1,2))
Out[17]: <tf.Tensor 'Max_11:0' shape=(3,) dtype=int32>
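If the tuple still feels opaque, it may help to see that reducing over (1,2) gives the same answer as reducing one axis at a time, or as flattening each example into a single vector first. This is a NumPy sketch under the same assumption that its axis handling mirrors Tensorflow's.

```python
import numpy as np

x = np.array([[[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]],
              [[2, 4, 6, 8, 10], [3, 6, 9, 12, 15]],
              [[8, 8, 8, 8, 8], [9, 9, 9, 9, 9]]])

both = np.max(x, axis=(1, 2))

# Reduce axis 2 first, then axis 1 of the (3, 2) intermediate result.
step_by_step = np.max(np.max(x, axis=2), axis=1)

# Or flatten each example to 10 values and reduce once.
flattened = np.max(x.reshape(3, -1), axis=1)

print(both.tolist())          # [8, 15, 9]
print(step_by_step.tolist())  # [8, 15, 9]
print(flattened.tolist())     # [8, 15, 9]
```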

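But this destroyed the original shape of the tensor: we went from 3 dimensions to 1. Let’s use keepdims=True (spelled keep_dims in older Tensorflow code), which keeps each reduced dimension with size 1.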

In [22]: tf.reduce_max(x,(1,2), keepdims=True).eval()
array([[[ 8]],
       [[15]],
       [[ 9]]], dtype=int32)
In [23]: tf.reduce_max(x,(1,2), keepdims=True)
Out[23]: <tf.Tensor 'Max_10:0' shape=(3, 1, 1) dtype=int32>

It maintains our batch shape!
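Why does that matter? A (3,1,1) result can be broadcast back against the original (3,2,5) tensor, while a (3,) result cannot. The NumPy sketch below divides each example by its own max. It assumes Tensorflow follows the same broadcasting rules, and the variable names are only for illustration.

```python
import numpy as np

x = np.array([[[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]],
              [[2, 4, 6, 8, 10], [3, 6, 9, 12, 15]],
              [[8, 8, 8, 8, 8], [9, 9, 9, 9, 9]]])

# With keepdims=True the max has shape (3, 1, 1) and lines up per example.
per_example_max = np.max(x, axis=(1, 2), keepdims=True)
normalized = x / per_example_max
print(normalized.shape)                       # (3, 2, 5)
print(normalized.max(axis=(1, 2)).tolist())   # [1.0, 1.0, 1.0]

# Without it the max has shape (3,), which is matched against the
# last axis (size 5), so the division fails.
flat_max = np.max(x, axis=(1, 2))
try:
    x / flat_max
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```

This is the same pattern as x / l2_norm in _scale_l2: the reduced tensor has to keep its size-1 dimensions to divide the full tensor example by example.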


I’m an Engineering Manager at Scale AI and this is my notepad for Applied Math / CS / Deep Learning topics. Follow me on Twitter for more!
