## TensorFlow basics

TensorFlow 2 underwent a major redesign to make the API more accessible and easier to use. If you are familiar with NumPy, you will find yourself right at home when using TensorFlow 2. Unlike TensorFlow 1, which was purely symbolic, TensorFlow 2 hides its symbolic nature under the hood and looks like any other imperative library, such as NumPy. It's important to note that the change is mostly an interface change, and TensorFlow 2 is still able to take advantage of its symbolic machinery to do everything that TensorFlow 1.x can do (e.g. automatic differentiation and massively parallel computation on TPUs/GPUs).

Let's start with a simple example: we want to multiply two random matrices. First we look at an implementation done in NumPy:

```python
import numpy as np

x = np.random.normal(size=[10, 10])
y = np.random.normal(size=[10, 10])
z = np.dot(x, y)

print(z)
```

Now we perform the exact same computation, this time in TensorFlow 2.0:

```python
import tensorflow as tf

x = tf.random.normal([10, 10])
y = tf.random.normal([10, 10])
z = tf.matmul(x, y)

print(z)
```

Similar to NumPy, TensorFlow 2 also immediately performs the computation and produces the result. The only difference is that TensorFlow uses the tf.Tensor type to store the results, which can be easily converted to a NumPy array by calling the tf.Tensor.numpy() member function:

```python
print(z.numpy())
```

To understand how powerful symbolic computation can be, let's have a look at another example. Assume that we have samples from a curve (say f(x) = 5x^2 + 3) and we want to estimate f(x) based on these samples. We define a parametric function g(x, w) = w0 x^2 + w1 x + w2, which is a function of the input x and latent parameters w. Our goal is then to find the latent parameters such that g(x, w) ≈ f(x). This can be done by minimizing the following loss function, where the sum runs over the sample points: L(w) = ∑ (f(x) - g(x, w))^2.
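
To make the loss above concrete, here is a minimal NumPy sketch that evaluates L(w) on samples of the example curve and finds its minimizer directly. Because g(x, w) is linear in w, this particular problem is an ordinary least-squares fit. The names g, loss, features and w_hat, the sample count and the fixed seed are assumptions made for this illustration.

```python
import numpy as np

# Sample the example curve f(x) = 5x^2 + 3 at random points.
rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
x = rng.uniform(-10.0, 10.0, size=100)
f_x = 5 * np.square(x) + 3

def g(x, w):
    # Parametric model g(x, w) = w0 x^2 + w1 x + w2.
    return w[0] * np.square(x) + w[1] * x + w[2]

def loss(w):
    # L(w) = sum over the samples of (f(x) - g(x, w))^2.
    return np.sum(np.square(f_x - g(x, w)))

# g is linear in w, so minimizing L(w) is an ordinary least-squares problem
# on the feature matrix with columns [x^2, x, 1].
features = np.stack([np.square(x), x, np.ones_like(x)], axis=1)
w_hat, *_ = np.linalg.lstsq(features, f_x, rcond=None)

print(w_hat)                            # close to [5, 0, 3]
print(loss(w_hat))                      # close to 0
print(loss(np.array([1.0, 1.0, 1.0])))  # much larger for a poor guess
```

The recovered coefficients match the curve the samples were drawn from, which gives a reference point for judging any iterative method applied to the same loss.
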
Although there's a closed form solution for this simple problem, we opt to use a more general approach that can be applied to any arbitrary differentiable function: stochastic gradient descent (the code below uses the Adam optimizer, a variant of it). We simply compute the average gradient of L(w) with respect to w over a set of sample points and move in the opposite direction.

Here's how it can be done in TensorFlow:

```python
import numpy as np
import tensorflow as tf

# Assuming we know that the desired function is a polynomial of 2nd degree, we
# allocate a vector of size 3 to hold the coefficients and initialize it with
# random noise.
w = tf.Variable(tf.random.normal([3, 1]))

# We use the Adam optimizer with learning rate set to 0.1 to minimize the loss.
opt = tf.optimizers.Adam(0.1)

def model(x):
    # We define yhat to be our estimate of y.
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    return yhat

def compute_loss(y, yhat):
    # The loss is half the squared l2 distance between our estimate of y and
    # its true value (tf.nn.l2_loss computes sum(t ** 2) / 2). We also add a
    # shrinkage term to keep the resulting weights small.
    loss = tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)
    return loss

def generate_data():
    # Generate some training data based on the true function.
    x = np.random.uniform(-10.0, 10.0, size=100).astype(np.float32)
    y = 5 * np.square(x) + 3
    return x, y

def train_step():
    x, y = generate_data()

    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss

    opt.minimize(_loss_fn, [w])

for _ in range(1000):
    train_step()

print(w.numpy())
```

By running this piece of code you should see a result close to this:

```python
[4.9924135, 0.00040895029, 3.4504161]
```

This is a relatively close approximation to the true parameters (5, 0, 3). Note that in the above code we are running TensorFlow in imperative mode (i.e. operations get executed immediately), which is not very efficient.
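
For readers curious about what a single call to opt.minimize does in the training loop above, the following is a rough sketch of one update written out by hand with tf.GradientTape and apply_gradients. It repeats the model and loss definitions so that it runs on its own. The names loss_before and loss_after are introduced only for this illustration, and the correspondence to minimize is approximate, not a description of the library's internals.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(tf.random.normal([3, 1]))
opt = tf.optimizers.Adam(0.1)

def model(x):
    # Same quadratic model as above: features [x^2, x, 1] times w.
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    return tf.squeeze(tf.matmul(f, w), 1)

def compute_loss(y, yhat):
    # Same objective as above: data term plus shrinkage term.
    return tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)

x = np.random.uniform(-10.0, 10.0, size=100).astype(np.float32)
y = 5 * np.square(x) + 3

# Record the forward pass so TensorFlow can differentiate it.
with tf.GradientTape() as tape:
    loss_before = compute_loss(y, model(x))

# The gradient of the loss with respect to w has the same shape as w.
grads = tape.gradient(loss_before, [w])

# Let the optimizer move w against the gradient (one Adam update).
opt.apply_gradients(zip(grads, [w]))

loss_after = compute_loss(y, model(x))
print(loss_before.numpy(), loss_after.numpy())
```

Writing the step out this way is useful when the gradients need to be inspected or modified (for example clipped) before they are applied.
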
TensorFlow 2.0 can also turn a given piece of Python code into a graph, which can then be optimized and efficiently parallelized on GPUs and TPUs. To get all those benefits we simply need to decorate the train_step function with the tf.function decorator:

```python
@tf.function
def train_step():
    # Note: NumPy code runs only while the function is being traced, so this
    # batch is generated once and then reused on every later call.
    x, y = generate_data()

    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss

    opt.minimize(_loss_fn, [w])
```

What's cool about tf.function is that it's also able to convert basic Python statements like while, for and if into native TensorFlow functions. We will get to that later.

This is just the tip of the iceberg for what TensorFlow can do. Many problems, such as optimizing large neural networks with millions of parameters, can be implemented efficiently in TensorFlow in just a few lines of code. TensorFlow takes care of scaling across multiple devices and threads, and supports a variety of platforms.
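
As a small preview of the control-flow conversion mentioned above, here is a minimal sketch of a tf.function whose body uses an ordinary Python for loop and if statement over tensors. The function name sum_of_even_squares and its input are hypothetical and chosen only for illustration.

```python
import tensorflow as tf

@tf.function
def sum_of_even_squares(n):
    # Sum i * i over the even i in [0, n). Because the loop iterates over a
    # tensor and the condition depends on a tensor, tf.function converts both
    # statements into graph control flow instead of unrolling them in Python.
    total = tf.constant(0)
    for i in tf.range(n):
        if i % 2 == 0:
            total += i * i
    return total

result = sum_of_even_squares(tf.constant(6))
print(result.numpy())  # 0 + 4 + 16 = 20
```

Since n is passed as a tensor, the same traced graph can be reused for other values of n without retracing.
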