In the previous post, we learned how model-agnostic meta-learning (MAML) finds a good initial parameter, one from which a model can adapt to a new task in just a few gradient steps. Now we will see how to use MAML in the supervised learning setting. Before going ahead, let us quickly define our loss functions. The loss function can be any function suited to the task we are performing.

If we are performing regression, then we can use the mean squared error as our loss function:

**L_{T_i}(f_{\theta}) = \sum_{x_j,y_j \sim T_i} || f_{\theta}(x_j) - y_j ||^2_2**
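To make this concrete, here is a minimal Python sketch of that squared-error loss. The one-parameter model `linear` and the toy data are assumptions made up for illustration; they are not part of MAML itself.

```python
def linear(theta, x):
    # Hypothetical one-parameter model f_theta(x) = theta * x (illustration only).
    return theta * x


def mse_loss(predict, theta, data):
    """Sum of squared errors of predict(theta, x) over the (x, y) pairs of a task."""
    return sum((predict(theta, x) - y) ** 2 for x, y in data)


# Toy task whose true relationship is y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

print(mse_loss(linear, 2.0, data))  # loss at the true slope
print(mse_loss(linear, 1.0, data))  # loss at a wrong slope
```

The loss is zero at the true slope and grows as \theta moves away from it, which is exactly what gradient descent exploits in the steps that follow.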

If it is a classification task, then we can use a loss function such as the cross-entropy loss:

**L_{T_i}(f_{\theta}) = -\sum_{x_j,y_j \sim T_i} [ y_j \log f_{\theta}(x_j) + (1-y_j) \log (1-f_{\theta}(x_j)) ]**
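A small Python sketch of the binary cross-entropy loss may help here. It is written with the usual leading minus sign, so the loss is non-negative and lower is better. The `logistic` model and the `eps` clipping are assumptions added for illustration and numerical safety.

```python
import math


def logistic(theta, x):
    # Hypothetical one-parameter classifier: sigmoid(theta * x), a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-theta * x))


def cross_entropy_loss(predict, theta, data, eps=1e-12):
    """Binary cross-entropy summed over the (x, y) pairs of a task, with y in {0, 1}."""
    total = 0.0
    for x, y in data:
        # Clip the predicted probability so that log() never receives 0.
        p = min(max(predict(theta, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Toy task: positive inputs belong to class 1, negative inputs to class 0.
data = [(2.0, 1), (1.0, 1), (-1.0, 0), (-2.0, 0)]

print(cross_entropy_loss(logistic, 3.0, data))   # confident and correct
print(cross_entropy_loss(logistic, -3.0, data))  # confident and wrong
```

A parameter that separates the two classes gives a much smaller loss than one that predicts the opposite labels.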

Now let us see step by step, how exactly MAML is used in supervised learning.

**Step 1:**

Let us say we have a model f parameterized by a parameter \theta, and we have a distribution over tasks p(T). First, we randomly initialize the model parameter \theta.

**Step 2:**

Now, we sample a batch of tasks T_i from the distribution over tasks, i.e. T_i \sim p(T). Let us say we have sampled 3 tasks; then T = {T_1, T_2, T_3}.

**Step 3: (inner loop)**

For each task (T_i) in tasks (T), we sample k data points each for our train and test datasets, i.e.:

D_i^{train} = { (x_1,y_1), (x_2,y_2), \ldots, (x_k,y_k) }

D_i^{test} = { (x_1,y_1), (x_2,y_2), \ldots, (x_k,y_k) }

Wait. What are the train and test sets? We use the train set in the inner loop to find the optimal parameters \theta'_i for each task, and the test set in the outer loop to find the optimal parameter \theta. Each dataset simply contains features x and labels y.
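Sampling tasks and building their train and test sets could be sketched as below. The task family used here (1-D regression y = a * x with a random slope a) and the helper names `sample_task` and `sample_points` are hypothetical, chosen only to keep the example tiny.

```python
import random


def sample_task(rng):
    # Hypothetical task distribution p(T): each task is a slope a for y = a * x.
    return rng.uniform(0.5, 2.0)


def sample_points(slope, k, rng):
    """Draw k (x, y) pairs from the task defined by `slope`."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return [(x, slope * x) for x in xs]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
k = 5

# Sample a batch of 3 tasks.
tasks = [sample_task(rng) for _ in range(3)]

# For each task, draw separate train and test sets of k points each.
datasets = [(sample_points(a, k, rng), sample_points(a, k, rng)) for a in tasks]

for a, (train, test) in zip(tasks, datasets):
    print(f"slope={a:.3f}  train size={len(train)}  test size={len(test)}")
```

Note that the train and test sets of a task come from the same task but are drawn independently, so the test set measures how well the adapted parameters generalize within that task.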

Now we apply any supervised learning algorithm to D_i^{train}: we calculate the loss, minimize it using gradient descent, and get the optimal parameters \theta'_i:

\theta'_i = \theta - \alpha \nabla_{\theta} L_{T_i}(f_{\theta})

So for each of the tasks, we sample k data points, minimize the loss on the train set D_i^{train}, and get the optimal parameters \theta'_i. As we sampled three tasks, we will have three optimal parameters { \theta'_1, \theta'_2, \theta'_3 }.
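Here is a minimal sketch of this inner-loop update for three tasks. It assumes the hypothetical model f_theta(x) = theta * x with the summed squared-error loss, so the gradient can be written by hand; the tiny train sets are made up for illustration.

```python
def grad_sse(theta, data):
    # Gradient of sum((theta * x - y)^2) with respect to theta.
    return sum(2.0 * x * (theta * x - y) for x, y in data)


def inner_update(theta, train, alpha):
    """One gradient-descent step on the task's train set: theta' = theta - alpha * grad."""
    return theta - alpha * grad_sse(theta, train)


theta = 0.0   # shared initial parameter
alpha = 0.05  # inner-loop learning rate

# Train sets of three toy tasks with true slopes 1, 2 and 3.
train_sets = [
    [(1.0, 1.0), (2.0, 2.0)],
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 3.0), (2.0, 6.0)],
]

# One adapted parameter theta'_i per task, all starting from the same theta.
adapted = [inner_update(theta, train, alpha) for train in train_sets]
print(adapted)
```

Every task starts from the same \theta, and each adapted parameter moves toward its own task's slope.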

**Step 4: (outer loop)**

Now, we perform meta-optimization on the test set, i.e. we try to minimize the loss on the test set D_i^{test}. We evaluate the loss using the optimal parameters \theta'_i calculated in the previous step, take its gradient with respect to \theta (each \theta'_i is itself a function of \theta), and update our randomly initialized parameter \theta:

\theta = \theta - \beta \nabla_{\theta} \sum_{T_i \sim p(T)} L_{T_i} (f_{\theta'_i})
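The key detail of this update is that the gradient flows through the inner step, since \theta'_i depends on \theta. A minimal sketch, again assuming the hypothetical scalar model f_theta(x) = theta * x with a summed squared-error loss so the chain rule can be written out by hand:

```python
def sse(theta, data):
    # Summed squared error of the toy model f_theta(x) = theta * x.
    return sum((theta * x - y) ** 2 for x, y in data)


def grad_sse(theta, data):
    # Gradient of sse with respect to theta.
    return sum(2.0 * x * (theta * x - y) for x, y in data)


def adapt(theta, train, alpha):
    # Inner-loop step: theta' = theta - alpha * grad of the train loss.
    return theta - alpha * grad_sse(theta, train)


def meta_grad(theta, train, test, alpha):
    """Gradient w.r.t. theta of the test loss evaluated at the adapted theta'."""
    theta_prime = adapt(theta, train, alpha)
    # Chain rule: d theta' / d theta = 1 - alpha * (second derivative of train loss).
    dprime_dtheta = 1.0 - alpha * sum(2.0 * x * x for x, _ in train)
    return grad_sse(theta_prime, test) * dprime_dtheta


def meta_update(theta, tasks, alpha, beta):
    # Outer-loop step: sum the per-task meta-gradients, then update theta.
    total = sum(meta_grad(theta, train, test, alpha) for train, test in tasks)
    return theta - beta * total


# Two toy tasks, each given as a (train set, test set) pair.
tasks = [
    ([(1.0, 1.0)], [(2.0, 2.0)]),
    ([(1.0, 3.0)], [(2.0, 6.0)]),
]
new_theta = meta_update(0.0, tasks, alpha=0.1, beta=0.01)
print(new_theta)
```

For a neural network you would not derive this by hand; an automatic differentiation library would compute the same gradient through the inner step.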

**Step 5:**

We repeat steps 2 to 4 for n iterations.
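Putting the steps together, the whole procedure can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not a reference implementation: the task family (y = a * x with a random slope a), the scalar model f_theta(x) = theta * x, the hand-derived gradients, and all hyperparameter values are made up for illustration.

```python
import random


def grad_sse(theta, data):
    # Gradient of sum((theta * x - y)^2) with respect to theta.
    return sum(2.0 * x * (theta * x - y) for x, y in data)


def sample_points(slope, k, rng):
    # k (x, y) pairs from the hypothetical task y = slope * x.
    return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(k))]


def maml_train(n_iterations=200, n_tasks=3, k=5, alpha=0.05, beta=0.01, seed=0):
    rng = random.Random(seed)
    theta = rng.uniform(-1.0, 1.0)           # Step 1: random initialization

    for _ in range(n_iterations):            # Step 5: repeat
        meta_gradient = 0.0
        for _ in range(n_tasks):             # Step 2: sample a batch of tasks
            slope = rng.uniform(0.5, 2.0)
            train = sample_points(slope, k, rng)
            test = sample_points(slope, k, rng)

            # Step 3 (inner loop): one gradient step on the train set.
            theta_prime = theta - alpha * grad_sse(theta, train)

            # Step 4 (outer loop): test-loss gradient w.r.t. theta via the chain rule.
            dprime_dtheta = 1.0 - alpha * sum(2.0 * x * x for x, _ in train)
            meta_gradient += grad_sse(theta_prime, test) * dprime_dtheta

        theta -= beta * meta_gradient        # meta update of the initial parameter
    return theta


theta = maml_train()
print(theta)
```

With slopes drawn uniformly from [0.5, 2.0], the learned initial parameter settles somewhere in the middle of that range, a starting point from which one gradient step gets close to any individual task.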

The figure below gives us an overview of MAML in supervised learning:

**Got any questions? Ask me in the comment section below.**