In the previous posts, we've learned how MAML (model-agnostic meta learning) helps us find an optimal initial model parameter that generalizes well to many related tasks. We've also seen how MAML is used in supervised and reinforcement learning settings. But how can we apply MAML in an unsupervised learning setting, where our data points have no labels? To answer this, we introduce an algorithm called CACTUs, short for Clustering to Automatically Construct Tasks for Unsupervised meta-learning.

Let's say we have a dataset $D$ containing unlabeled examples: $D = \{x_1, x_2, x_3, \dots, x_n\}$. Now, what can we do with this dataset? How can we apply MAML to it?

First, what do we need to train with MAML? We need a distribution over tasks: we train our model by sampling a batch of tasks and finding the optimal model parameter across them. Each task consists of data points (features) along with their labels. But how can we generate tasks from an unlabeled dataset?

Let's see how we can generate tasks using CACTUs in the next section. Once we generate the tasks, we can easily plug them into the MAML algorithm and find the optimal model parameter.

## Task generation using CACTUs

Let's say we have a dataset $D$ containing unlabeled examples: $D = \{x_1, x_2, x_3, \dots, x_n\}$. Now we need to create labels for our dataset. How can we do that? First, we learn an embedding for each data point in our dataset using some embedding function. The embedding function can be any feature extractor; for example, if our inputs are images, we can use a CNN as the embedding function to extract a feature vector for each image.
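To make the interface of the embedding step concrete, here is a minimal sketch in which a fixed random linear projection stands in for a real feature extractor such as a CNN. The function name `make_random_projection_embedder` and its parameters are hypothetical; the only point is that every data point $x_i$ is mapped to a fixed-length vector that later steps can cluster.

```python
import random

def make_random_projection_embedder(input_dim, embed_dim, seed=0):
    """Return a hypothetical embedding function: a fixed random linear projection.

    A learned feature extractor (for example a CNN) would normally play this role;
    the random projection is only a placeholder so the sketch runs without a model.
    """
    rng = random.Random(seed)
    # One Gaussian weight row per output dimension, drawn once and then reused.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(embed_dim)]

    def embed(x):
        # Each embedding coordinate is the dot product of x with one weight row.
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]

    return embed

# Usage: embed a toy unlabeled dataset of 4-dimensional inputs into 2 dimensions.
dataset = [[random.random() for _ in range(4)] for _ in range(10)]
embed = make_random_projection_embedder(input_dim=4, embed_dim=2)
embeddings = [embed(x) for x in dataset]
```

In practice the quality of the later clusters depends heavily on how good this embedding is, so the placeholder should be swapped for a real feature extractor.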

After generating an embedding for each data point, how can we find labels for them? A naive and simple approach is to split our dataset $D$ into $p$ partitions using random hyperplanes in the embedding space, and then treat each partition as a separate class.
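The naive random-hyperplane idea can be sketched as follows. We assume, for illustration, that the hyperplanes pass through the origin and that each point's class is the bit pattern recording which side of every hyperplane it falls on, so `num_hyperplanes` hyperplanes yield at most $2^{\text{num\_hyperplanes}}$ regions; `random_hyperplane_partition` is a hypothetical helper, not part of any library.

```python
import random

def random_hyperplane_partition(embeddings, num_hyperplanes, seed=0):
    """Assign each embedding a class id from its side of each random hyperplane.

    Assumes hyperplanes through the origin; num_hyperplanes hyperplanes split the
    embedding space into at most 2 ** num_hyperplanes regions (classes).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    # Each hyperplane is described by a random normal vector.
    normals = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hyperplanes)]
    labels = []
    for z in embeddings:
        # Bit j is 1 when z lies on the positive side of hyperplane j.
        code = 0
        for j, normal in enumerate(normals):
            if sum(a * b for a, b in zip(normal, z)) > 0:
                code |= 1 << j
        labels.append(code)
    return labels

# Usage: partition four 2-D embeddings with three random hyperplanes.
embeddings = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0], [0.2, 0.1]]
labels = random_hyperplane_partition(embeddings, num_hyperplanes=3)
```

Notice that the hyperplanes ignore where the data actually lies: nothing stops a hyperplane from cutting straight through a group of similar points.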

The problem with this method is that, because the hyperplanes are random, a single class may contain very dissimilar embeddings, while closely related embeddings may end up in different classes. So, instead of using random hyperplanes to partition our dataset, we use a clustering algorithm; here we use k-means. We run k-means for several iterations and obtain $k$ clusters, which together form a partition of our dataset.
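As a sketch of the clustering step, here is plain Lloyd's k-means in pure Python, assuming the embeddings are lists of floats. The name `kmeans` and the parameters `num_iters` and `seed` are our own choices; a real pipeline would more likely call a library implementation such as scikit-learn's `KMeans`.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(embeddings, k, num_iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per embedding."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct embeddings picked at random.
    centroids = [list(c) for c in rng.sample(embeddings, k)]
    assignments = [0] * len(embeddings)
    for _ in range(num_iters):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(z, centroids[c]))
            for z in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [z for z, a in zip(embeddings, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

# Usage: two well-separated groups of 2-D embeddings.
blob_a = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.4]]
blob_b = [[10.0, 10.0], [9.6, 10.3], [10.2, 9.8]]
cluster_labels = kmeans(blob_a + blob_b, k=2)
```

Unlike the random hyperplanes, the cluster boundaries here adapt to the data, so nearby embeddings tend to share a cluster.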

We can treat each of these clusters as a separate class. So, what's next? How do we generate a task? Let's say that clustering gives us five clusters. We sample $n$ clusters from these five. Then, from each of the $n$ sampled clusters, we sample $r$ data points without replacement; we denote this set as $\{x_r\}_n$.
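The sampling step might look like this, assuming we already have one cluster id per data point. The helper `sample_clusters_and_points` is hypothetical, and the rule of skipping clusters with fewer than $r$ members is our own assumption, since sampling $r$ points without replacement is impossible from a smaller cluster.

```python
import random
from collections import defaultdict

def sample_clusters_and_points(cluster_labels, n, r, rng):
    """Pick n distinct clusters, then r distinct data-point indices from each."""
    # Group data-point indices by cluster id.
    members = defaultdict(list)
    for idx, c in enumerate(cluster_labels):
        members[c].append(idx)
    # Assumption: only clusters with at least r points are eligible.
    eligible = sorted(c for c, idxs in members.items() if len(idxs) >= r)
    chosen = rng.sample(eligible, n)  # n clusters, no repeats
    # {x_r}_n: for each chosen cluster, r indices drawn without replacement.
    return {c: rng.sample(members[c], r) for c in chosen}

# Usage: five clusters of three points each; sample n=3 clusters, r=2 points each.
cluster_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
sampled = sample_clusters_and_points(cluster_labels, n=3, r=2, rng=random.Random(0))
```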

After that, we sample a random permutation $(l_n)$ of $n$ one-hot, task-specific labels and assign label $l_n$ to the $n$-th sampled cluster. Now every data point in $\{x_r\}_n$ is paired with the label $l_n$ of its cluster.
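A minimal sketch of the label assignment, assuming the sampled clusters arrive as a list of cluster ids; `assign_permuted_labels` and `one_hot` are hypothetical helper names. Shuffling the label order means a cluster's position in the sample says nothing about which one-hot label it receives.

```python
import random

def one_hot(i, n):
    return [1 if j == i else 0 for j in range(n)]

def assign_permuted_labels(chosen_clusters, rng):
    """Give each sampled cluster a one-hot label under a random permutation."""
    n = len(chosen_clusters)
    perm = list(range(n))
    rng.shuffle(perm)  # random permutation of the label indices 0..n-1
    return {c: one_hot(perm[i], n) for i, c in enumerate(chosen_clusters)}

# Usage: three sampled clusters receive three distinct one-hot labels.
task_labels = assign_permuted_labels([4, 0, 2], random.Random(0))
```

Because the same cluster can receive a different label in every task, the meta-learner cannot simply memorize a fixed cluster-to-label mapping.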

Finally, we can define our task $T$ as:

$$T = \{(x_{n,r}, l_n) \mid x_{n,r} \in \{x_r\}_n\}$$
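Putting the task-generation steps together, the sketch below builds one task $T$ as a list of `(data point, one-hot label)` pairs from a dataset and its cluster ids. The function `make_cactus_task` is hypothetical; it assumes at least $n$ clusters contain $r$ or more points, and it does not split the task into training and test examples, which a MAML inner loop would additionally need.

```python
import random
from collections import defaultdict

def make_cactus_task(dataset, cluster_labels, n, r, seed=None):
    """Build one task T = {(x_{n,r}, l_n)} from clustered, unlabeled data."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for x, c in zip(dataset, cluster_labels):
        members[c].append(x)
    # Sample n distinct clusters (assumes at least n clusters hold >= r points).
    eligible = sorted(c for c, xs in members.items() if len(xs) >= r)
    clusters = rng.sample(eligible, n)
    # Random permutation of the n one-hot labels.
    perm = rng.sample(range(n), n)
    task = []
    for i, c in enumerate(clusters):
        label = tuple(1 if j == perm[i] else 0 for j in range(n))
        # r points from this cluster, without replacement, all sharing label l_n.
        task.extend((x, label) for x in rng.sample(members[c], r))
    rng.shuffle(task)
    return task

# Usage: 15 unlabeled points in five clusters; a 3-way task with 2 points per class.
dataset = [f"x{i}" for i in range(15)]
cluster_labels = [i // 3 for i in range(15)]
task = make_cactus_task(dataset, cluster_labels, n=3, r=2, seed=1)
```

Calling this repeatedly with different seeds gives the distribution over tasks that MAML samples its batches from.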