Relation Network in One-Shot Learning

The relation network is a popular one-shot learning algorithm. It consists of two important functions: an embedding function, denoted by f_{\varphi}, and a relation function, denoted by g_{\phi}. The embedding function extracts features from the input. If the input is an image, we can use a convolutional network as the embedding function, which gives us the feature vector (embedding) of the image. If the input is text, we can use an LSTM network to obtain the embeddings of the text.
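To make the role of f_{\varphi} concrete, here is a minimal sketch in plain Python. It assumes the input has already been flattened into a list of numbers, and it replaces the convolutional network with a single randomly initialised linear layer followed by a ReLU. The names `make_embedding` and `EMBED_DIM` are hypothetical, chosen only for illustration; in a real relation network these weights would be learned.

```python
import random

EMBED_DIM = 4  # hypothetical size of the feature vector produced by f_phi


def make_embedding(input_dim, embed_dim=EMBED_DIM, seed=0):
    """Return a toy embedding function f_phi: one linear layer + ReLU.

    This stands in for a CNN (images) or an LSTM (text); the weights are
    random here rather than learned.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
               for _ in range(embed_dim)]

    def f_phi(x):
        # Linear projection followed by a ReLU non-linearity.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return f_phi


f_phi = make_embedding(input_dim=6)
features = f_phi([0.2, 0.5, 0.1, 0.9, 0.3, 0.7])
print(len(features))  # 4
```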


In one-shot learning, we have only a single example per class. Let us say our support set contains three classes with one example per class. As shown in the figure below, the support set contains the three classes {lion, elephant, dog}.
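One way to picture this setting is as sampling an episode: pick N classes, keep exactly one labelled example per class as the support set, and hold out a different example of one of those classes as the query. The sketch below is a hypothetical illustration: the toy dataset uses strings in place of images, and `sample_one_shot_episode` is not part of any library.

```python
import random

# Toy labelled dataset: class name -> examples (strings stand in for images).
dataset = {
    "lion": ["lion_1", "lion_2", "lion_3"],
    "elephant": ["elephant_1", "elephant_2"],
    "dog": ["dog_1", "dog_2", "dog_3"],
    "cat": ["cat_1", "cat_2"],
}


def sample_one_shot_episode(data, n_way, seed=None):
    """Sample an N-way one-shot episode.

    Returns (support_set, query, query_label), where support_set maps each
    of the n_way sampled classes to exactly one example.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(data), n_way)
    query_label = rng.choice(classes)
    support_set = {}
    query = None
    for c in classes:
        if c == query_label:
            # Two distinct examples: one for the support set, one as the query.
            support_set[c], query = rng.sample(data[c], 2)
        else:
            support_set[c] = rng.choice(data[c])
    return support_set, query, query_label


support_set, query, query_label = sample_one_shot_episode(dataset, n_way=3, seed=1)
print(support_set)
print(query, query_label)
```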

[Figure: the support set, with one image each of a lion, an elephant, and a dog]

Now suppose we have a query image x_j, as shown in the figure below, and we want to predict its class.

[Figure: the query image x_j]

First, we take each image x_i from the support set and pass it to the embedding function to extract its features, f_{\varphi}(x_i). Since our support set consists of images, we can use a convolutional network as the embedding function. The embedding function gives us the feature vector of each data point in the support set. Similarly, we obtain the embedding of the query image x_j by passing it to the same embedding function, f_{\varphi}(x_j).
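The step above can be sketched as follows. The key point is that one and the same f_{\varphi} (the same weights) embeds every support image and the query image. As before, the "CNN" is a hypothetical stand-in (a random linear layer with a ReLU), and the four-number "images" are made-up toy inputs.

```python
import random


def make_embedding(input_dim, embed_dim, seed=0):
    """Toy stand-in for the CNN f_phi: a random linear layer + ReLU."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
               for _ in range(embed_dim)]

    def f_phi(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return f_phi


# Hypothetical flattened "images" (4 pixels each), one per support class.
support_set = {
    "lion": [0.9, 0.8, 0.1, 0.2],
    "elephant": [0.3, 0.3, 0.9, 0.8],
    "dog": [0.6, 0.1, 0.5, 0.4],
}
query_image = [0.85, 0.75, 0.15, 0.25]

f_phi = make_embedding(input_dim=4, embed_dim=3)

# The same f_phi embeds every support image x_i and the query image x_j.
support_embeddings = {label: f_phi(x) for label, x in support_set.items()}
query_embedding = f_phi(query_image)
print(support_embeddings)
print(query_embedding)
```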

Once we have the feature vectors of the support set, f_{\varphi}(x_i), and of the query set, f_{\varphi}(x_j), we combine them using an operator Z. Here Z can be any combination operator; we use concatenation to combine the feature vectors of the support and query sets.
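With concatenation as Z, combining two feature vectors of length d yields a vector of length 2d. In the original relation network the embeddings are convolutional feature maps and they are concatenated along the depth (channel) axis; with the flat toy vectors used here, concatenation is simply joining two lists. The embedding values below are made up for illustration.

```python
def concat(a, b):
    """Combination operator Z: concatenate two feature vectors."""
    return list(a) + list(b)


f_support = [0.4, 0.0, 1.2]  # hypothetical f_phi(x_i)
f_query = [0.5, 0.3, 0.0]    # hypothetical f_phi(x_j)
combined = concat(f_support, f_query)
print(combined)  # [0.4, 0.0, 1.2, 0.5, 0.3, 0.0]
```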

As shown in the figure below, we combine the feature vector of each support image, f_{\varphi}(x_i), with the feature vector of the query image, f_{\varphi}(x_j):

Z(f_{\varphi}(x_i), f_{\varphi}(x_j))

But what is the use of combining them like this? It helps us understand how the feature vector of an image in the support set is related to the feature vector of the query image.

In our example, it helps us understand how the feature vectors of the lion, the elephant, and the dog are each related to the feature vector of the query image.
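For the three-class example, this pairing produces one combined vector per support class, each built from that class's embedding and the same query embedding. The embedding values below are hypothetical; in practice they would come from f_{\varphi}.

```python
def concat(a, b):
    """Combination operator Z (concatenation)."""
    return list(a) + list(b)


# Hypothetical embeddings f_phi(x_i) for the three support classes.
support_embeddings = {
    "lion": [0.9, 0.1, 0.4],
    "elephant": [0.1, 0.8, 0.6],
    "dog": [0.5, 0.3, 0.2],
}
query_embedding = [0.8, 0.2, 0.3]  # hypothetical f_phi(x_j)

# One combined vector per support class: Z(f_phi(x_i), f_phi(x_j)).
pairs = {label: concat(emb, query_embedding)
         for label, emb in support_embeddings.items()}

for label, vec in pairs.items():
    print(label, vec)
```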

[Figure: the feature vector of each support image (lion, elephant, dog) combined with the feature vector of the query image]

But how do we measure this relatedness? This is where the relation function g_{\phi} comes in. We pass the combined feature vectors to the relation function, which generates a relation score between 0 and 1 representing the similarity between a sample x_i in the support set and a sample x_j in the query set.
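A minimal sketch of g_{\phi}, assuming a small two-layer network whose final sigmoid squashes the output into the range (0, 1) so it can be read as a relation score. The layer sizes and the helper name `make_relation_function` are hypothetical, and the weights are random rather than trained, so the particular score printed carries no meaning.

```python
import math
import random


def make_relation_function(input_dim, hidden_dim=8, seed=0):
    """Toy relation function g_phi: linear -> ReLU -> linear -> sigmoid.

    The final sigmoid maps the output into (0, 1), so it can be read as a
    relation (similarity) score. Weights are random, not trained.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
          for _ in range(hidden_dim)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]

    def g_phi(combined):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, combined))) for row in w1]
        logit = sum(w * h for w, h in zip(w2, hidden))
        return 1.0 / (1.0 + math.exp(-logit))

    return g_phi


g_phi = make_relation_function(input_dim=6)
score = g_phi([0.9, 0.1, 0.4, 0.8, 0.2, 0.3])  # a combined vector Z(...)
print(round(score, 3))
```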

The following equation shows how the relation score r_{ij} is computed in a relation network:

r_{ij} = g_{\phi} ( Z(f_{\varphi}(x_i), f_{\varphi}(x_j)))

where r_{ij} is the relation score representing the similarity between class i in the support set and the query image x_j. Since we have three classes in the support set and one image in the query set, we get three scores, indicating how similar each of the three support classes is to the query image. The query image is then assigned to the class with the highest relation score.
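Putting the pieces together, the sketch below computes r_{ij} = g_{\phi}(Z(f_{\varphi}(x_i), f_{\varphi}(x_j))) for each of the three support classes and picks the class with the largest score. Everything here is a toy assumption: f_{\varphi} is a random linear layer with a ReLU instead of a CNN, g_{\phi} is a small random network ending in a sigmoid, the layer sizes are made up, and the "images" are four-number lists. Because nothing is trained, the predicted class only demonstrates the mechanics, not a meaningful result.

```python
import math
import random


def linear_relu(weights, x):
    """Matrix-vector product followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(42)
INPUT_DIM, EMBED_DIM, HIDDEN_DIM = 4, 3, 8  # hypothetical sizes

W_f = random_matrix(EMBED_DIM, INPUT_DIM, rng)              # toy f_phi weights
W_g1 = random_matrix(HIDDEN_DIM, 2 * EMBED_DIM, rng)        # toy g_phi hidden layer
w_g2 = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]  # toy g_phi output layer


def f_phi(x):
    return linear_relu(W_f, x)


def Z(a, b):
    return a + b  # concatenation of two Python lists


def g_phi(combined):
    hidden = linear_relu(W_g1, combined)
    logit = sum(w * h for w, h in zip(w_g2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # relation score in (0, 1)


def relation_scores(support_set, x_j):
    """r_ij = g_phi(Z(f_phi(x_i), f_phi(x_j))) for every support class i."""
    q = f_phi(x_j)
    return {label: g_phi(Z(f_phi(x_i), q)) for label, x_i in support_set.items()}


support_set = {  # hypothetical flattened images, one per class
    "lion": [0.9, 0.8, 0.1, 0.2],
    "elephant": [0.3, 0.3, 0.9, 0.8],
    "dog": [0.6, 0.1, 0.5, 0.4],
}
query_image = [0.85, 0.75, 0.15, 0.25]

scores = relation_scores(support_set, query_image)
prediction = max(scores, key=scores.get)  # class with the highest r_ij
print(scores)
print("predicted class:", prediction)
```

In a trained network, the weights of f_{\varphi} and g_{\phi} would be learned so that matching pairs receive scores near 1 and non-matching pairs scores near 0.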

The figure below shows the overall architecture of a relation network in the one-shot learning setting.

[Figure: overall architecture of a relation network in the one-shot learning setting]

In the next section, we will learn how relation networks are used in few-shot and zero-shot learning systems.
