We created a .tsv file with 'Head', 'Relation', and 'Tail' values and used the PyKEEN library to train a TransE model with softplus loss for 100 epochs on the triple sets. Since the dataset was quite small, training was quick.
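For reference, loading the .tsv into PyKEEN looks roughly like the sketch below; the file name triples.tsv and the 80/20 split are illustrative assumptions rather than our exact setup.

    from pykeen.triples import TriplesFactory

    # Each row of the .tsv is a tab-separated Head, Relation, Tail triple.
    tf = TriplesFactory.from_path("triples.tsv")  # hypothetical file name

    # Split into training and testing factories for the pipeline call further down.
    training_triples_factory, testing_triples_factory = tf.split([0.8, 0.2])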
Once the PyKEEN model was trained on the triples, we extracted the embeddings of both the entities ('head' and 'tail') and the relations ('type'/'relation'), used them to plot knowledge graphs, reduced their dimensions to 3 with PCA, UMAP, and t-SNE, and saved them for further investigation and comparison with traditional vector embeddings.
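A minimal sketch of that extraction and reduction step is shown below. It assumes a recent PyKEEN version where the learned vectors are exposed through entity_representations / relation_representations, and it uses the result object returned by the pipeline call shown later in this post; the output file names are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # Pull the learned vectors out of the trained model (result comes from the PyKEEN pipeline below).
    entity_emb = result.model.entity_representations[0](indices=None).detach().cpu().numpy()
    relation_emb = result.model.relation_representations[0](indices=None).detach().cpu().numpy()

    # Reduce to 3 dimensions for plotting; this only does real work when embedding_dim > 3.
    # UMAP works the same way via the umap-learn package.
    entity_pca = PCA(n_components=3).fit_transform(entity_emb)
    entity_tsne = TSNE(n_components=3, perplexity=5).fit_transform(entity_emb)  # perplexity must stay below the number of entities

    # Save the reduced coordinates for later comparison with the Word2Vec/CLIP embeddings.
    np.savetxt("entity_embeddings_pca.csv", entity_pca, delimiter=",")
    np.savetxt("entity_embeddings_tsne.csv", entity_tsne, delimiter=",")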
How do the results compare with traditional vector embeddings?
Okay, we have the text and image KGEs, along with the corresponding traditional vector embeddings from Word2Vec and CLIP, all reduced to 3-D space with PCA/UMAP/t-SNE. Click on the link for the .csv files.
The detailed result files can be found in the assets folder at 3.KG_Embeddings/assets/results/reduced_embeddings.
But here’s the intriguing part:
Are these vectors static, or do they adapt to the dataset and context, especially when using different language models? To explore this, we went a step further and plotted these embeddings in TensorBoard, uncovering insights into how context influences the vector representations.
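Getting the embeddings into TensorBoard's projector only takes a few lines; the sketch below reuses entity_emb and the TriplesFactory tf from the snippets above, and the log directory name is an illustrative assumption.

    import torch
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="runs/kge_projector")  # hypothetical log directory

    # Label each embedding row with its entity name so the projector shows readable points.
    entity_labels = sorted(tf.entity_to_id, key=tf.entity_to_id.get)
    writer.add_embedding(torch.tensor(entity_emb), metadata=entity_labels, tag="kge_entities")
    writer.close()

    # Then launch TensorBoard with: tensorboard --logdir runs/kge_projector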
The visualization
We’ve taken a deep dive into visualizing text and image embeddings using KGE, along with traditional vector embeddings from Word2Vec and CLIP. By reducing these embeddings to 3D space through PCA, UMAP, and t-SNE, we aimed to see which ones truly capture the context of our specific dataset.
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',  # swap in a different graph embedding technique here if desired
    loss='softplus',
    training=training_triples_factory,
    testing=testing_triples_factory,
    model_kwargs=dict(embedding_dim=3),  # embedding dimension (3 here; increase for richer representations)
    optimizer_kwargs=dict(lr=0.1),  # learning rate
    training_kwargs=dict(num_epochs=100, use_tqdm_batch=False),  # number of training epochs
)