Knowledge Graph Embeddings
Triplet extraction with REBEL set up the base for creating embeddings of both the text and images from the COYO subset dataset: https://www.kaggle.com/datasets/anantjain1223/coyo-1k-reduced.
The triplets extracted with REBEL (https://huggingface.co/Babelscape/rebel-large) from the image URLs were saved as .csv files at https://www.kaggle.com/datasets/agampy/triplets-kg and used to train the embedding model.
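For reference, here is a minimal sketch of triplet extraction with REBEL, following the parsing recipe on the model card; the caption and printed output are illustrative:

```python
from transformers import pipeline

# REBEL linearises triplets as "<triplet> head <subj> tail <obj> relation ...",
# so we decode the raw token ids to keep those special markers.
extractor = pipeline(
    "translation_xx_to_yy",
    model="Babelscape/rebel-large",
    tokenizer="Babelscape/rebel-large",
)

def extract_triplets(text):
    """Parse REBEL's generated string into head/type/tail dicts."""
    triplets, subject, relation, object_, current = [], "", "", "", None
    text = text.replace("<s>", "").replace("</s>", "").replace("<pad>", "")
    for token in text.split():
        if token == "<triplet>":
            if relation:  # close out the previous triplet, if any
                triplets.append({"head": subject.strip(), "type": relation.strip(), "tail": object_.strip()})
            subject, relation, current = "", "", "t"
        elif token == "<subj>":
            if relation:  # same head, new tail/relation pair
                triplets.append({"head": subject.strip(), "type": relation.strip(), "tail": object_.strip()})
            object_, current = "", "s"
        elif token == "<obj>":
            relation, current = "", "o"
        elif current == "t":
            subject += " " + token
        elif current == "s":
            object_ += " " + token
        elif current == "o":
            relation += " " + token
    if subject and relation and object_:
        triplets.append({"head": subject.strip(), "type": relation.strip(), "tail": object_.strip()})
    return triplets

caption = "The Eiffel Tower is located in Paris, France."  # illustrative input
out = extractor(caption, return_tensors=True, return_text=False)
decoded = extractor.tokenizer.batch_decode([out[0]["translation_token_ids"]])
print(extract_triplets(decoded[0]))
```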
Training configuration:

- Model: TransE for the embeddings
- Loss: Softplus loss
- Epochs: 100
We created a .tsv file with 'Head', 'Relation', and 'Tail' columns, and used the PyKEEN library to train the TransE model with softplus loss for 100 epochs on the triplet sets. Since the dataset was quite small, training was quick.
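A minimal sketch of that training setup with PyKEEN (the file path, split, and seed are assumptions):

```python
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Load the tab-separated head/relation/tail file (path is illustrative).
tf = TriplesFactory.from_path("triplets.tsv")
training, testing = tf.split()  # defaults to an 80/20 split

result = pipeline(
    training=training,
    testing=testing,
    model="TransE",
    loss="softplus",                        # PyKEEN's SoftplusLoss
    training_kwargs=dict(num_epochs=100),   # quick on a small dataset
    random_seed=42,                         # assumed, for reproducibility
)
result.save_to_directory("transe_kge")      # persists the trained model
```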
Once the PyKEEN model was trained on the triplets, we took the embeddings of both the entities ('head' and 'tail') and the relations ('type'/'relation'), used them to plot knowledge graphs, reduced their dimensions to 3 with the PCA, UMAP, and t-SNE dimensionality-reduction techniques, and saved them for further investigation and comparison with traditional vector embeddings.
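A sketch of how the embeddings can be pulled out of the trained model and reduced to 3D, assuming `result` and `tf` from the training sketch above (the output file names are hypothetical):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # from the umap-learn package

model = result.model
# In recent PyKEEN versions, calling a representation with indices=None
# returns the full embedding matrix.
entity_emb = model.entity_representations[0](indices=None).detach().cpu().numpy()
relation_emb = model.relation_representations[0](indices=None).detach().cpu().numpy()

# Entity labels in id order, reusing the TriplesFactory from training.
labels = sorted(tf.entity_to_id, key=tf.entity_to_id.get)

reduced = {
    "pca": PCA(n_components=3).fit_transform(entity_emb),
    "tsne": TSNE(n_components=3).fit_transform(entity_emb),
    "umap": umap.UMAP(n_components=3).fit_transform(entity_emb),
}

for name, coords in reduced.items():
    df = pd.DataFrame(coords, columns=["x", "y", "z"])
    df["entity"] = labels
    df.to_csv(f"entity_embeddings_{name}.csv", index=False)
```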
For more insight into the code, refer to the notebook at https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/src/pykeen_KGE.ipynb
The text triplets were likewise extracted with Babelscape's REBEL-large model and saved as a .csv file with "text" and "triplet" columns.
The text triplets dataset, used for training the PyKEEN model, can be found here: https://www.kaggle.com/datasets/agampy/text-triplets1k
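As a hedged sketch, converting such a .csv into the head/relation/tail .tsv that PyKEEN expects might look like the following. We assume the "triplet" column stores a stringified list of dicts with 'head', 'type', and 'tail' keys (the format the common REBEL parsers emit); adjust the parsing to the actual file.

```python
import ast
import csv
import pandas as pd

df = pd.read_csv("text_triplets.csv")  # columns: "text", "triplet"; path illustrative

rows = []
for cell in df["triplet"]:
    # Assumed cell format: "[{'head': ..., 'type': ..., 'tail': ...}, ...]"
    for t in ast.literal_eval(cell):
        rows.append((t["head"], t["type"], t["tail"]))

with open("text_triplets.tsv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)
```

The PyKEEN pipeline shown earlier can then be pointed at this .tsv.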
The trained model was then used to create embeddings of the triplets; the notebook is at https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/src/pykeen_KGE_text.ipynb.
Okay, we have the text and image KGE, along with the corresponding traditional vector embeddings from Word2Vec and CLIP, reduced to 3D space with PCA/UMAP/t-SNE!
The detailed result files (.csv) can be found in the assets folder at 3.KG_Embeddings/assets/results/reduced_embeddings
Are these vectors static, or do they adapt based on the dataset and context, especially when using different language models? To explore this, we went a step further by plotting these embeddings in TensorBoard, uncovering insights into how context influences vector representation.
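A minimal sketch of pushing the embeddings into TensorBoard's projector, reusing `entity_emb` and `labels` from the extraction sketch above (the log directory and tag are arbitrary):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/kge_projector")
writer.add_embedding(
    torch.as_tensor(entity_emb),
    metadata=labels,        # one entity name per embedding row
    tag="transe_entities",
)
writer.close()
# Launch with: tensorboard --logdir runs/kge_projector
```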
The Visualisation
We’ve taken a deep dive into visualizing text and image embeddings using KGE, along with traditional vector embeddings from Word2Vec and CLIP. By reducing these embeddings to 3D space through PCA, UMAP, and t-SNE, we aimed to see which ones truly capture the context of our specific dataset.