# Knowledge Graph Embeddings

## Yes, we have Triplets, so what about Embeddings?

Triplet extraction with REBEL laid the groundwork for creating embeddings of both text and images from the COYO subset dataset: <https://www.kaggle.com/datasets/anantjain1223/coyo-1k-reduced>.

<figure><img src="https://3558521670-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8mI5ieugJlfU4ng7eyP9%2Fuploads%2F745sRwoGQsAKDzTlShRl%2Fimage.png?alt=media&#x26;token=4400aafc-6c88-459c-98f5-2e65ebaf5445" alt="" width="225"><figcaption><p>KG to KGE!</p></figcaption></figure>

## Image Knowledge Graph Embeddings

The triplets extracted with REBEL (<https://huggingface.co/Babelscape/rebel-large>) from image URLs were saved as .csv files at <https://www.kaggle.com/datasets/agampy/triplets-kg> and used to train the embedding model.

<figure><img src="https://3558521670-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8mI5ieugJlfU4ng7eyP9%2Fuploads%2FU4DvJyLa1e9HJEFrek0K%2Fimage.png?alt=media&#x26;token=d789aebd-e7cd-4922-833c-1c4309c3ea3b" alt="" width="204"><figcaption><p>Snippet of url_triplets1k.csv</p></figcaption></figure>

#### Using PyKeen, a Python library for Knowledge Graph Embeddings: <https://github.com/pykeen/pykeen>

1. TransE used for the embeddings
2. Softplus loss
3. 100 epochs

<figure><img src="https://3558521670-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8mI5ieugJlfU4ng7eyP9%2Fuploads%2FOlCIsbxGvWFINBbxFXeJ%2Fimage.png?alt=media&#x26;token=a21d3b49-e98e-445f-bbb7-fab514965d69" alt="" width="563"><figcaption><p>TransE used for embedding of triplets</p></figcaption></figure>

We created a .tsv file with <mark style="color:blue;">`'Head' : 'Relation' : 'Tail'`</mark> values, and the PyKeen library was used to train a TransE model with softplus loss for 100 epochs on the triplet sets. Since the dataset was quite small, training was quick.

```python
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',  # swap in other graph embedding techniques here
    loss="softplus",
    training=training_triples_factory,
    testing=testing_triples_factory,
    model_kwargs=dict(embedding_dim=3),  # embedding dimension (kept small here)
    optimizer_kwargs=dict(lr=0.1),  # learning rate
    training_kwargs=dict(num_epochs=100, use_tqdm_batch=False),  # number of epochs
)
```

Once the PyKeen model was trained on the triplets, we extracted embeddings of both the entities ('head' and 'tail') and the relations ('type'/'relation'). We used these embeddings to plot knowledge graphs, reduced their dimensions to 3 with the PCA, UMAP, and t-SNE dimensionality reduction techniques, and saved them for further investigation and comparison with traditional vector embeddings.

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=3)

# Pull the learned embeddings out of the trained model and reduce them,
# fitting PCA on the entities and reusing the fit for the relations.
entity_embeddings = model.entity_representations[0](indices=None).cpu().detach().numpy()
entity_pca = pca.fit_transform(entity_embeddings)
relation_embeddings = model.relation_representations[0](indices=None).cpu().detach().numpy()
relation_pca = pca.transform(relation_embeddings)
```
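The UMAP and t-SNE reductions work the same way: the embedding matrix goes into any reducer with a `fit_transform` interface. A minimal t-SNE sketch with scikit-learn, using random vectors in place of the trained embeddings (the perplexity value here is an illustrative choice, not the notebook's setting):

```python
import numpy as np
from sklearn.manifold import TSNE

# Random vectors standing in for the trained entity embeddings.
entity_embeddings = np.random.rand(50, 16)

# Reduce to 3 dimensions for plotting; perplexity must be smaller
# than the number of samples.
tsne = TSNE(n_components=3, perplexity=10, random_state=42)
entity_tsne = tsne.fit_transform(entity_embeddings)
```

Note that, unlike PCA, t-SNE has no `transform` for new points, so entities and relations would each be reduced with their own `fit_transform` call.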

For the full code, refer to the notebook at <https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/src/pykeen_KGE.ipynb>

## Text Knowledge Graph Embeddings

The triplets were extracted with Babelscape's REBEL-large model and saved as a .csv file with <mark style="color:blue;">`"text" : "triplet"`</mark> columns.

The text triplets dataset, available at <https://www.kaggle.com/datasets/agampy/text-triplets1k>, was used to train the PyKeen model.
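Since PyKeen reads tab-separated head/relation/tail rows, the `"text" : "triplet"` csv needs a small conversion step first. A hypothetical sketch, assuming each row's `triplet` column stores a `head | relation | tail` string (the actual column format in the dataset may differ):

```python
import csv
from pathlib import Path

# Toy rows standing in for the real dataset; the "head | relation | tail"
# string format of the triplet column is an assumption for this sketch.
Path("text_triplets.csv").write_text(
    'text,triplet\n'
    '"A cat chases a mouse.",cat | chases | mouse\n'
    '"Dogs are animals.",dog | instance of | animal\n'
)

# Convert the csv into the tab-separated file PyKeen's TriplesFactory reads.
with open("text_triplets.csv") as src, open("text_triplets.tsv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in csv.DictReader(src):
        head, relation, tail = (part.strip() for part in row["triplet"].split("|"))
        writer.writerow([head, relation, tail])
```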

```python
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',  # swap in other graph embedding techniques here
    loss="softplus",
    training=training_triples_factory,
    testing=testing_triples_factory,
    model_kwargs=dict(embedding_dim=3),  # embedding dimension (kept small here)
    optimizer_kwargs=dict(lr=0.1),  # learning rate
    training_kwargs=dict(num_epochs=100, use_tqdm_batch=False),  # number of epochs
)
```

<figure><img src="https://3558521670-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8mI5ieugJlfU4ng7eyP9%2Fuploads%2FYTVZDWLrbAHw7T7ViJr8%2Fnewplot.png?alt=media&#x26;token=01ee042f-0564-486e-b186-9a27e4f55862" alt="" width="375"><figcaption><p>Knowledge graph of entity and relation embeddings.</p></figcaption></figure>

The trained model is then used to create embeddings of the triplets; the notebook is at <https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/src/pykeen_KGE_text.ipynb>.

## Results and Comparison with Traditional Vector Embeddings

Okay, we have the text and image KGEs, along with the corresponding traditional vector embeddings from Word2Vec and CLIP, reduced to **3D space with PCA/UMAP/t-SNE.** Click a link below for the .csv files.

| [PCA\_image](https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/assets/results/reduced_embeddings/pca_image_kgembeddings.csv) | [UMAP\_image](https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/assets/results/reduced_embeddings/umap_image_kgembeddings.csv) | [t-SNE \_image](https://agam-pandey.gitbook.io/knowledge-graph-embedding-or-dsg-iitr/upgrading-from-vectors-to-graphs-knowledge-graph-embeddings-and-graph-rag.) |
| ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [PCA\_text](https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/assets/results/reduced_embeddings/pca_text_kgembeddings.csv)   | [UMAP\_text](https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/assets/results/reduced_embeddings/umap_text_kgembeddings.csv)   | [t-SNE\_text](https://github.com/dsgiitr/kge-clip/blob/main/3.KG_Embeddings/assets/results/reduced_embeddings/tsne_text_kgembeddings.csv)                        |

> The detailed result files can be found in the assets folder located at *3.KG\_Embeddings/assets/results/reduced\_embeddings*

### <mark style="color:blue;">**But here’s the intriguing part:**</mark>&#x20;

Are these vectors static, or do they adapt based on the dataset and context, especially when using different language models? To explore this, we went a step further by plotting these embeddings in TensorBoard, uncovering insights into how context influences vector representation.
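One dependency-free way to get embeddings into TensorBoard's Embedding Projector (a sketch of the general approach, not necessarily what our notebook does) is to export a tab-separated vectors file plus a parallel metadata file of labels, which the projector loads directly:

```python
import numpy as np

# Random vectors standing in for the trained KGE entity embeddings.
entity_embeddings = np.random.rand(5, 8)
entity_labels = [f"entity_{i}" for i in range(5)]

# The Embedding Projector accepts one tab-separated vector per line,
# plus a metadata file with one label per line in the same order.
np.savetxt("vectors.tsv", entity_embeddings, delimiter="\t")
with open("metadata.tsv", "w") as f:
    f.write("\n".join(entity_labels))
```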

### The Visualisation

We’ve taken a deep dive into visualizing text and image embeddings using KGE, along with traditional vector embeddings from Word2Vec and CLIP. By reducing these embeddings to 3D space through PCA, UMAP, and t-SNE, we aimed to see which ones truly capture the context of our specific dataset.
