# Knowledge Graph Generation

## What is Graph Representation?

In data science and machine learning, graphs are a powerful tool for showing relationships between different entities. Unlike traditional methods that treat each data point as a separate item, graph representations capture the connections between them, offering a more complete and insightful view.

A graph consists of nodes (or vertices) and edges. Nodes represent entities such as people, objects, or concepts, while edges denote the relationships between these entities.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXebEwz4BeuCkGChVgWpXgocdt33L2xAhGw2IMkM0xB49n3SJBB3grglTqGuTl_AE_NP82BdtxKMuwo33PqQIG80jmYn7CFxIhw3pd-W3WifL1hGSFu_TCQSWKPxXtD9N38PkqmCkHHyiuyEVm9gtmtF3_c?key=wfpiEsxZ9fIJH4KqLyeDKA" alt="" width="375"><figcaption><p>Basic structure of a Knowledge Graph</p></figcaption></figure>

By leveraging graph representation, we can encode not only the individual properties of entities but also the rich, contextual relationships between them. This allows us to move beyond simplistic, point-in-space embeddings and instead model data as a network of interconnected entities.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXewX7rC5vf9ytSqfV5jaaIFUGBlk_weu5sndQy1llyH1bU-ivgt52qRpFno1S3QnRqmyes8F1NwW5O7M88XQnSsPkJ0CZrz67UwjnubtEUZY9LuUk8Kc6nEbHmg3hW12o4UQthoATnMFO2CyJgwHVDAAZI?key=wfpiEsxZ9fIJH4KqLyeDKA" alt="" width="375"><figcaption><p>Visual of Graph Representation: A diagram showing a simple graph with labeled nodes (e.g., "Boots," "Sport shoes") connected by edges (e.g., "sub-class").</p></figcaption></figure>
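
To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. The "Boots" and "Sport shoes" nodes and the "sub-class" relation come from the figure caption; the parent node "Footwear" is a hypothetical name added only for illustration.

```python
from collections import defaultdict

# Nodes are entities; "Footwear" is a hypothetical parent node for illustration.
nodes = {"Boots", "Sport shoes", "Footwear"}

# Each edge is (source, relation, target).
edges = [
    ("Boots", "sub-class", "Footwear"),
    ("Sport shoes", "sub-class", "Footwear"),
]

# Adjacency list: for every source node, the outgoing (relation, target) pairs.
adjacency = defaultdict(list)
for source, relation, target in edges:
    adjacency[source].append((relation, target))


def neighbors(node):
    """Return the outgoing (relation, target) pairs of a node."""
    return adjacency.get(node, [])
```

Calling `neighbors("Boots")` returns the relations that start at that node, which is the kind of lookup a point-in-space embedding alone cannot answer.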

## How can Knowledge Graphs help Embeddings and LLMs?

Knowledge Graphs (KGs) extend the idea of graph representation by organizing information into a structured, relational format. KGs are particularly useful in capturing the semantic relationships between different entities, which is invaluable for tasks like information retrieval, recommendations, and language understanding.

When we apply Knowledge Graph embeddings, we transform the nodes and edges of a KG into a vector space, preserving the relational structure and semantic meaning. These embeddings can be used to enhance traditional machine learning models, including Large Language Models (LLMs).
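
As one illustration of what "preserving the relational structure" can mean, the sketch below scores triplets in the style of TransE, a well-known translational embedding method in which head + relation should land near tail. This page does not say which embedding method is used, so TransE is an assumption here, and the two-dimensional vectors and the "footwear" entity are hand-picked placeholders rather than learned values.

```python
import math

# Hand-picked placeholder vectors (not learned); "footwear" is hypothetical.
entity_vectors = {
    "boots": [0.9, 0.1],
    "sport shoes": [0.8, 0.3],
    "footwear": [1.0, 1.0],
}
relation_vectors = {"sub-class": [0.1, 0.9]}


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative distance between head + relation and tail."""
    h = entity_vectors[head]
    r = relation_vectors[relation]
    t = entity_vectors[tail]
    distance = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return -distance


plausible = transe_score("boots", "sub-class", "footwear")
implausible = transe_score("boots", "sub-class", "sport shoes")
```

With these placeholder vectors the first triplet scores higher than the second, which is the behaviour a trained model would be optimized to produce for true versus false triplets.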

<figure><img src="/files/ss9CUREXMKhnjHdQVwRH" alt=""><figcaption><p>Best of both worlds: ask the LLM a question and get back precise answers powered by the knowledge graph engine.</p></figcaption></figure>

Benefits of Knowledge Graphs for Embeddings and LLMs:

1. <mark style="color:blue;">**Contextual Understanding**</mark>: KGs allow embeddings to be context-aware by preserving the relationships between entities. This leads to more accurate and meaningful representations, particularly when dealing with multi-modal data.
2. <mark style="color:blue;">**Enhanced Information Retrieval**</mark>: By incorporating KGs, embeddings can improve search results by understanding the relationships between query terms and the data. This is especially useful in domains like e-commerce, where users might search for products using various descriptions.
3. <mark style="color:blue;">**Improved Recommendations:**</mark> KGs can help in generating more relevant recommendations by understanding user preferences and the relationships between different items. For instance, if a user frequently interacts with content about renewable energy, the system might recommend related articles on solar power, energy storage, and environmental policy.
4. <mark style="color:blue;">**Better Large Language Models:**</mark> LLMs, when integrated with KG embeddings, can generate more contextually accurate and semantically rich responses. This is particularly important in applications like chatbots and virtual assistants, where understanding the user's intent and providing relevant answers is crucial.

## Methods used by us for Text KG

<mark style="color:blue;">**Dataset used**</mark> - The dataset is a 1,000-row subset of the **COYO-700M text-image pair dataset**, available on Kaggle: [coyo-1k-reduced](https://www.kaggle.com/datasets/anantjain1223/coyo-1k-reduced)

<mark style="color:blue;">**Triplet Extractions**</mark> - Relational triplets form the backbone of any knowledge graph. A triplet consists of three components: the subject (head), the predicate (relation), and the object (tail). We use the REBEL model, which processes each caption in the dataset and generates these triplets. Multiple triplets can be extracted from a single image caption, capturing the various relationships between the entities described in the text.

REBEL is a seq2seq model based on BART that performs end-to-end relation extraction for more than 200 different relation types. The paper can be found [here](https://github.com/Babelscape/rebel/blob/main/docs/EMNLP_2021_REBEL__Camera_Ready_.pdf).

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcmr0rd2fdzCwSOhlDTAyYLt0TiL7KWL9rB9mWVSgCB_w8a6RhRSIRv1B4EKzsBC-icCJ7aDVIh2BknjJ1cKIYbl0pMqveXO3Im-9pP2GkKswjY7q2tpT2CyVHB82JO4CFr1sBmZIZu7NFXw4RXko2gWGs?key=wfpiEsxZ9fIJH4KqLyeDKA" alt=""><figcaption><p>Example Triplet Extraction by REBEL </p></figcaption></figure>
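
REBEL emits each prediction as one linearized string in which `<triplet>` starts a new head, `<subj>` starts a tail, and `<obj>` starts the relation label. The sketch below is a minimal parser for that format, written for illustration; the input string is hand-written rather than actual model output, and the function name `extract_triplets` is our own choice.

```python
def extract_triplets(decoded):
    """Parse a REBEL-style linearized string into head/type/tail dictionaries."""
    triplets = []
    head, tail, relation = [], [], []
    state = None  # which field the next ordinary tokens belong to

    def flush():
        # Store the current triplet only if all three parts are present.
        if head and tail and relation:
            triplets.append(
                {"head": " ".join(head), "type": " ".join(relation), "tail": " ".join(tail)}
            )

    cleaned = decoded.replace("<s>", " ").replace("</s>", " ").replace("<pad>", " ")
    for token in cleaned.split():
        if token == "<triplet>":  # a new head entity starts
            flush()
            head, tail, relation = [], [], []
            state = "head"
        elif token == "<subj>":  # a new tail entity starts (same head)
            flush()
            tail, relation = [], []
            state = "tail"
        elif token == "<obj>":  # the relation label starts
            relation = []
            state = "relation"
        elif state == "head":
            head.append(token)
        elif state == "tail":
            tail.append(token)
        elif state == "relation":
            relation.append(token)
    flush()
    return triplets


# Hand-written example string in the REBEL output format (not real model output).
sample = (
    "<s><triplet> dog <subj> ground <obj> located on "
    "<subj> animal <obj> instance of "
    "<triplet> boy <subj> bench <obj> located on</s>"
)
parsed = extract_triplets(sample)
```

A single head can carry several tails, which is why one caption can yield multiple triplets.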

<mark style="color:blue;">**Methods used for Visualization -**</mark>

After extracting the triplets, the next step is to visualize the knowledge graph. Visualization helps in understanding the structure and relationships within the data. We used four different methods to visualize the KG:

<mark style="color:blue;">**a. Neo4j**</mark>

Neo4j is a graph database that excels at managing and querying large-scale knowledge graphs. By importing the extracted triplets into Neo4j, we could visually explore the connections between entities. Neo4j’s powerful query language, Cypher, allows for complex querying and analysis, making it an ideal choice for deeper insights into the graph structure.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeGlEcT8jvt4VbWXQp84QMi1WranBkPEYdzWvt98Bxoi2-_v-kH_G8W1FLhLNgvI8jBNzcJJjVTgEPwS3wl-7-rIqN_KSR_-IzZVJ0je9UsK98ImbUg0iAgLrC2nVmC1_JNjeGaE6QEbnROoeB-5JTKApJL?key=wfpiEsxZ9fIJH4KqLyeDKA" alt="" width="375"><figcaption><p>Interactive Neo4j Knowledge Graph visualizing the intricate relationships between entities, enabling dynamic exploration of data connections.</p></figcaption></figure>
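
One possible way to import triplets into Neo4j is to turn each one into a parameterized Cypher `MERGE` statement. The sketch below only builds the query text and parameters; it does not connect to a database. The `Entity` label, the `name` property, and the helper names are assumptions made for this example, not the schema used in our import.

```python
import re


def relation_to_type(relation):
    """Turn a free-text relation such as 'part of' into a Cypher-friendly type name."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", relation).strip("_").upper()
    return cleaned or "RELATED_TO"


def triplet_to_cypher(triplet):
    """Return (query, parameters) that upsert both nodes and the edge between them."""
    rel_type = relation_to_type(triplet["type"])
    # Relationship types cannot be passed as parameters, so the type is
    # interpolated after sanitizing; entity names stay parameterized.
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:`{rel_type}`]->(t)"
    )
    return query, {"head": triplet["head"], "tail": triplet["tail"]}


query, params = triplet_to_cypher({"head": "cars", "type": "part of", "tail": "street"})
```

Using `MERGE` rather than `CREATE` keeps an entity that appears in many triplets from being duplicated as separate nodes.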

<mark style="color:blue;">**b. NetworkX**</mark>

NetworkX is a Python library designed for the creation, manipulation, and study of complex networks. We utilized NetworkX to generate basic visual representations of the knowledge graph. This method is particularly useful for smaller datasets or when you need to quickly prototype and analyze the structure of your KG without the overhead of a database.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfXXPHBgZRxo25gNdtyjKx1wAzVPhk6xxUdUScMoZzc3fEuVwFZKdrBdQ1Wjy6IFpRPnTyOJoEi7z0gxho0IFiUNDazUNbGTS-HWSa_nDQuHITRCV5yvFd6HGrzvelTqGcTgs7RR0G24q39ziBZvhMmoYQ?key=wfpiEsxZ9fIJH4KqLyeDKA" alt="" width="375"><figcaption><p>NetworkX-powered Knowledge Graph showcasing node-link structures</p></figcaption></figure>
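
A minimal sketch of the NetworkX step, assuming triplets are stored as head/type/tail dictionaries. The three sample triplets are illustrative values in that shape, and storing the relation as an edge attribute named `relation` is our choice for this example.

```python
import networkx as nx

# Illustrative triplets in the head/type/tail shape.
triplets = [
    {"head": "cars", "type": "part of", "tail": "street"},
    {"head": "parked", "type": "part of", "tail": "street"},
    {"head": "modes of transportation", "type": "location", "tail": "urban environment"},
]

# Directed graph: edges point from head to tail and carry the relation label.
graph = nx.DiGraph()
for triplet in triplets:
    graph.add_edge(triplet["head"], triplet["tail"], relation=triplet["type"])

# Mapping of (head, tail) -> relation, handy for labelling edges in a plot.
edge_labels = nx.get_edge_attributes(graph, "relation")
```

With matplotlib installed, `nx.draw_networkx(graph, pos)` and `nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)` can render the graph, where `pos` comes from a layout function such as `nx.spring_layout(graph)`.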

<mark style="color:blue;">**c. Plotly**</mark>

Plotly is a versatile graphing library that supports interactive visualizations. For our KG, we used Plotly to create interactive, web-based visualizations that allow for dynamic exploration of the relationships between entities. The ability to zoom, pan, and interact with the graph nodes and edges provides an engaging way to explore the KG.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXehMTzIU5Ha9Tn3K7LsOFa0GlFgF0wFpcEost3IrfHaIE-PR9TsXffMCrDqVnRxg8VpuGHG8lvAljdkOsNcgubZfaZ6WUZi949ihH7P3ay1jrJ_zeGGfePpGrUyfLpEo9weKp52KeiVGf2NOTncXhqm0Fw?key=wfpiEsxZ9fIJH4KqLyeDKA" alt=""><figcaption><p>Plotly Knowledge Graph illustrating key relationships and concepts among different nodes </p></figcaption></figure>

<mark style="color:blue;">**d. Graphviz**</mark>

Graphviz is a robust tool for rendering graphical representations of data structures. It excels at generating static images of graphs, which are useful for documentation and presentations. We used Graphviz to produce clear and concise visualizations of the KG, highlighting the most important relationships extracted from the dataset.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcul2ABHb7PYIBvWAkJ65ACGluyNGVbdtP99bNfwvm79TH7FJrZKPn85H3WiVk8uLceKn5bcA1_3LI7BLsTf8dXgRLuUqUf0Agg2vPCZ3lRS4ONZAl5qRqOFaxYWNx-CIC_hsV76OvvsT5vTzORcTPcR08X?key=wfpiEsxZ9fIJH4KqLyeDKA" alt=""><figcaption><p>Graphviz-rendered Knowledge Graph highlighting hierarchical data structures with precision and clarity in static visual representation.</p></figcaption></figure>
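
Graphviz reads graphs written in the DOT language, so one simple route is to generate DOT text from the triplets. The sketch below uses only the standard library; the helper names, the left-to-right `rankdir`, and the sample triplets are choices made for this example.

```python
def quote(text):
    """Wrap a label in double quotes, escaping characters DOT treats specially."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def triplets_to_dot(triplets):
    """Build DOT source with one labelled, directed edge per triplet."""
    lines = ["digraph KG {", "  rankdir=LR;"]
    for t in triplets:
        lines.append(
            f"  {quote(t['head'])} -> {quote(t['tail'])} [label={quote(t['type'])}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Illustrative triplets in the head/type/tail shape.
dot_source = triplets_to_dot([
    {"head": "cars", "type": "part of", "tail": "street"},
    {"head": "modes of transportation", "type": "location", "tail": "urban environment"},
])
```

Saving `dot_source` to a file such as `kg.dot` and running `dot -Tpng kg.dot -o kg.png` produces a static image.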

***

## Okay, what about the Image KG?

**We followed 2 approaches to create Image Scene Graphs**

1. **Vision-Language Models (VLMs)**
2. **Relationformer (see the paper:** [**https://arxiv.org/abs/2203.10202**](https://arxiv.org/abs/2203.10202)**)**

### How did we go about using VLMs for this task?

For this approach, we used LLaVA 1.5, a multimodal large language model (a Vision-Language Model, VLM), to generate text from an image given a prompt. The checkpoint identifier below is the 7B variant.

```markup
"llava-hf/llava-1.5-7b-hf"
```

The model was then loaded on `cuda` with 4-bit quantization, using a bitsandbytes configuration.

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
```

When presented with an image, the model was prompted to "describe in detail the image and its objects."

```notebook-python
prompt = "USER: <image>\nDescribe in detail the objects and what is happening in the image to me.\nASSISTANT:"
```

<figure><img src="/files/LaUcychy4ZqNGNdJGPVj" alt="" width="563"><figcaption><p>Test Image used for VLM KGE</p></figcaption></figure>

**The model output for this prompt was then used for the triplet extraction task.**

> `[['The image features a group of people, including a boy and a girl, sitting on a bench'], [' They are accompanied by a white dog, which is sitting on the ground in front of them'], [" The scene appears to be a casual gathering, with the people and the dog enjoying each other's company"]]`
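
The steps above (load the quantized model, prompt it with an image, decode the answer, split it into sentences) can be sketched roughly as follows. This is a sketch under assumptions, not our exact notebook code: the function names, `max_new_tokens=200`, and splitting on full stops are all choices made for this example. The heavy imports sit inside `describe_image`, which needs a CUDA GPU plus `torch`, `transformers`, `bitsandbytes`, `accelerate`, and `Pillow`.

```python
MODEL_ID = "llava-hf/llava-1.5-7b-hf"


def describe_image(image_path, prompt, max_new_tokens=200):
    """Generate a description of one image with a 4-bit quantized LLaVA model."""
    import torch
    from PIL import Image
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        LlavaForConditionalGeneration,
    )

    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        quantization_config=quantization_config,
        device_map="auto",
    )

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    inputs = inputs.to(model.device, torch.float16)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.decode(output_ids[0], skip_special_tokens=True)
    # Keep only the text that follows the assistant marker in the prompt.
    return decoded.split("ASSISTANT:")[-1].strip()


def split_into_sentences(description):
    """Split a description on full stops into one-item lists, one per sentence."""
    return [[part.strip()] for part in description.split(".") if part.strip()]
```

Each one-sentence list can then be passed to the triplet extraction model separately, so that relations are extracted per sentence rather than from the whole description at once.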

**Triplet Extraction with Babelscape REBEL-large (**[**https://huggingface.co/Babelscape/rebel-large**](https://huggingface.co/Babelscape/rebel-large)**)**

With the detailed textual description in hand, the next step involved transforming this unstructured information into a structured format. This was achieved using the REBEL model, a specialized tool for triplet extraction. <mark style="color:blue;">The process involved identifying and extracting triplets in the form of head-relation-tail (H-R-T) structures.</mark> These triplets serve as the foundational building blocks for knowledge graphs, where the "head" represents an entity, the "relation" indicates the type of connection, and the "tail" signifies another entity linked to the head.

<figure><img src="/files/RBBvRz9jInQvu92VjlD8" alt="" width="374"><figcaption><p>Snippet of the extracted triplet</p></figcaption></figure>

**Constructing the Knowledge Graph**

Once the triplets were extracted, the task of constructing the knowledge graph began. Using visualization libraries like NetworkX and Plotly, the extracted triplets were organized into a coherent and visually interpretable knowledge graph.

<figure><img src="/files/10aCprMm3vAzcHxOA50R" alt="" width="375"><figcaption></figcaption></figure>

<figure><img src="/files/DsmeU5kvE6kT6DjXWSiI" alt="" width="333"><figcaption><p>KG Visualizations with NetworkX</p></figcaption></figure>

As can be seen, the generated knowledge graph is <mark style="color:blue;">**hit and miss**</mark>: not everything is correct, but most of the relations make sense.

```
 'cars', 'type': 'part of', 'tail': 'street'}],
 [{'head': 'parked', 'type': 'part of', 'tail': 'street'}
 
 [{'head': 'modes of transportation',
   'type': 'location',
   'tail': 'urban environment'}
```

### Relationformer

Alongside VLMs, we looked into existing frameworks for scene graph generation. We used Relationformer for its acceptable results and comparatively well-documented code. The figure below gives an architectural overview of the framework. It uses an encoder-decoder architecture in which the encoded image features feed the decoder's cross-attention.

The decoder has N <mark style="color:red;">`[obj]`</mark> (object) tokens and one <mark style="color:red;">`[rel]`</mark> (relation) token.

When they are output by the decoder, the object tokens encode the information used for object detection and the relation token encodes the information used for relation prediction.
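
As we read the Relationformer paper, the relation head looks at ordered pairs of object tokens together with the shared relation token. The sketch below illustrates only that pairing step with plain Python lists; the token values are placeholders, the function name is hypothetical, and the actual implementation operates on tensors inside the model.

```python
from itertools import permutations


def build_relation_inputs(obj_tokens, rel_token):
    """For every ordered pair (i, j) with i != j, concatenate obj_i, obj_j and rel."""
    inputs = {}
    for i, j in permutations(range(len(obj_tokens)), 2):
        inputs[(i, j)] = obj_tokens[i] + obj_tokens[j] + rel_token
    return inputs


# Placeholder decoder outputs: N = 3 object tokens and one relation token,
# each of dimension 2 (real models use far larger dimensions).
obj_tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
rel_token = [0.7, 0.8]

relation_inputs = build_relation_inputs(obj_tokens, rel_token)
```

With N object tokens this yields N x (N - 1) candidate pairs, each of which a relation classifier would score against the relation categories.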

<div data-full-width="false"><figure><img src="/files/F7alwKPwBW5iOqY2WpDn" alt="" width="563"><figcaption><p>The Relationformer framework. For a description, refer to the text. (Image taken from the original paper.)</p></figcaption></figure></div>

To run inference with this framework, we wrote inference code and documented the setup so that our results can be replicated.

Some things you should be aware of while running inference/training:

* The framework was trained on the Visual Genome dataset (<https://homes.cs.washington.edu/~ranjay/visualgenome/index.html>), which has 150 object categories and 50 relation categories. We organized the data and made it available [here](https://drive.google.com/drive/folders/1-77DLCL__TBx7PxvMK73Q6s49fcu5XXw?usp=sharing); you can use it for training.
* To fit the model on the GPUs available for training, we reduced the model size; please refer to the config file in the repository.
* Our pretrained weights are also provided in the same inference notebook.

**Results**

Here we list the results obtained after training for five epochs. We plot only the highest-scoring object detections and relations. Both the object detections and the relation predictions in the graph suggest that the model has not converged. This may be because we changed the model architecture without retuning the underlying hyperparameters, or simply because of insufficient training.

<figure><img src="/files/OFYKcswWwLL19yY8Eit9" alt=""><figcaption><p>Scene graph generated with Relationformer</p></figcaption></figure>
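
Keeping only the highest-scoring predictions before plotting can be done with a simple filter-and-sort step. The sketch below is illustrative: the function name, the `score` key, the threshold, and the sample predictions are all hypothetical placeholders, not actual model output.

```python
def keep_top_scoring(predictions, k, min_score=0.0):
    """Return at most k predictions with score >= min_score, best first."""
    kept = [p for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)[:k]


# Hypothetical placeholder detections, not real results.
detections = [
    {"label": "object_a", "score": 0.91},
    {"label": "object_b", "score": 0.35},
    {"label": "object_c", "score": 0.78},
    {"label": "object_d", "score": 0.52},
]

top_detections = keep_top_scoring(detections, k=2, min_score=0.5)
```

The same helper can be applied to relation predictions, as long as each one carries a confidence score.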

