Tutorial

This tutorial will guide you through building and querying a knowledge graph using Amazon.com Inc.'s 2024 10-K filing and the provided schema. We'll use the WhyHow SDK to import relevant information from the 10-K document into a knowledge graph and then query it for insights related to Amazon's business.

You can find the Amazon documents, schema, and many others here: https://github.com/whyhow-ai/schemas

Environment Setup

Ensure you have Python 3.10 or higher installed on your machine.

To get started while we’re in Beta, you’ll need to get a Beta Access Key. You can do so by scheduling a call with us.

If you already have access to our Platform Beta, you can retrieve your API key from the Settings page of https://app.whyhow.ai/ or from the email you received when you registered.

To keep your API key secure, set it as an environment variable. Open your terminal and run the following command, replacing the placeholder with your actual key:

export WHYHOW_API_KEY=<YOUR_WHYHOW_API_KEY>

Install WhyHow SDK

If you haven't already, install the WhyHow SDK using pip:

pip install whyhow

Configure the WhyHow Client

With your environment variable set, you can now configure the WhyHow client in your Python script. The client will automatically read in your environment variable, or you can override this value by specifying it in the client constructor.

from whyhow import WhyHow

client = WhyHow(api_key="<YOUR_WHYHOW_API_KEY>", base_url="https://api.whyhow.ai")
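
If you pass the key explicitly, it can help to fail early when the environment variable is missing rather than hand an empty value to the client. The following is a minimal sketch using only the standard library; `load_api_key` is a hypothetical helper name and is not part of the WhyHow SDK.

```python
import os


def load_api_key(env=None, var_name="WHYHOW_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    `load_api_key` is a hypothetical helper, not part of the WhyHow SDK.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating the client."
        )
    return key
```

The returned value could then be passed as the `api_key` argument of the client constructor shown above.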

Option 1 - Create the Knowledge Graph from a schema

First, let's define the workspace for our project and specify the path to Amazon's 10-K document. Your workspace is a logical grouping of the raw data you upload, the schema you define, and the graphs you create.

workspace = client.workspaces.create(name="Amazon 10-K Analysis")
# or if you already have a workspace
# workspace = client.workspaces.get(workspace_id="<workspace_id>")
document_path = "path/to/amazon_10k_2024.pdf"

# Add document to your workspace
document = client.documents.upload(
    document_path,
    workspace_id=workspace.workspace_id
)
print("Document Added:", document)

Next, we'll create a schema based on the provided JSON file. This schema defines the entities, relationships, and patterns we'll use to construct the graph.

from whyhow import SchemaEntity, SchemaRelation, SchemaTriplePattern

# Load the schema from the JSON file
entities, relations, patterns = client.schemas.load_json("amazon_10k_schema.json")

schema = client.schemas.create(
    workspace_id=workspace.workspace_id,
    name="Amazon 10-K Schema",
    entities=entities,
    relations=relations,
    patterns=patterns,
)
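
Before uploading a schema, it can be worth checking that every pattern refers only to entities and relations the schema actually defines. The sketch below does this with the standard library. The JSON layout (`entities`, `relations`, and `patterns` lists with `name`, `head`, `relation`, and `tail` keys) is an assumption made for illustration, and the real `amazon_10k_schema.json` may be organized differently; `find_undefined_names` is a hypothetical helper.

```python
import json

# Assumed layout of a schema file; the real amazon_10k_schema.json may differ.
EXAMPLE_SCHEMA = """
{
  "entities": [
    {"name": "Company", "description": "A business organization"},
    {"name": "Business Segment", "description": "A reporting segment"}
  ],
  "relations": [
    {"name": "comprises", "description": "A company comprises a segment"}
  ],
  "patterns": [
    {"head": "Company", "relation": "comprises", "tail": "Business Segment",
     "description": "Companies comprise business segments"}
  ]
}
"""


def find_undefined_names(schema):
    """Return (pattern index, role, name) for each undefined reference."""
    entity_names = {e["name"] for e in schema.get("entities", [])}
    relation_names = {r["name"] for r in schema.get("relations", [])}
    problems = []
    for i, pattern in enumerate(schema.get("patterns", [])):
        for role in ("head", "tail"):
            if pattern[role] not in entity_names:
                problems.append((i, role, pattern[role]))
        if pattern["relation"] not in relation_names:
            problems.append((i, "relation", pattern["relation"]))
    return problems


schema_dict = json.loads(EXAMPLE_SCHEMA)
print(find_undefined_names(schema_dict))  # [] because every reference resolves
```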

Now, let's create the graph using the schema and the uploaded 10-K document:

# Create graph from schema and document
graph = client.graphs.create(
    workspace_id=workspace.workspace_id,
    schema_id=schema.schema_id,
    name="Amazon 10-K Graph",
)
# Creating your graph

Option 2 - Create the Knowledge Graph from seed questions

Alternatively, you can create a graph using seed concepts in the form of questions written in natural language. We'll create a new workspace and upload the same data.

workspace = client.workspaces.create(name="Amazon 10-K Analysis (Auto-Generated)")
document_path = "path/to/amazon_10k_2024.pdf"

# Add document to your workspace
document = client.documents.upload(
    document_path,
    workspace_id=workspace.workspace_id
)
print("Document Added:", document)

Create the schema from the seed questions:

questions = [
    "What are Amazon's primary business segments?",
    "How does Amazon generate revenue?",
    "What are the key risk factors for Amazon's business?",
    "Who are Amazon's main competitors?",
    "What is Amazon's strategy for future growth?"
]

entities, relations, patterns = client.schemas.generate(
    questions=questions,
)
print("Entities:", entities)
print("Relations:", relations)
print("Patterns:", patterns)

schema = client.schemas.create(
    workspace_id=workspace.workspace_id,
    name="Amazon 10-K Auto-Generated Schema",
    entities=entities,
    relations=relations,
    patterns=patterns,
)

# Create graph from schema and document
graph_auto = client.graphs.create(
    workspace_id=workspace.workspace_id,
    schema_id=schema.schema_id,
    name="Amazon 10-K Auto-Generated Graph",
)
# Creating your graph

Now you can query the auto-generated graph:

# Query the auto-generated graph
question = "What are Amazon's main revenue streams?"
query = client.graphs.query_unstructured(
    graph_id=graph_auto.graph_id,
    query=question,
)
print("Query Response:", query.answer)

# Query for specific competitive advantages
question = "What are Amazon's key competitive advantages?"
query = client.graphs.query_unstructured(
    graph_id=graph_auto.graph_id,
    query=question,
)
print("Query Response:", query.answer)

This approach allows you to create a knowledge graph based on specific questions you're interested in exploring from the Amazon 10-K document. The auto-generated schema will focus on entities and relationships relevant to these questions, potentially providing a more targeted analysis of the document.

Option 3 - Create Knowledge Graph from Imported/Existing Triples

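# Retrieve the triples of an existing graph (here, the graph created in Option 1)
all_triples = list(client.graphs.get_all_triples(graph_id=graph.graph_id))

# Let's say we want to create a new graph focusing on Amazon's revenue streams
revenue_triples = [
    triple for triple in all_triples
    if "revenue" in triple.head.label.lower() or "revenue" in triple.tail.label.lower()
]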

# Create a new graph from these revenue-related triples
revenue_graph = client.graphs.create_graph_from_triples(
    workspace_id=workspace.workspace_id,
    triples=revenue_triples,
    name="Amazon Revenue Streams Graph",
    schema_id=schema.schema_id,
)

print(f"Created new graph: {revenue_graph.name}")

# Now we can query this new, more focused graph
question = "What are Amazon's main sources of revenue?"
query = client.graphs.query_unstructured(
    graph_id=revenue_graph.graph_id,
    query=question,
)
print("Query Response:", query.answer)

# We can also get all triples from this new graph
revenue_graph_triples = list(client.graphs.get_all_triples(graph_id=revenue_graph.graph_id))

print("Triples in the Revenue Streams Graph:")
for triple in revenue_graph_triples:
    print(f"Subject: {triple.head.name}, Relation: {triple.relation.name}, Object: {triple.tail.name}")
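
The label filter used in this option can be tried out without calling the API. The sketch below uses small dataclasses as hypothetical stand-ins for the SDK's node, relation, and triple objects; they carry only the attributes the filter reads and are not the SDK's own classes.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the SDK's Node, Relation, and Triple objects,
# carrying only the attributes the filter reads.
@dataclass(frozen=True)
class Node:
    name: str
    label: str


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Triple:
    head: Node
    relation: Relation
    tail: Node


def filter_by_label(triples, keyword):
    """Keep triples whose head or tail label contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        t for t in triples
        if needle in t.head.label.lower() or needle in t.tail.label.lower()
    ]


sample = [
    Triple(Node("Amazon", "Company"), Relation("Has business segment"),
           Node("AWS", "Business Segment")),
    Triple(Node("Cloud computing services", "Revenue Stream"),
           Relation("Contributes to"), Node("AWS", "Business Segment")),
]
revenue_only = filter_by_label(sample, "revenue")
print([t.head.name for t in revenue_only])  # ['Cloud computing services']
```

Note that the filter matches on labels, not names, so a node named "Advertising revenue" with the label "Metric" would not be selected.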

Querying the Knowledge Graph

With the graph created, we can now query it to find specific information about Amazon's business:

# Query graph for Amazon's business segments
question = "What are Amazon's main business segments?"
query = client.graphs.query_unstructured(
    graph_id=graph.graph_id,
    query=question,
)
print("Query Response:", query.answer)

# Query graph for Amazon's revenue streams
question = "What are Amazon's primary revenue streams?"
query = client.graphs.query_unstructured(
    graph_id=graph.graph_id,
    query=question,
)
print("Query Response:", query.answer)

# Include the chunks in the query
question = "What are the key risk factors for Amazon's business?"
query = client.graphs.query_unstructured(
    graph_id=graph.graph_id,
    query=question,
    include_chunks=True,
)
print("Query Response:", query.answer)

# Query the graph for specific relations
relations = ["comprises", "contributes_to"]

query = client.graphs.query_structured(
    graph_id=graph.graph_id,
    relations=relations,
)
print("Query Response:", query.triples)

After querying the graph, let's retrieve all triples:

print("Retrieving all triples from the graph:")
all_triples = list(client.graphs.get_all_triples(graph_id=graph.graph_id))

for triple in all_triples:
    print(f"Subject: {triple.head.name}, Relation: {triple.relation.name}, Object: {triple.tail.name}")
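
For a large graph, a per-relation count is often easier to scan than a full listing. The following sketch counts triples by relation name using plain `(head, relation, tail)` name tuples. The sample data is hypothetical; with the SDK you would presumably build the tuples from `triple.head.name`, `triple.relation.name`, and `triple.tail.name`.

```python
from collections import Counter


def count_relations(triples):
    """Count triples per relation name, given (head, relation, tail) name tuples."""
    return Counter(relation for _head, relation, _tail in triples)


# Hypothetical sample data standing in for triples retrieved from a graph.
sample = [
    ("Amazon", "Has business segment", "North America"),
    ("Amazon", "Has business segment", "AWS"),
    ("Cloud computing services", "Contributes to", "AWS"),
]
for relation, count in count_relations(sample).most_common():
    print(f"{relation}: {count}")
```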

Add Chunks to the Graph

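from tqdm import tqdm
from whyhow import Chunk  # assumed to be importable from the top-level package

document_path = "path/to/amazon_10k_2023.pdf"

# `text` is assumed to hold the plain text already extracted from the document
# at document_path; extracting text from the PDF is not shown here.

# Upload the text in 1024-character chunks and attach each chunk to the graph
for i in tqdm(range(0, len(text), 1024)):
    chunk = text[i:i + 1024]
    chunks = client.chunks.create(
        workspace_id=workspace.workspace_id,
        chunks=[Chunk(content=chunk)],
    )
    client.graphs.add_chunks(
        graph_id=graph.graph_id,
        ids=[chunks[0].chunk_id],
    )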
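
The loop above slices the text into fixed windows of 1024 characters. Pulling the slicing into a small function makes it easy to check in isolation. This is a sketch only; `split_into_chunks` is a hypothetical helper, and fixed-size slicing can cut a sentence in half, so a sentence-aware splitter may suit some documents better.

```python
def split_into_chunks(text, size=1024):
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


pieces = split_into_chunks("a" * 2500)
print([len(p) for p in pieces])  # [1024, 1024, 452]
```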

Add Triples to the Graph

from whyhow import Node, Relation, Triple  # assumed to be importable from the top-level package

# Define the triples to add, each as head -> relation -> tail
triples = [
    Triple(
        head=Node(name="Amazon", label="Company"),
        relation=Relation(name="Has business segment"),
        tail=Node(name="North America", label="Business Segment"),
    ),
    Triple(
        head=Node(name="Amazon", label="Company"),
        relation=Relation(name="Has business segment"),
        tail=Node(name="International", label="Business Segment"),
    ),
    Triple(
        head=Node(name="Amazon", label="Company"),
        relation=Relation(name="Has business segment"),
        tail=Node(name="AWS", label="Business Segment"),
    ),
    Triple(
        head=Node(name="E-commerce", label="Revenue Stream"),
        relation=Relation(name="Contributes to"),
        tail=Node(name="Amazon", label="Company"),
    ),
    Triple(
        head=Node(name="Cloud computing services", label="Revenue Stream"),
        relation=Relation(name="Contributes to"),
        tail=Node(name="AWS", label="Business Segment"),
    ),
]

# Add the triples to the existing graph
graph = client.graphs.add(
    graph_id=graph.graph_id,
    triples=triples
)

Exporting the Knowledge Graph

You can export the graph as a Cypher query to use in Neo4j or other graph databases:

cypher = client.graphs.export_cypher(graph_id=graph.graph_id)
print(cypher)
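
To give a feel for what a Cypher export can contain, the sketch below renders a single triple as a `MERGE` statement. It is purely illustrative: the helper names are hypothetical, and the text actually returned by `export_cypher` may be structured differently.

```python
def cypher_quote(value):
    """Quote a string as a Cypher single-quoted literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def cypher_ident(name):
    """Wrap a label or relationship type in backticks so spaces are allowed."""
    return "`" + name.replace("`", "``") + "`"


def triple_to_cypher(head, head_label, relation, tail, tail_label):
    """Render one triple as a MERGE statement (illustrative format only)."""
    return (
        f"MERGE (h:{cypher_ident(head_label)} {{name: {cypher_quote(head)}}}) "
        f"MERGE (t:{cypher_ident(tail_label)} {{name: {cypher_quote(tail)}}}) "
        f"MERGE (h)-[:{cypher_ident(relation)}]->(t);"
    )


statement = triple_to_cypher(
    "Amazon", "Company", "Has business segment", "AWS", "Business Segment"
)
print(statement)
```

Using `MERGE` instead of `CREATE` means that running the statement twice does not duplicate nodes or relationships.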

This tutorial demonstrates how to create a knowledge graph from Amazon's 10-K filing using the provided schema, and how to query the graph for insights into Amazon's business structure, revenue streams, and risk factors. You can expand on this by adding more specific queries or by analyzing different aspects of the 10-K document as needed for your analysis.