Data Model

Supershyft OS uses a combination of D1 (SQLite), R2 (object storage), and Vectorize (embeddings) for data persistence.

D1 Tables

Companies

Primary entity storing company information.

sql

CREATE TABLE companies (
  id TEXT PRIMARY KEY,                -- 'cmp_' + nanoid()
  name TEXT NOT NULL,
  country TEXT,
  revenue_eur REAL,                   -- in M€
  revenue_growth REAL,                -- percentage
  ebitda_margin REAL,                 -- percentage
  employee_count INTEGER,
  llm_score_overall INTEGER,          -- 0-10,000
  status TEXT DEFAULT 'active',
  source TEXT,                        -- Data provenance
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Indexes:

sql

CREATE INDEX idx_companies_name ON companies(name);
CREATE INDEX idx_companies_status ON companies(status);
CREATE INDEX idx_companies_source ON companies(source);

Theses

Investment thesis definitions.

sql

CREATE TABLE theses (
  id TEXT PRIMARY KEY,                -- 'th_' + nanoid()
  title TEXT NOT NULL,
  code TEXT UNIQUE,                   -- e.g., "TH-001"
  content TEXT,                       -- Markdown report
  status TEXT DEFAULT 'active',
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Dealflow

Pipeline tracking for companies.

sql

CREATE TABLE dealflow (
  id TEXT PRIMARY KEY,                -- 'df_' + nanoid()
  company_id TEXT NOT NULL,
  stage TEXT NOT NULL,                -- 'initial', 'qualified', etc.
  probability INTEGER,                -- 0-100
  estimated_value_eur REAL,
  next_action TEXT,
  next_action_date TEXT,
  owner_email TEXT,
  notes TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (company_id) REFERENCES companies(id)
);

Stages:

initial - Initial contact
qualified - Meets criteria
meeting - Meeting scheduled
diligence - Due diligence
offer - Offer stage
closed_won - Deal won
closed_lost - Deal lost

Activity Log

Audit trail for all actions.

sql

CREATE TABLE activity_log (
  id TEXT PRIMARY KEY,                -- 'act_' + nanoid()
  user_email TEXT NOT NULL,
  action TEXT NOT NULL,               -- 'company_created', 'stage_changed'
  entity_type TEXT,                   -- 'company', 'thesis'
  entity_id TEXT,
  metadata TEXT,                      -- JSON string
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Common Actions:

company_created
company_updated
dealflow_stage_changed
embeddings_generated
bulk_action_started

Bulk Actions

Background job tracking.

sql

CREATE TABLE bulk_actions (
  id TEXT PRIMARY KEY,                -- 'ba_' + nanoid()
  user_email TEXT NOT NULL,
  action_type TEXT NOT NULL,          -- 'generate_embeddings'
  status TEXT DEFAULT 'pending',      -- 'pending', 'running', 'completed'
  total_items INTEGER,
  processed_items INTEGER DEFAULT 0,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  completed_at TEXT
);

Users

User profiles and preferences.

sql

CREATE TABLE users (
  email TEXT PRIMARY KEY,
  name TEXT,
  avatar_url TEXT,
  role TEXT DEFAULT 'analyst',        -- 'analyst', 'partner', 'admin'
  preferences TEXT,                   -- JSON string
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

R2 Storage

Company Profiles

Path: companies/{company_id}/profile.json

json

{
  "id": "cmp_abc123",
  "name": "Acme Corp",
  "enrichment": {
    "description": "AI-generated company description",
    "products": ["Product A", "Product B"],
    "competitors": ["Comp 1", "Comp 2"],
    "technology_stack": ["React", "Node.js"]
  },
  "sources": {
    "website": "https://acme.com",
    "linkedin": "https://linkedin.com/company/acme"
  }
}

Source Data

Path: companies/{company_id}/source.csv

Original import data for audit trail.

Vectorize

Company Embeddings

Index: company-embeddingsDimensions: 768 (Workers AI @cf/baai/bge-base-en-v1.5)

typescript

interface VectorMetadata {
  company_id: string
  name: string
  description: string
  created_at: string
}

ID Generation

All IDs use nanoid with prefixes:

cmp_ - Companies
th_ - Theses
df_ - Dealflow
act_ - Activity log
ba_ - Bulk actions

typescript

import { nanoid } from 'nanoid'

const companyId = `cmp_${nanoid()}`

Relationships

mermaid

erDiagram
    COMPANIES ||--o{ DEALFLOW : "tracks"
    COMPANIES ||--o{ ACTIVITY_LOG : "logs"
    USERS ||--o{ ACTIVITY_LOG : "performs"
    USERS ||--o{ BULK_ACTIONS : "initiates"

Data Consistency

Foreign Keys - Enforced in D1
Transactions - Used for multi-table updates
Optimistic Locking - updated_at timestamp checks
Activity Logging - All mutations logged

Migrations

Located in apps/api-worker/migrations/:

bash

0001_initial.sql
0002_add_theses.sql
0003_add_dealflow.sql

Apply with:

bash

wrangler d1 migrations apply supershyft-os --local

Data Model ​

D1 Tables ​

Companies ​

Theses ​

Dealflow ​

Activity Log ​

Bulk Actions ​

Users ​

R2 Storage ​

Company Profiles ​

Source Data ​

Vectorize ​

Company Embeddings ​

ID Generation ​

Relationships ​

Data Consistency ​

Migrations ​