Systems Online

Private AI
Infrastructure
Coming Soon

Enterprise-grade language model inference and dedicated cloud VMs. Sovereign compute. No rate limits. No data harvesting. Your workload runs on bare metal, not shared cloud. For custom compute before the site launches, email us at Investigations@obsidianwatch.org.

<80ms Avg. Latency
99.9% Uptime SLA
128K Context Window

Available Models

Open-weight models on dedicated GPU hardware. No queuing. No throttling. OpenAI-compatible API.

Fast
Mistral 7B
Ultra-low latency for high-throughput tasks. Ideal for real-time applications and classification.
Context: 32K tokens
Speed: ~120 tok/s
Price: $0.08 / 1M tok
Balanced
Llama 3.1 70B
Best-in-class open model. Strong reasoning, code generation, and complex instruction following.
Context: 128K tokens
Speed: ~45 tok/s
Price: $0.40 / 1M tok
Pro
DeepSeek Coder V2
Specialized for software engineering. Exceptional at multi-file context and debugging.
Context: 128K tokens
Speed: ~38 tok/s
Price: $0.35 / 1M tok
Balanced
Qwen 2.5 32B
Multilingual powerhouse with strong STEM reasoning and instruction-following capabilities.
Context: 128K tokens
Speed: ~55 tok/s
Price: $0.25 / 1M tok
Fast
Mistral NeMo 12B
Fast, capable mid-size model. Excellent for summarization, extraction, and multi-turn conversation.
Context: 128K tokens
Speed: ~90 tok/s
Price: $0.12 / 1M tok
Pro
Llama 3.1 405B
Frontier-class open model. Near-GPT-4 performance on complex reasoning, math, and coding tasks.
Context: 128K tokens
Speed: ~18 tok/s
Price: $1.20 / 1M tok
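The per-1M-token rates above make spend easy to estimate: monthly cost is simply (tokens / 1,000,000) × rate. A minimal sketch; the model IDs used as dictionary keys here are illustrative assumptions (only llama3.1-70b appears in the API example on this page):

```python
# Listed per-1M-token rates, keyed by assumed model IDs.
PRICE_PER_1M = {
    "mistral-7b": 0.08,
    "llama3.1-70b": 0.40,
    "deepseek-coder-v2": 0.35,
    "qwen2.5-32b": 0.25,
    "mistral-nemo-12b": 0.12,
    "llama3.1-405b": 1.20,
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Estimated USD cost for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * PRICE_PER_1M[model]

# e.g. 50M tokens/month on Llama 3.1 70B
print(f"${monthly_cost('llama3.1-70b', 50_000_000):.2f}")
```

At these rates, 50M tokens a month on Llama 3.1 70B comes to about $20.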

Dedicated Virtual Machines

Full VMs on bare-metal Xeon hosts. GPU-accelerated options available. No noisy neighbours.

Scout
$29/mo
4 vCPU
16 GB RAM
100 GB NVMe
1 TB / mo
View Details
Ranger
$59/mo
8 vCPU
32 GB RAM
250 GB NVMe
3 TB / mo
View Details
Patriot
$119/mo
16 vCPU
64 GB RAM
500 GB NVMe
5 TB / mo
View Details
Sentinel
GPU Ready
$249/mo
32 vCPU
128 GB RAM
1 TB NVMe
Unmetered
View Details
See All VM Packages & GPU Options

API Access Tiers

Transparent token-based pricing. No hidden fees. Cancel anytime.

Consumer
$19/mo
Full chat interface with generous monthly token allocation
  • Unlimited chat sessions
  • 5M tokens included
  • All available models
  • 128K context window
  • Zero data retention
Get Started
Enterprise
Custom
Dedicated capacity, SLAs, and white-glove onboarding for high-volume workloads.
  • Dedicated GPU allocation
  • Unlimited tokens
  • Custom model deployment
  • 99.9% uptime SLA
  • Private VPN endpoint
  • WireGuard tunnel access
  • 24/7 priority support
Contact Sales

Drop-in OpenAI Compatibility

Change one line. Keep your existing code. Our API is fully compatible with the OpenAI client spec — works with any SDK that supports a custom base URL.

  • REST + streaming (SSE)
  • Chat completions endpoint
  • Function calling support
  • Bearer token authentication
  • JSON mode
  • Python & JavaScript SDKs
  • Per-key rate limiting
  • Usage webhooks
Python
from openai import OpenAI

# Change only the base_url; keep everything else
client = OpenAI(
    base_url="https://api.patriotsci.com/v1",
    api_key="ps-your-api-key-here",
)

response = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Analyze this dataset..."}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
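Clients without an SDK can hit the same endpoint over plain REST. A minimal sketch of the wire format, assuming the base URL and model ID from the example above; the payload shape (including the JSON-mode response_format field listed among the features) follows the standard OpenAI chat-completions spec:

```python
import json
import urllib.request

# Standard OpenAI-style chat completion request body.
payload = {
    "model": "llama3.1-70b",
    "messages": [{"role": "user", "content": "Analyze this dataset..."}],
    "response_format": {"type": "json_object"},  # JSON mode
}

req = urllib.request.Request(
    "https://api.patriotsci.com/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer ps-your-api-key-here",  # bearer token auth
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending the request requires a live API key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request works from curl or any HTTP client that can set a bearer token header.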

Bare Metal. No Cloud.

Your requests never touch AWS, Azure, or GCP. Dedicated hardware. Sovereign compute. Columbia, TN.

[GPU] A4500 w/ NVLink NVIDIA Pro GPUs
[RAM] 1800GB+ System Memory
[CPU] 652c Xeon Cores
[NET] 10GbE Network Fabric
[STG] iSCSI SAN Storage
[SEC] WireGuard VPN Access
[HYP] Type 1 Hypervisor
[FW] Palo Alto Networks Firewall