Tool recommendations are more useful when they come with context. We're a team of engineers building AI integrations and custom software for early-stage and growth-stage companies in LATAM and the US. Most of our work involves integrating AI into existing products rather than building AI-native products from scratch.
Claude is our primary model for production integrations. Its large context window handles the long-document processing that comes up regularly in legal, financial, and operations use cases, and its reliable instruction-following reduces prompt-engineering overhead in structured-output applications. We call the Anthropic API directly, without a wrapper, which gives us full control over retry logic, caching, and error handling.
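Owning the retry logic can be as small as a backoff wrapper around the API call. The sketch below assumes retryable failures (rate limits, overload, timeouts) surface as a hypothetical `TransientError`; mapping a real SDK's exception types onto it is left out. The call is passed in as a plain function, so nothing here depends on a particular client library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for retryable failures (rate limits, overload, timeouts)."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with exponential backoff and full jitter.
    Any other exception propagates immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller handle the failure
            # Backoff ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))  # full jitter: random delay in [0, ceiling]
            attempt += 1
```

Errors outside `TransientError`, such as a malformed request, are not retried, since repeating them would only repeat the failure. Injecting `sleep` keeps the wrapper testable without real waits, and the jitter keeps concurrent workers from retrying in lockstep.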
For applications where cost is the dominant constraint and the task is well-defined enough to run on a smaller model, we evaluate GPT-4o-mini and Claude Haiku against the specific use case before committing. Frontier models are not always the right answer.
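A comparison like this can be framed as "pick the cheapest candidate that clears an accuracy bar on the use case's own test cases". The `Candidate` type, the exact-match scoring, and the cost figures below are illustrative assumptions; real structured-output tasks usually need field-level or rubric-based comparison rather than exact string equality.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Candidate:
    name: str                    # placeholder label, not a real model ID
    run: Callable[[str], str]    # hypothetical: prompt in, model output out
    cost_per_call: float         # your own measured figure for this workload


def accuracy(candidate: Candidate, cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (input, expected) cases where the output matches exactly."""
    hits = sum(1 for prompt, expected in cases if candidate.run(prompt).strip() == expected)
    return hits / len(cases)


def pick_model(
    candidates: Sequence[Candidate],
    cases: Sequence[Tuple[str, str]],
    min_accuracy: float,
) -> Optional[Candidate]:
    """Cheapest candidate that meets min_accuracy on these cases, or None if none do."""
    passing = [c for c in candidates if accuracy(c, cases) >= min_accuracy]
    return min(passing, key=lambda c: c.cost_per_call, default=None)


# Toy usage: fake "models" backed by lookup tables.
cases = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
small = Candidate("small", lambda p: {"2+2": "4", "3+3": "6"}.get(p, "?"), cost_per_call=1.0)
large = Candidate("large", lambda p: {"2+2": "4", "3+3": "6", "5+5": "10"}[p], cost_per_call=10.0)
print(pick_model([small, large], cases, min_accuracy=0.9).name)  # large
print(pick_model([small, large], cases, min_accuracy=0.6).name)  # small
```

Raising `min_accuracy` pushes the choice toward the larger model; the point of running it per use case is that the threshold where the smaller model stops qualifying differs from task to task.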
We've moved away from LangChain for most new projects. The abstraction layer adds complexity that slows debugging, and it makes upgrades prone to dependency conflicts. For simple chains and RAG pipelines, we write the orchestration logic directly. The principle: use the simplest orchestration approach that works. Every abstraction layer adds a failure surface.
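Written directly, a simple RAG pipeline can be three plain functions. In this sketch the retriever is a naive word-overlap ranker standing in for a real vector search, and `llm` is any function that takes a prompt string and returns the model's text; both are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def retrieve(query: str, docs: Sequence[str], k: int = 2) -> List[str]:
    """Stand-in retriever: rank docs by word overlap with the query.
    A real pipeline would run a vector similarity search here instead."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: Sequence[str]) -> str:
    """Number each retrieved passage so the model can cite it."""
    sources = "\n".join(f"[{i}] {passage}" for i, passage in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


def answer(query: str, docs: Sequence[str], llm: Callable[[str], str], k: int = 2) -> str:
    """The whole pipeline: retrieve, build the prompt, call the model."""
    return llm(build_prompt(query, retrieve(query, docs, k)))


docs = [
    "invoices are due in 30 days",
    "the office closes at 6pm",
    "late invoices accrue interest",
]
# Echo "model" so you can inspect exactly what would be sent.
print(answer("when are invoices due", docs, llm=lambda prompt: prompt))
```

When something breaks, the stack trace points at these few functions rather than at framework internals, and swapping the retriever or the model is a one-argument change.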
Postgres with pgvector is our default for RAG implementations where the data volume fits within a single database instance. It reduces operational complexity significantly compared to running a dedicated vector database. For larger scale or multi-tenant architectures, we use Pinecone. For Spanish-language content, we test multilingual embedding models explicitly, because models trained primarily on English corpora produce lower-quality embeddings for Spanish text.
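One way to test embedding models explicitly is to measure recall@k on a labeled set of Spanish queries for each candidate. In this sketch, `embed` is a hypothetical wrapper around whichever embedding API is under test, and the ranking uses cosine similarity in pure Python so it runs anywhere; an equivalent exact query in Postgres would order by pgvector's `<=>` operator, which returns cosine distance (one minus this similarity).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(
    embed: Callable[[str], Vector],
    docs: Dict[str, str],                # doc_id -> text
    queries: List[Tuple[str, str]],      # (query text, id of its relevant doc)
    k: int = 3,
) -> float:
    """Fraction of queries whose relevant doc is in the top k by cosine similarity."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for query, relevant_id in queries:
        q_vec = embed(query)
        ranked = sorted(doc_vecs, key=lambda d: cosine(q_vec, doc_vecs[d]), reverse=True)
        if relevant_id in ranked[:k]:
            hits += 1
    return hits / len(queries)
```

Running the same Spanish query set through each candidate model, alongside an English set of similar difficulty, makes the quality gap between models measurable instead of anecdotal.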
Evaluation is the area where most teams underinvest. We use a combination of LLM-as-judge (Claude evaluating outputs against a rubric) and human review of random samples. Every AI integration we ship has an evaluation suite before it goes live. That's non-negotiable.
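A minimal version of that combination scores every output with a judge model and, independently, draws a random sample for human review. The rubric wording, the `SCORE: <n>` reply format, and the `judge` callable (prompt in, judge reply out) are all assumptions for this sketch; unparseable judge replies count as failures so they can't inflate the pass rate.

```python
import random
import re
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical rubric; a real one is written per use case.
RUBRIC = """Score the RESPONSE from 1 to 5:
5 = correct and in the required format
3 = partially correct or minor format problems
1 = incorrect or unusable
End your reply with a line of the form: SCORE: <n>"""


def judge_prompt(task_input: str, response: str) -> str:
    return f"{RUBRIC}\n\nINPUT:\n{task_input}\n\nRESPONSE:\n{response}"


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the 1-5 score, or None if the judge didn't follow the format."""
    match = re.search(r"SCORE:\s*([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


def evaluate(
    cases: Sequence[Tuple[str, str]],    # (input, model output) pairs
    judge: Callable[[str], str],         # hypothetical: prompt -> judge model reply
    pass_score: int = 4,
    human_sample_size: int = 5,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Return (judge pass rate, indices of cases sampled for human review)."""
    scores = [parse_score(judge(judge_prompt(inp, out))) for inp, out in cases]
    passed = sum(1 for s in scores if s is not None and s >= pass_score)
    # Seeded RNG so the human-review sample is reproducible across runs.
    rng = random.Random(seed)
    sample = rng.sample(range(len(cases)), min(human_sample_size, len(cases)))
    return passed / len(cases), sample
```

Sampling for human review independently of the judge's scores is deliberate: reviewers then see outputs the judge passed as well as ones it failed, which is how a miscalibrated judge gets caught.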
LangSmith for tracing LLM calls in development. In production, we log all inputs, outputs, and latency to our standard logging infrastructure. The goal is to be able to answer: what did the model receive, what did it return, how long did it take, and what happened downstream.
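One way to make those four questions answerable is to wrap every model call so it emits a single structured log record. The sketch uses only the standard library; the field names, the `call` signature, and passing `request_id` to downstream steps so their logs can be joined back to the call are assumptions, not any particular logging product's schema.

```python
import json
import logging
import time
import uuid
from typing import Callable, Optional, Tuple

logger = logging.getLogger("llm_calls")


def logged_call(
    call: Callable[[str], str],      # hypothetical: prompt -> model output
    prompt: str,
    model: str,
    request_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Run one model call and emit one JSON log record for it.
    Returns (output, request_id); downstream steps should log the same request_id."""
    request_id = request_id or str(uuid.uuid4())
    record = {"request_id": request_id, "model": model, "input": prompt}
    start = time.perf_counter()
    try:
        output = call(prompt)
        record["output"] = output
        return output, request_id
    except Exception as exc:
        record["error"] = repr(exc)  # failed calls are logged too, then re-raised
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        # ensure_ascii=False keeps Spanish text readable in the logs.
        logger.info(json.dumps(record, ensure_ascii=False))
```

The logger has to be configured at INFO level (for example with `logging.basicConfig(level=logging.INFO)`) for these records to appear; from there they can ship to whatever log pipeline is already in place.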
AWS for most deployments. Lambda for stateless AI endpoints with variable traffic patterns. ECS for more complex, stateful services. S3 for document storage and evaluation dataset management. For clients in Mexico with data residency requirements, we use AWS São Paulo or US East depending on the specific regulatory context.
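A stateless endpoint on Lambda can be a single handler that parses the request, calls the model, and returns; because nothing is kept between invocations, Lambda can add or remove instances as traffic varies. The sketch assumes an API Gateway or function URL proxy-style event whose `body` is a JSON string like `{"text": "..."}`, and `call_model` is a hypothetical placeholder for the real model call.

```python
import json
from typing import Any, Dict


def call_model(text: str) -> str:
    """Hypothetical placeholder for the real model call (for example via the Anthropic SDK)."""
    return f"summary of {len(text)} characters"


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Proxy-integration response shape: status code, headers, JSON string body."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point. Assumes a proxy-style event with a JSON string body;
    base64-encoded bodies (isBase64Encoded) are not handled in this sketch."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text:
        return _response(400, {"error": "'text' must be a non-empty string"})
    # Nothing is stored between invocations, so any instance can serve any request.
    return _response(200, {"result": call_model(text)})
```

In a real function, an SDK client would typically be created once at module level so that warm invocations reuse it instead of reconnecting on every request.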