Based in San Francisco, Braintrust is an AI evaluation and observability platform founded by Ankur Goyal that helps companies monitor, test, and improve their AI products. On the platform, teams can inspect traces, compare prompts, evaluate model performance, analyze regressions, and measure quality across production AI systems.
Goyal ran into these challenges firsthand while managing AI systems at Figma, where he worked after it acquired his previous startup, Impira. He founded Braintrust in 2023 as a platform for testing, running, and analyzing AI applications. Since its launch, it has been adopted by leading technology companies such as Stripe, Cloudflare, Vercel, Notion, and Zapier. Having raised more than $121 million in total funding, the company continues to grow as an AI evaluation and observability platform.
Can you briefly introduce yourself and tell us about your business?
My name is Ankur Goyal, and I am the founder and CEO of Braintrust. Throughout my career, I have focused on building technology systems that operate efficiently at large scale. In 2011, I left Carnegie Mellon University to join MemSQL (now SingleStore) as one of its early employees. Over the next eight years, I worked on its distributed database technology and rose to Vice President of Engineering.
I also founded Impira, an AI-powered platform for document extraction and data management, which Figma acquired in 2022. At Figma, I led several AI and machine learning platform initiatives. While working there, I realized that many AI teams find it challenging to measure and improve the performance of their AI systems.
This was the turning point where I felt that I needed to fill this gap. My experience in this field helped me launch Braintrust in 2023. Today, it is the leading platform that helps companies test, monitor, evaluate, and improve their AI applications. The platform allows teams to compare prompts, analyze outputs, track performance, monitor costs, and make data-driven decisions to build more efficient AI products.
What inspired you to start Braintrust?
When I was working at Figma, every time we changed a prompt, adjusted a model, or added a new workflow, we faced the same question: did the application actually improve? We had no tool to check. We relied on intuition, gut feeling, or, as I say, “vibe checks” to decide whether to launch.
We had to ask others for validation, but there was no systematic way to prove that the application had improved. This was extremely frustrating for me, and it became my inspiration to build something that bridged the gap.
So I built Braintrust to serve as standardized infrastructure for evaluating quality, measuring performance, and detecting regressions. With this platform, AI teams can confidently answer whether a change improved their product.
What challenges came with building Braintrust?
One of the biggest challenges I faced while building Braintrust was helping companies understand that AI evaluation is an essential part of building reliable AI products. Teams cannot rely on manual checks alone. Many organizations wanted to launch AI features quickly, but only a few were thinking about how to measure their performance and quality.
We also wanted companies to think beyond basic monitoring. Our goal was to help them understand why AI systems behave in certain ways, where issues arise, and how performance changes over time.
As AI adoption grew, companies started facing these issues themselves, which made it easier for us to bring our product to market. Gradually, the importance of evaluation and observability became clear. Today, many leading technology companies use Braintrust to evaluate their AI products.
How do customers benefit from Braintrust?
Customers get a complete overview of every aspect of their AI systems through Braintrust. Instead of remaining in doubt, they can verify with data whether their AI application is actually improving.
The platform makes it easy to inspect traces, monitor latency and cost, compare prompts side by side, run experiments against real datasets, and automatically detect regressions during deployment. This reduces the risk of shipping unreliable AI experiences to users and lets engineers identify issues before customers encounter them. Our customers include Stripe, Cloudflare, Vercel, Notion, and Zapier.
Did social media or online communities play a role in your growth? How?
Yes. Social media has always been a beneficial tool for us. The AI community is very active across platforms like X, GitHub, developer forums, and open-source spaces. Most of our early customers found the platform through discussions about AI evaluation, observability, prompt engineering, and infrastructure for AI systems.
I post a lot of my thoughts on key topics like context engineering and AI evaluation frameworks. These conversations helped position Braintrust as a trusted voice in this space because they reflected real challenges that developers usually face.
Open-source communities have also played a key role in this journey. In my career, I have worked on projects like DuckDB and Hugging Face Transformers, which has helped us engage with engineers building reliable and scalable systems. A lot of our growth has come through word of mouth within technical teams. Engineers who use Braintrust to solve real problems often end up sharing it with other members of the community.
Have you faced difficulties from investors, partners, or customers? How did you handle it?
Yes, all category-leading companies face skepticism early on.
When we launched Braintrust, most people were thinking about models, not evaluation infrastructure. There was some debate about whether AI observability and evaluation were a separate category or just a feature of existing tools. Rather than debating the market, we focused on solving actual customer problems. We worked with technical teams that were putting AI into production and needed better ways to measure quality.
One of our biggest validation points was when Zapier became our first paying enterprise customer. Zapier’s whole business is built on reliable automation, so their early adoption of Braintrust really confirmed we were solving a huge problem.
Customer adoption, in turn, gave investors confidence in us. We raised $5.1 million in early funding, and then a $36 million Series A led by Andreessen Horowitz in 2024. In 2026, we closed an $80 million Series B round led by ICONIQ Capital, bringing total funding raised to more than $121 million and valuing the company at approximately $800 million.
How has being a millennial founder shaped the way you build your company?
I’m a millennial founder, and I’ve watched several major technology shifts from a front-row seat: cloud computing, distributed systems, mobile apps, machine learning, and now generative AI.
That experience has influenced the way I think about infrastructure. Every big technology wave ultimately needs foundational systems to make it reliable, scalable, and easy to use. Databases made the Internet possible. Cloud platforms enabled software to scale. Evaluation infrastructure will help make AI trustworthy.
This is the mindset that guided us in building Braintrust. We solve fundamental technical challenges rather than chasing short-term trends.
What achievements stand out the most?
One of our biggest achievements is how quickly Braintrust gained recognition among some of the most significant engineering organizations in the world and became a trusted platform for evaluating their AI systems.
Companies including Stripe, Cloudflare, Vercel, Notion, and Zapier rely on our platform to evaluate and monitor AI systems. These organizations have extremely high standards, and earning their trust has been incredibly meaningful.
Another major milestone has been our fundraising journey. From 2023 to 2026, we raised more than $121 million. Our investors include Andreessen Horowitz, ICONIQ Capital, OpenAI co-founder Greg Brockman, Mistral CEO Arthur Mensch, Vercel founder Guillermo Rauch, Databricks Ventures, Greylock, and several other leaders in AI and infrastructure.
We are thrilled to see companies moving towards structured, data-driven ways of building and improving AI systems.
What excites you about the future of AI?
The thing that excites me the most is that we are still in the early days of building production-grade AI systems. Many organizations are experimenting with AI today, but the next wave will be all about reliability, measurement, and operational excellence. And with AI embedded within important business workflows, companies will require infrastructure that helps them understand, monitor, and improve performance on a continuous basis.
I’m also excited about advances in context engineering. Many people focus only on models, but in practice, the success of those models is often determined by the quality of the information given to them. Better context management will help unlock new categories of AI applications.
At the end of the day, I think AI development will be a lot more systematic. Organizations will leave gut feel behind and create evaluation-based workflows where every improvement can be measured and validated accurately.
What advice would you give to future founders?
My advice is simple: pay close attention to problems that refuse to go away.
Braintrust arose from a challenge I faced myself when building AI products. We kept running into the same evaluation problem, regardless of the company, team, or use case. There is often an opportunity worth exploring when a problem keeps bothering smart people who are building important products.
I also encourage founders to build for sophisticated users. Some of our first customers were very demanding engineering teams. They pushed us to improve faster and helped us find weaknesses early.
Finally, and most importantly, focus on product-market fit. When you find it, the signal is incredibly clear. Customers return, growth happens organically, and people recommend the product to others. It’s hard to get that momentum going, but once you do, everything changes.
