Cristian Valdivia Ramirez

Days 8-15 / 60

Second week completed. 25% of the journey done.

The difficult week

I'll be very honest: this week was not productive at all. End of year holidays, ~2,000 km of travel, and little opportunity to sit down and work. It frustrates me a bit because time doesn't stop and there are 45 days left. I need to compensate with more intensity in the coming weeks.

The agent is breathing

Despite the little time, I achieved something important: the first version of the agent is working and starting to behave as expected.

Last week I mentioned it felt "rough". This week there's a notable change, and it wasn't simply from having a better prompt but rather I changed the model.

The jump from Gemini 2 to Gemini 3

I started using Gemini 2 Flash because I wanted fast responses. The problem: it constantly failed with tools. Sometimes it wouldn't call any when it clearly should. Other times it hallucinated made-up data without even attempting to run a simulation.

Since I'm participating in the Gemini 3 Hackathon, I decided to try gemini-3-flash-preview—the fastest version of Google's most powerful model. The difference is brutal: reasoning is more consistent and tool calls work as they should. The agent stopped hallucinating and started calling tools properly.

DN running a power flow in PowerFactory

The acid test: compared faults

I asked the agent to run a single-phase fault at 50% of line 1 and then a three-phase fault to compare results. It worked perfectly.

Fault comparison: single-phase vs three-phase

The interesting thing is not just that it executed the simulations, but that it understood the sequence without me having to explain each step. It used 3 of the 10 available iterations in the agentic loop:

Understand the loaded project
Execute single-phase fault at 50% of the line
Execute three-phase fault at 50% of the line and compare

With Gemini 2 Flash, this same prompt had ended with the agent making up short-circuit currents without simulating anything.

About the 10 iteration limit

For now I configured a maximum of 10 tool calls per query. It's an arbitrary number that allows me to experiment without the agent entering infinite loops. For simple tasks like fault comparison, 3 iterations are enough. For a complete ECAP, I'll probably need more, but that's future me's problem.

The web app

I also have a web application working. For now it's for internal testing, but the idea is to make it available soon so others can interact with the agent.

What's next

Next steps:

Machine benchmark: My test systems are so small that simulations take less than 5 seconds. I need to test with real cases to understand the limits.
More tools: The agent only knows how to simulate. I still need to connect the Coordinator's PGP and Infotecnica so it can learn from real studies.

Week 2 Summary: First version of the agent working