A Busy Start to the Year: Exploring AI, Agents, and Automation

As we kick off a new year, I wanted to take a moment to reflect on my first week. It’s been a whirlwind of activity, filled with experiments, learning, and a dive into emerging technologies.

New views emerging from the fog.

While I didn’t manage to finish anything completely, I gained valuable insights that will shape my projects in the weeks ahead.

Here’s a recap of what I’ve been up to.

Discovering Gemini Video

One of the standout moments this week was experimenting with Gemini Video (Google AI Studio). This tool uses AI to analyse video content with remarkable precision. From screencasts to real-time footage, its capabilities are nothing short of impressive.

I ran a small experiment: I filmed my bookshelf and had Gemini scan the video with text recognition (OCR). It not only identified the titles but also categorised them, which shows its potential for organising visual data. This left me eager to explore further applications, from UX analysis to creative storytelling.

Here, Gemini analyses a video of a toy train riding under a couch.

Deepening My Understanding of AI Agents

I also dived into the world of AI agents, starting with an intriguing article defining what makes a "true agent."

Agents
Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.”

Key factors include an agent's environment (digital or physical), its memory, and its ability to act autonomously. A clear theme is emerging: AI is becoming more agent-like, with a growing number of modalities that give it agency.

When AI operates in a digital environment, like the web, it can leverage multiple pathways. It might rely on vision-based inputs, such as analysing visual content or interfaces, or interact directly through APIs. This versatility is what makes modern AI agents so powerful and adaptable.
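To make these ingredients concrete, here is a minimal sketch of an agent loop with an environment, a memory, and autonomous action. Everything in it is an assumption for illustration: the `Environment` interface, the toy `CounterEnvironment`, and the rule inside `decide` are hypothetical stand-ins, and a real agent would call a model and observe a real environment (through vision or an API) instead.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Environment(Protocol):
    """Hypothetical interface: anything the agent can observe and act on."""

    def observe(self) -> str: ...
    def act(self, action: str) -> None: ...


@dataclass
class CounterEnvironment:
    """Toy digital environment: a counter the agent tries to raise to a target."""

    value: int = 0
    target: int = 3

    def observe(self) -> str:
        return "done" if self.value >= self.target else "below target"

    def act(self, action: str) -> None:
        if action == "increment":
            self.value += 1


@dataclass
class Agent:
    """Remembers (observation, action) pairs and acts until it decides to stop."""

    memory: list[tuple[str, str]] = field(default_factory=list)

    def decide(self, observation: str) -> str:
        # Stand-in rule; a real agent would ask a model what to do next.
        return "stop" if observation == "done" else "increment"

    def run(self, env: Environment, max_steps: int = 10) -> int:
        """Observe, decide, remember, act. Returns the number of actions taken."""
        for step in range(max_steps):
            observation = env.observe()
            action = self.decide(observation)
            self.memory.append((observation, action))
            if action == "stop":
                return step
            env.act(action)
        return max_steps


env = CounterEnvironment()
agent = Agent()
steps = agent.run(env)
print(steps, env.value, agent.memory[-1])
```

Swapping `CounterEnvironment` for a class that reads a screenshot or calls an API would change how the agent perceives its world without changing the loop itself, which is roughly what the different pathways amount to.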

Building effective agents
A post for developers with advice and workflows for building effective AI agents

Testing Anthropic's Browser Tools

Another highlight was experimenting with Anthropic's browser extension, a tool designed for natural language interaction with the web.

It’s still a bit slow but promising in concept—imagine having AI assist you in tasks like looking up information, managing emails, or automating processes.

Exploring APIs and Automation with Make.com

Finally, I turned my attention to automation using tools like Make.com. While trying to streamline tasks like retrieving OpenAI billing invoices, I encountered some challenges.

Yet, I was struck by how AI simplified the process, filling in gaps where my technical knowledge fell short. This combination of abstraction and precision is a great help for complex workflows.

Make.com - AI Assistant (Beta)

---

This week has been a mix of exploration and learning, setting the stage for exciting projects in the coming months. Stay tuned as I continue experimenting, and let’s keep the curiosity alive!

Do you have experiences with these tools or suggestions for what I should try next? Share your thoughts—I’d love to hear from you.

P.S. If you want to read further, here is a great update by Ethan Mollick on AGI and agents.