A Busy Start to the Year: Exploring AI, Agents, and Automation

As we kick off a new year, I wanted to take a moment to reflect on my first week. It’s been a whirlwind of activity, filled with experiments, learning, and a dive into emerging technologies.

Rob Hoeijmakers

10 Jan 2025 — 3 min read

New views emerging from the fog.

While I didn’t manage to finish anything completely, I gained valuable insights that will shape my projects in the weeks ahead.

Here’s a recap of what I’ve been up to.

Discovering Gemini Video

One of the standout moments this week was experimenting with Gemini Video (Google AI Studio). This tool uses AI to analyse video content with remarkable precision. From screencasts to real-time footage, its capabilities are nothing short of impressive.

I ran a small experiment: I filmed my bookshelf and had Gemini read the text in the video (OCR). It not only identified the titles but also categorised them, demonstrating its potential for organising visual data. This left me eager to explore its applications further, from UX analysis to creative storytelling.

Here Gemini analyses a video of a toy train riding under a couch.

Deepening My Understanding of AI Agents

I also dived into the world of AI agents, starting with an intriguing article defining what makes a "true agent."

Key factors include its environment—be it digital or physical—its memory, and its ability to act autonomously. There is a clear theme emerging: AI is evolving to become more agent-like, with increasing modalities to provide it with agency.

If your “AI agent” only reacts from an external cron, it’s a regular LLM.

If your “AI agent” has no external memory, it’s a regular LLM.

Giving an API wrapper hooks, or web access doesn’t make it more of an agent than any other response engine.

AI agents will be *huge* but 90%…
— Adam Cochran (adamscochran.eth) (@adamscochran) December 31, 2024

When AI operates in a digital environment, like the web, it can leverage multiple pathways. It might rely on vision-based inputs, such as analysing visual content or interfaces, or interact directly through APIs. This versatility is what makes modern AI agents so powerful and adaptable.
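
To make those ingredients concrete, here is a minimal sketch of an agent loop with an environment, external memory, and autonomous action. Everything in it is a hypothetical illustration, not taken from the article or any real framework: the `CounterEnvironment` toy world, the `Memory` class, and the `decide` function, which stands in for a call to an actual model.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Memory:
    """External memory: it lives in a file, so it survives between runs."""
    path: Path
    notes: list[str] = field(default_factory=list)

    def load(self) -> None:
        # Restore notes written by an earlier run, if any exist.
        if self.path.exists():
            self.notes = json.loads(self.path.read_text())

    def remember(self, note: str) -> None:
        self.notes.append(note)
        self.path.write_text(json.dumps(self.notes))


class CounterEnvironment:
    """Hypothetical toy environment: a counter the agent can observe and change."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def observe(self) -> int:
        return self.value

    def apply(self, action: str) -> None:
        if action == "increment":
            self.value += 1


def decide(observation: int, goal: int) -> str:
    """Stand-in for the model call: choose the next action from what was observed."""
    return "increment" if observation < goal else "stop"


def run_agent(env: CounterEnvironment, memory: Memory, goal: int, max_steps: int = 20) -> int:
    """Observe, decide, remember, act; repeat until the agent chooses to stop."""
    memory.load()
    for step in range(max_steps):
        observation = env.observe()
        action = decide(observation, goal)
        memory.remember(f"step {step}: saw {observation}, chose {action}")
        if action == "stop":
            break
        env.apply(action)
    return env.observe()
```

Calling `run_agent(CounterEnvironment(), Memory(Path("agent_memory.json")), goal=3)` keeps acting on its own until the counter reaches 3, and the notes file is still there on the next run. In a real agent, `observe` might return a screenshot or an API response, and `decide` would be a model call.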

Testing Anthropic’s Browser Tools

Another highlight was experimenting with Anthropic’s browser extension, a tool designed for natural language interaction with the web.

It’s still a bit slow but promising in concept—imagine having AI assist you in tasks like looking up information, managing emails, or automating processes.

Exploring APIs and Automation with Make.com

Finally, I turned my attention to automation using tools like Make.com. While trying to streamline tasks like retrieving OpenAI billing invoices, I encountered some challenges.

Yet, I was struck by how AI simplified the process, filling in gaps where my technical knowledge fell short. This combination of abstraction and precision is a great help for complex workflows.

---

This week has been a mix of exploration and learning, setting the stage for exciting projects in the coming months. Stay tuned as I continue experimenting, and let’s keep the curiosity alive!

Do you have experiences with these tools or suggestions for what I should try next? Share your thoughts—I’d love to hear from you.

P.S. If you want to read further, here is a great update by Ethan Mollick on AGI and agents.