A Busy Start to the Year: Exploring AI, Agents, and Automation
As we kick off a new year, I wanted to take a moment to reflect on my first week. It’s been a whirlwind of activity, filled with experiments, learning, and a dive into emerging technologies.
While I didn’t manage to finish anything completely, I gained valuable insights that will shape my projects in the weeks ahead.
Here’s a recap of what I’ve been up to.
Discovering Gemini Video
One of the standout moments this week was experimenting with Gemini Video (Google AI Studio). This tool uses AI to analyse video content with remarkable precision. From screencasts to real-time footage, its capabilities are nothing short of impressive.
I ran a small experiment: I filmed my bookshelf and asked Gemini to read the text in the video (OCR). It not only identified the titles but also categorised them, demonstrating its potential for organising visual data. This left me eager to explore its applications further, from UX analysis to creative storytelling.
Deepening My Understanding of AI Agents
I also dived into the world of AI agents, starting with an intriguing article defining what makes a "true agent."
Key factors include the agent’s environment (digital or physical), its memory, and its ability to act autonomously. A clear theme is emerging: AI is becoming more agent-like, with a growing number of modalities giving it agency.
When AI operates in a digital environment, like the web, it can leverage multiple pathways. It might rely on vision-based inputs, such as analysing visual content or interfaces, or interact directly through APIs. This versatility is what makes modern AI agents so powerful and adaptable.
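To make those three ingredients concrete for myself, I sketched a tiny observe-decide-act loop. This is only an illustration under my own assumptions, not how any particular agent framework works: the names `Environment`, `ApiEnvironment`, `Agent`, and `decide` are all hypothetical, and `decide` is a hard-coded stand-in for what would normally be a model call.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Environment(Protocol):
    """Anything the agent can observe and act on (hypothetical interface)."""

    def observe(self) -> str: ...
    def act(self, action: str) -> None: ...


@dataclass
class ApiEnvironment:
    """Toy digital environment: an inbox reached through direct calls."""

    inbox: list[str]
    archived: list[str] = field(default_factory=list)

    def observe(self) -> str:
        # Return the next unread message, or an empty string if none is left.
        return self.inbox[0] if self.inbox else ""

    def act(self, action: str) -> None:
        # The only supported action in this sketch is "archive".
        if action == "archive" and self.inbox:
            self.archived.append(self.inbox.pop(0))


@dataclass
class Agent:
    """Keeps a memory of observations and acts without step-by-step guidance."""

    memory: list[str] = field(default_factory=list)

    def decide(self, observation: str) -> str:
        # Stand-in for a model call: archive anything seen, stop when idle.
        return "archive" if observation else "stop"

    def run(self, env: Environment, max_steps: int = 10) -> None:
        for _ in range(max_steps):
            observation = env.observe()
            self.memory.append(observation)  # memory
            action = self.decide(observation)  # autonomy
            if action == "stop":
                break
            env.act(action)  # effect on the environment
```

A vision-based pathway would, in this sketch, simply be another class with the same `observe` and `act` methods, for example one that returns a description of a screenshot instead of a message. The loop itself would not need to change, which is roughly why the choice of pathway feels like a detail of the environment rather than of the agent.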
Testing Anthropic’s Browser Tools
Another highlight was experimenting with Anthropic’s browser extension, a tool designed for natural language interaction with the web.
It’s still a bit slow but promising in concept—imagine having AI assist you in tasks like looking up information, managing emails, or automating processes.
Exploring APIs and Automation with Make.com
Finally, I turned my attention to automation using tools like Make.com. While trying to streamline tasks like retrieving OpenAI billing invoices, I encountered some challenges.
Yet, I was struck by how AI simplified the process, filling in gaps where my technical knowledge fell short. This combination of abstraction and precision is a great help for complex workflows.
---
This week has been a mix of exploration and learning, setting the stage for exciting projects in the coming months. Stay tuned as I continue experimenting, and let’s keep the curiosity alive!
Do you have experiences with these tools or suggestions for what I should try next? Share your thoughts—I’d love to hear from you.
P.S. If you want to read further, here is a great update by Ethan Mollick on AGI and agents.