Ideas for Autonomous AI Agents

Hello everyone, I am looking for ideas related to autonomous AI agents. If I like your ideas, I will build them. Please define your problem statement clearly and precisely, and, if you can, describe how it might work.

1 Like

I would pay for something that checks all the various places I find AI news, puts it all into one place, and takes certain actions based on how relevant that news is to me.

For example, if some company adds some minor AI feature, that’s a 1/5 for interest, and it can just be ignored.

A really good new paper on arXiv is a 4/5, and I need a list with all of those on it emailed to me once every 24 hours.

But let’s say OpenAI drops GPT-5; then I need a text message, alarms triggered, etc.

So basically something that goes through ALL of the channels, sorts the items, and then takes action on them.

Zapier can do a lot of the ‘dumb’ tasks, but the really crucial thing is the AI that does the sorting: the ability to find what is really important and urgent and then alert me to it.

7 Likes

Wow, that’s an amazing idea. I’ll get on it. I will need information sources for where to find AI news, plus reputable vendors. Meanwhile, I will work on implementing the arXiv reranker.

Short term plans:

  1. Use LLM scanning to predict relevance based on interest (a rough sketch of this step is just below this list)
  2. Google Trends could be a good resource to scan
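
As a very rough sketch of what the LLM-scanning step could look like (the model name, prompt wording, and score thresholds are placeholders I would tune later):

```python
# Rough sketch of the "LLM scanning" step: score one news item 1-5 and
# route it to an action. Model name, prompt, and thresholds are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_item(title: str, summary: str) -> int:
    """Ask the model for a 1-5 relevance score for a single news item."""
    prompt = (
        "Rate how important this AI news item is on a scale of 1-5.\n"
        "1 = minor feature announcement, 4 = strong new paper, "
        "5 = major release such as a new frontier model.\n"
        f"Title: {title}\nSummary: {summary}\n"
        "Answer with a single digit."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    digits = [c for c in resp.choices[0].message.content if c.isdigit()]
    return int(digits[0]) if digits else 1

def route(item: dict) -> str:
    """Map the score onto the actions described in the original post."""
    score = score_item(item["title"], item["summary"])
    if score <= 2:
        return "ignore"
    if score <= 4:
        return "daily_email_digest"
    return "text_message_and_alarm"
```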

Long term plans:
There are a couple of problems and edge cases, however: what I might find interesting, other people may find boring, so I will need to think through how to classify and categorize people. This might call for a recommender engine.

2 Likes

I am the founder of State of the Heart Recovery, which is revolutionizing and disrupting the substance use disorder treatment, mental health treatment, incarceration, and homelessness industrial complex. We have many hundreds of clients who walk in the door every day. An autonomous AI agent could be super supportive of the treatment team’s daily efforts with clients.

I’m working on a similar thing right now. It’s an autonomous AI agent with mental health domain expertise, specialized in dealing with patients who have mental health issues. It’s still a prototype at this stage, but I can give you a demo if you wish.

1 Like

Absolutely. Let’s connect.
Paul@shrnm.org

Natural20, I’ve been thinking along the lines of what you’re referring to. I’m not really sure what to call it, but I think I’d call them “waystones” or “news aggregators”. Silly names, I know, but the idea is that people around the world would pool domain knowledge/news into something you can query against: meta tags, SEO keywords, transcripts, links, and so on.

It’s a very unrefined idea, so I don’t have specifics in mind, and it’s not one-size-fits-all, since much of the data might be behind paywalls or login credentials. One of the biggest use cases I can think of is following your favorite content creators’ YouTube channels, parsing all the words to auto-generate additional meta tags, and reading all the comments to find “gems” of ideas.

The plan would be to use LLMs with incredibly large context windows to take a snapshot and build these waystones of domains, and then have a series of smaller, cheaper autonomous AI agents that process them, think on them, compose ideas for how to take things further, or direct further research. Like I said, it’s a very unrefined idea I’ve tossed around.

To take this idea way too far…

I would like ad, spam and scam filters. Also a corporate and political bias indicator that takes into account the author’s entire online presence and social network connections. (EDIT: I’m going to walk that back a bit, it may be inevitable but feels a little creepy on reflection)

Not a big ask… :joy:

Clearly there’s a market for this type of high-volume reading-and-processing work.

1 Like

I realized after writing this that I’m all over the place in terms of whom I’m responding to, lol. I guess when you read the “alerts” part, sub in whatever you want for some arbitrary function that the rest of the autonomous agent would perform.

Let’s work the problem backwards, so that anyone who wants to contribute, even if they aren’t sure about every step along the way, can help out at any particular point in the process.
The alerts

  • Sending a text message: evaluate what your options are there. I wouldn’t suggest using plain old SMS; probably use WhatsApp or Signal, etc. Personally, I would only ever recommend Signal for almost everything that has the “feel” of texting, because, IMO, it has a superb API that you can use with a bot. There’s a bit of a learning curve, but the last time I tried it was in pre-GPT-4 days, so it is likely much easier now. You’d create various “rooms” in your Signal messenger with various alert sounds. (There’s a small signal-cli sketch at the end of this post.)
  • For email, you could have the SMTP server config all set up, then just pipe in the message body and send it to yourself. Easy on a Raspberry Pi, probably just as easy on a regular computer. (There’s an SMTP sketch at the end of this post too.)
  • For triggering alarms, probably use an Arduino or PIC microcontroller if you are thinking about physical buzzers, lights, etc. An ESP32 could be a good fit, since it has networking abilities. This would probably just be a redundant notification method, reserved for big news or for telling you that there’s a new email waiting for you.
Work the end bits first: make sure the delivery functions to your liking and that you have the means to receive the alerts. Then start from the beginning. Find any source that you have relied on. I suppose you don’t want to divulge all your sources, so let’s tackle one common, known one: arXiv. It’s a good idea to know whether or not a source has an API, which arXiv does, so assume that you can periodically check it for all the search terms you care about. That’s an exercise in learning the API inside and out so you can enumerate a list of all the results. The important thing is to know what you have already evaluated. (A rough arXiv-polling sketch is at the end of this post as well.)

So, let’s talk about the evaluating part. If you want to bring your own local LLMs to the task, it’s worth considering just how much context window you can manage. You’ll probably need to split the PDFs into multiple parts, extract the text, and pose various prompted questions against the fragments, unless you can swing an LLM with a 64k-128k token window. I’m not sure which local LLMs are out there with that large a context window, but I did come across Nous’s Yarn Mistral, a model with a 128k context window, which is probably big enough to handle most papers.

I’d also take Dave Shapiro’s concept of sparse priming representations, run that compression against the papers, and store the SPR for each paper, along with a summary, in a database. Any time a new entry is made in that database, direct more discerning LLMs at it to see if it’s a worthy candidate for further consideration. Exactly how all that goes down, I’m running out of steam tonight to think through, but assuming a good paper is found, you could use Python to handle sending the email, the Signal message, the Arduino commands, etc.
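
For the Signal piece, one way to wire it up is to shell out to signal-cli from Python. A minimal sketch, assuming signal-cli is installed and the sending number is already registered (phone numbers are placeholders):

```python
# Send a Signal alert by shelling out to signal-cli
# (https://github.com/AsamK/signal-cli). Assumes the sending number is
# already registered with signal-cli; both numbers below are placeholders.
import subprocess

def send_signal_alert(message: str,
                      sender: str = "+15550001111",
                      recipient: str = "+15552223333") -> None:
    subprocess.run(
        ["signal-cli", "-u", sender, "send", "-m", message, recipient],
        check=True,
    )

send_signal_alert("Big AI news: a new frontier model just dropped.")
```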
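
The email route needs nothing beyond the standard library. A sketch, assuming an SMTP-over-SSL provider (server, addresses, and the password are placeholders):

```python
# Email the digest to yourself over SMTP. Server, addresses, and the
# password are placeholders for whatever provider you actually use.
import smtplib
from email.message import EmailMessage

def send_digest(subject: str, body: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "agent@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
        server.login("agent@example.com", "app-password-here")
        server.send_message(msg)

send_digest("AI news digest", "1. Interesting new arXiv paper ...\n2. ...")
```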
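
And for the arXiv polling plus the “what have I already evaluated” bookkeeping, a rough sketch (the search query and database filename are placeholders; feedparser is a third-party package):

```python
# Poll the arXiv API (an Atom feed) for a search term and record unseen
# papers in SQLite so the agent knows what it has already evaluated.
# The query string and database filename are placeholders.
import sqlite3
import urllib.parse
import feedparser

db = sqlite3.connect("papers.db")
db.execute("CREATE TABLE IF NOT EXISTS papers (id TEXT PRIMARY KEY, title TEXT, summary TEXT)")

def fetch_new_papers(query: str = "all:agents AND cat:cs.AI", limit: int = 20):
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode({
        "search_query": query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": limit,
    })
    new_entries = []
    for entry in feedparser.parse(url).entries:
        cur = db.execute("INSERT OR IGNORE INTO papers VALUES (?, ?, ?)",
                         (entry.id, entry.title, entry.summary))
        if cur.rowcount:  # only keep papers we have not seen before
            new_entries.append(entry)
    db.commit()
    return new_entries

for paper in fetch_new_papers():
    print(paper.title)  # hand these off to the evaluating LLM / SPR step
```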

I started dabbling with this more last night. Using Python + Selenium + Chromium, I’m able to check a specific YouTube channel and find the n most recently uploaded videos, with their titles and their URLs. Once the URLs are known, it’s a pretty easy task to fetch the transcripts with non-browser code, and once the transcripts are acquired it’s pretty trivial to pass them through Gemini to run whatever summarization or decision making you want against them. Then, once the “hard work” has been done by Gemini, local LLMs can read that in and be tied to whatever notification or action scheme you choose.

I chose to do the first part without the YouTube API or BeautifulSoup4. Setting up a persistent Chromium user profile is key, I think: having cookies, passwords, and login credentials saved means your autonomous workflow goes smoother and can reach the sites/sources you want to get data from. I can expound on things if anyone wants help. A rough sketch of the transcript-plus-Gemini part is below.
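
For a flavor of the non-browser part, here is a rough sketch of fetching one transcript and summarizing it with Gemini. It assumes the classic youtube-transcript-api interface and the google-generativeai package; the video ID, model name, and API-key handling are placeholders:

```python
# Fetch a YouTube transcript and summarize it with Gemini. Video ID and
# model name are placeholders; youtube-transcript-api and
# google-generativeai are third-party packages.
import os
from youtube_transcript_api import YouTubeTranscriptApi
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

def summarize_video(video_id: str) -> str:
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    transcript = " ".join(seg["text"] for seg in segments)
    prompt = ("Summarize this video transcript in a few bullet points and "
              "note whether it contains major AI news:\n\n" + transcript)
    return model.generate_content(prompt).text

print(summarize_video("dQw4w9WgXcQ"))  # placeholder video ID
```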

Yeah, I’m in. Shoot me an email with your contact details; let’s plan it out first.
choudhurybhaswata@gmail.com

Sent you an email.

Almost done with this step. I have the workings down pretty well: monitoring numerous YouTubers’ channels, detecting when a new video has been uploaded, getting transcripts, generating a TLDW summary with Gemini, and sending Signal messages with the summary. I should be done tomorrow with something that can run round-the-clock. I still need to take care of the subjective assessment that assigns attention priority, make the GUI more informative, etc. My workflow is on Windows, so mileage may vary for Mac or Linux users. Once it’s to my liking, I can share code and instructions with anyone interested. My first order of business once it’s good to go is to repurpose the project script and retarget it at job boards like Indeed.com, Craigslist, etc. The skeleton of the round-the-clock loop is sketched below.
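
The round-the-clock part is just a polling loop that remembers which video IDs it has already handled. A skeleton, with stub functions standing in for the Selenium, Gemini, and signal-cli pieces described earlier (channel handles and the poll interval are placeholders):

```python
# Skeleton of the round-the-clock monitor. The three stub functions stand
# in for the Selenium scraping, transcript + Gemini, and signal-cli pieces
# described earlier; channel handles and the poll interval are placeholders.
import time

CHANNELS = ["@SomeAIChannel", "@AnotherChannel"]
seen: set[str] = set()

def get_latest_video_ids(channel: str) -> list[str]:
    """Stand-in for the Selenium/Chromium scraping step."""
    return []

def summarize_video(video_id: str) -> str:
    """Stand-in for the transcript + Gemini TLDW step."""
    return f"(summary of {video_id})"

def send_signal_alert(message: str) -> None:
    """Stand-in for the signal-cli notification step."""
    print(message)

def watch(poll_seconds: int = 900) -> None:
    while True:
        for channel in CHANNELS:
            for video_id in get_latest_video_ids(channel):
                if video_id in seen:
                    continue
                seen.add(video_id)
                send_signal_alert(f"New video on {channel}:\n{summarize_video(video_id)}")
        time.sleep(poll_seconds)

watch()
```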