The future of AI Assistants

Google Assistant, Amazon’s Alexa, and Apple’s Siri stand to gain significant advantages by incorporating Large Language Models (LLMs). However, as of now, none of these voice assistants have officially implemented LLM technology. In their current state, these agents are limited to predefined, manually programmed tasks. The integration of LLMs holds the promise of expanding their capabilities significantly.

  1. When will we see LLMs integrated and made available within those assistants?
  2. What new capabilities will be implemented first?
  3. Will any of them cause a fiasco similar to the one from Microsoft with Tay in 2016?
  4. Does allowing those assistants to access the digital world present a threat to humanity?
  5. If so, how can it be mitigated?

Here is the answer from Microsoft’s Copilot (which also helped me rewrite the first paragraph of the post):

Now, let’s address your specific questions:

  1. When will we see LLMs integrated and made available within those assistants?
  • Google has internally acknowledged the potential of generative AI, including LLMs, to enhance Google Assistant. Some of their team members are already working on this, starting with mobile devices like Android.
  • Apple is also making strides in deploying LLMs on iPhones. They’ve developed an innovative technique using flash memory to store AI model data, which could pave the way for a more advanced Siri.
  • While specific release dates are not confirmed, both companies are actively exploring LLM integration.
  2. What new capabilities will be implemented first?
  • Google Assistant is experimenting with features like summarizing web pages using generative AI.
  • Apple’s revamped Siri could offer real-time language translation, sophisticated photography features, and augmented reality enhancements.
  3. Will any of them cause a fiasco similar to the one from Microsoft with Tay in 2016?
  • While there’s no certainty, both companies are likely taking precautions to avoid such incidents. Ensuring responsible AI behavior is a priority.
  4. Does allowing those assistants to access the digital world present a threat to humanity?
  • Any powerful technology carries risks. Ensuring robust safeguards, ethical guidelines, and transparency is crucial to prevent unintended consequences.
  5. If so, how can it be mitigated?
  • Rigorous testing, ongoing monitoring, and user feedback loops can help identify and address potential issues.
  • Collaboration across industry, academia, and regulatory bodies is essential to establish best practices.

Remember, the journey toward integrating LLMs into voice assistants is exciting, but responsible development is key to harnessing their full potential while minimizing risks.

  1. Realistically speaking, as soon as there is a way to make sure the models won’t spit out very sensitive stuff too easily.

  2. Probably the stuff people see the most value in: the basic functions an automated PA needs.

  3. Arguably, kind of what’s happening with the alignment issues at Google with Gemini right now?

  4. Definitely. Once we implement that, we’ll have no idea if we’re talking to a human online or not, and that’s quite dangerous IMO.

  5. The reverse Turing test. Really interesting concept; if we can somehow manage to implement that, we’ll be golden.

  1. I don’t think this is the main concern.
    I suspect it’s more likely that they are trying to decide how limited they want the service to be. The current generation (pre-LLM) is limited to a small set of programmed actions, which makes testing a relatively simple matter.
    Once you allow a service to access the Internet, it can essentially do almost anything. Some of that access is not read-only, so you should also consider the service accidentally modifying data (a sketch of a read-only/write split appears at the end of this post).
    This is a potential step function from a limited surface area to an unlimited surface area, which makes it a very difficult decision.
    Risks like revealing private data are real, but I think it’s not the primary concern.

  2. I guess a good first step is to implement the same capabilities using an LLM. This would give the assistant a better understanding of what the user wants and could augment the existing capabilities. For example, I often set 5 timers in Alexa at 30-minute intervals (I need this while I prepare sourdough for baking).
    Right now I need to go:
    “Alexa, set a timer for 30 minutes.
    Alexa, set a timer for 60 minutes.
    Alexa, set a timer for 90 minutes.
    Alexa, set a timer for two hours.
    Alexa, set a timer for two hours and thirty minutes.”

That’s a ton of talking for something that can be expressed as:
“Alexa, set 5 timers at 30-minute intervals”.
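To make the idea concrete, here is a minimal sketch of how an LLM-backed assistant could expand that single request into the individual timers above. The `set_timer` function is a hypothetical stand-in for the assistant’s real timer action, not an actual Alexa API.

```python
# Minimal sketch: expanding "set N timers at K-minute intervals" into
# individual timer actions. set_timer is a hypothetical stand-in, not a
# real Alexa call.

def set_timer(minutes: int) -> None:
    """Stand-in for the assistant's actual timer action."""
    print(f"Timer set for {minutes} minutes")

def set_interval_timers(count: int, interval_minutes: int) -> None:
    """Expand one request into `count` timers spaced `interval_minutes` apart."""
    for i in range(1, count + 1):
        set_timer(i * interval_minutes)

# "Alexa, set 5 timers at 30-minute intervals"
set_interval_timers(count=5, interval_minutes=30)
# -> timers at 30, 60, 90, 120, and 150 minutes
```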

  3. Yes, Tay was similar to what we are seeing now. We have seen multiple versions of getting it wrong in one direction or another:
    Tay being racist, Google’s object classification recognizing Black people as gorillas, face recognition making more mistakes with people of certain ethnicities, and recently Google being extra careful and overly woke, diversifying races in generated photos against the historical record.
    I guess what I am more concerned about is the ways those AI bots can be intentionally abused by people to cause harm.

  4. That’s one angle: it will be possible to automate fraud.
    In general, the Internet is not just a read-only resource. Imagine a bot building an online presence on social media and manipulating people. This can, of course, be initiated by a human.
    Another angle is web security. Imagine a bot whose objective is to hack into high-sensitivity websites. It can use various hacking tools, most of which are driven by command-line or shell access (text based).

  5. Interestingly, I think the best mitigation is going to be a friendly AI in your corner. This will open up the stage for AIs manipulating other AIs.
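To make the surface-area point from the first answer, and this mitigation angle, a bit more concrete, here is a minimal sketch of one possible approach: separating read-only tools from write-capable ones and refusing write actions until the user confirms. The tool names, the `Tool` dataclass, and the confirmation flag are illustrative assumptions, not any vendor’s actual API.

```python
# Sketch: keep an assistant's action surface mostly read-only by marking
# which tools can modify the outside world and gating those behind explicit
# user confirmation. All names here are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    read_only: bool  # write-capable tools need explicit user confirmation

def fetch_page(url: str) -> str:
    return f"(contents of {url})"   # read-only: no side effects

def send_email(message: str) -> str:
    return f"sent: {message}"       # write: modifies the outside world

TOOLS = [
    Tool("fetch_page", fetch_page, read_only=True),
    Tool("send_email", send_email, read_only=False),
]

def call_tool(name: str, arg: str, user_confirmed: bool = False) -> str:
    tool = next(t for t in TOOLS if t.name == name)
    if not tool.read_only and not user_confirmed:
        return f"refused: '{name}' can modify data; ask the user first"
    return tool.run(arg)

print(call_tool("fetch_page", "https://example.com"))  # allowed
print(call_tool("send_email", "hello"))                # blocked until confirmed
```

In practice, the confirmation step would simply be the assistant asking the user out loud before taking any action that writes to the outside world.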