Winding up along the Animas River. I’m in the last car of a 15-car train, so around the curves it’s a great view ahead. A beautiful route. 🚂
More about AI uncertainty
I've blogged before about AI hallucinations, but I wanted to tie together a few new posts I've read recently. Let's start with Dave Winer:
To people who say you get wrong answers from ChatGPT, if I wanted my car to kill me I could drive into oncoming traffic. If I wanted my calculator to give me incorrect results I could press the wrong keys. In other words, ChatGPT is a very new tool. It can be hard to control, you have to check what it says, and try different questions. But the result, if you pay attention and don't drive it under the wheels of a bus, is that you can do things you never could do before.
This is essentially my mindset too. AI makes mistakes. Humans make mistakes. The key is to know what AI is good for and to not let it run wild unattended. This is why with Micro.blog we've been so focused on very limited use cases:
- Podcast transcripts, for which AI is shockingly good. Gotta be close to 99% perfect, and it's easy to edit transcripts to fix mistakes.
- Summarizing bookmarked web pages, also really accurate. I've yet to see any mistakes.
- Photo keywords and accessibility text. Super useful and if it occasionally gets something slightly wrong, it's usually inconsequential and still a huge step forward.
On a recent Sharp Tech podcast, Ben Thompson also makes the point that we have different expectations for computers and humans. We expect computers to always be right. Calculators and spreadsheets don't lie. But generative AI is something new, and we can't hold it to the same standards we had before.
That's not necessarily to say you're holding it wrong if you ask Google how many rocks to eat. It's up to AI companies to better convey when assistants aren't sure about an answer. I don't know if this is technically possible with how today's models work, but hopefully folks are looking at it.
Finally, Allen Pike had a post this week that was fascinating, about how AI will evolve now that it has chewed up all the data on the internet. I have mixed feelings about this... There's a lot of uncertainty, and also I don't love that we might be improving AI models while neglecting making the web better. But it is still too early to really judge how this is going to play out.
Love these book updates from Brandon Sanderson, on Reddit:
Finished the last interlude today at 5:21. That is a wrap: Wind and Truth, Book five of the Stormlight Archive, is finished. Tomorrow morning, I'll hand it off to the proofreading and copyediting team. 491k words.
I couldn't be more excited about this book. Still thinking about re-reading books 1-4 later this year. There is so much in this series, it's hard to take it all in with one reading.
When driving directions in maps lets you know there's a much slower route available, every once in a while you should take it.
Glen Canyon.
Micro.blog-hosted blogs have had good uptime the last couple months, but of course two servers crashed while I'm settling in at my campsite in Bryce Canyon National Park. Is this some kind of joke? 🤪 Luckily quite good LTE here tonight.
How to rethink Siri
Siri is much too limited and inconsistent. The only time I ever use Siri is when driving, for responding to text messages and dictating notes. Many people will have different use cases, and so when people say "Siri sucks" they probably all mean different things.
There are many things that could be improved in Siri, but to me it all comes down to just two fundamental shifts:
Universal Siri that works the same across all devices.
The illusion of Siri as a personal assistant is broken when basic tasks that work from your phone don't work from your watch or HomePod. I've long thought, and discussed on Core Intuition, that Apple has tied Siri too closely to devices and installed apps.
That's not to say that controlling installed apps isn't useful, in the way that Shortcuts and scripting are useful. I expect Apple to have more of that at WWDC next week. But in addition to extending Siri with installed apps, to make it truly universal there should be a way to extend Siri in the cloud, just as Alexa has offered for years.
Standalone devices like the Humane AI Pin and Rabbit R1 have been criticized as "that should be an app". While it's true that the iPhone will continue to dominate any potential non-phone competition, I think there is a narrow window where a truly new device could be disruptive to the smartphone if Apple doesn't make Siri more universal and seamless across devices. This universality might sound subtle but I think it's key.
Large language models.
This is obvious. Talking to ChatGPT is so much more advanced and useful than current Siri. With ChatGPT, you can ask all sorts of questions that Siri has no clue about. Sure, LLMs are wrong sometimes, and I'd love for Siri to be uncertain about some answers. If there was a way to have some kind of weighting in the models so that Siri could answer "I'm not sure, but I think…" that would go a long way to dealing with hallucinations. Generative AI is less like a traditional computer and more like a human who has read all the world's information but doesn't really know what to do with any of it. That's okay! But we wouldn't blindly trust everything that human said.
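To make the "I'm not sure, but I think…" idea concrete, here is a minimal sketch. It assumes the model can report a log-probability for each token of its answer, which some APIs expose. The function name `hedged_answer` and the 0.8 threshold are made up for illustration, and token probability is only a rough proxy for whether an answer is factually right, so treat this as a thought experiment rather than how Siri or any real assistant works.

```python
import math


def hedged_answer(answer: str, token_logprobs: list[float], threshold: float = 0.8) -> str:
    """Prefix the answer with a hedge when the average token confidence is low.

    token_logprobs is a hypothetical input: one natural-log probability per
    generated token. The threshold is an arbitrary illustrative value.
    """
    hedge = "I'm not sure, but I think "
    if not token_logprobs:
        # No confidence information at all, so hedge by default.
        return hedge + answer
    # exp(mean of log-probabilities) is the geometric mean of token probabilities.
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    if confidence >= threshold:
        return answer
    return hedge + answer


# High-confidence tokens: the answer comes back unchanged.
print(hedged_answer("Las Vegas is in the Pacific Time Zone.", [-0.01, -0.02, -0.01]))
# Low-confidence tokens: the answer gets the hedge prefix.
print(hedged_answer("Las Vegas is in the Mountain Time Zone.", [-1.2, -0.9, -1.5]))
```

The hard part is not the threshold check but getting a confidence number that tracks truth. A model can be very "sure" of a wrong answer, which is why this is only a sketch.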
There are many other improvements that would come along with using even medium-sized LLMs on device for Siri, such as dictation. OpenAI's Whisper model is almost 2 years old now and way better than Siri.
Apple is going to talk a lot about privacy and on-device models at WWDC. A dual strategy for LLMs is the way to go, with models on your phone that can do a bunch of tasks, but some kind of smarts to switch gears to using LLMs in the cloud when necessary. I've done a bunch of experiments with open-source LLMs on my own servers, and it requires a lot of RAM and GPU to get reasonable performance. If we use "parameters" as a rough metric for how much horsepower LLMs need, note that the larger version of Meta's Llama 3 (which is pretty good!) is a 70 billion parameter model. GPT-4 is rumored to be nearly 2 trillion parameters. If Apple can't get GPT-4 level quality and performance on device, they should not hesitate to use the cloud too.
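Some back-of-the-envelope arithmetic shows why parameter count matters so much for on-device models. This sketch only counts the memory needed to hold the weights, ignoring everything else a running model needs. The function names, the 3 billion parameter "small model", and the 4 GB device budget are all hypothetical numbers picked for illustration.

```python
def weight_memory_gb(parameters: int, bits_per_weight: int = 16) -> float:
    """Approximate memory to hold the model weights alone, in GB (10^9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


def choose_backend(parameters: int, bits_per_weight: int, device_budget_gb: float) -> str:
    """Pick "on-device" when the weights fit in the budget, otherwise "cloud".

    device_budget_gb is a hypothetical amount of memory the phone can spare.
    """
    needed = weight_memory_gb(parameters, bits_per_weight)
    return "on-device" if needed <= device_budget_gb else "cloud"


# A 70 billion parameter model at 16 bits per weight.
print(weight_memory_gb(70_000_000_000, 16))  # 140.0
# The same model squeezed down to 4 bits per weight.
print(weight_memory_gb(70_000_000_000, 4))   # 35.0

# Hypothetical 3 billion parameter model against a hypothetical 4 GB budget.
print(choose_backend(3_000_000_000, 4, 4.0))   # on-device
print(choose_backend(70_000_000_000, 4, 4.0))  # cloud
```

A real router would also look at the task, latency, and privacy, not just whether the weights fit, but the memory math alone explains why the biggest models stay in the cloud.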
Looking forward to WWDC next week! Should be a good one.
Bryce Canyon National Park. Didn’t catch this view at the right time for the best lighting, but it really is extraordinary. Hiked down and up and now I’m so exhausted I’m just going to chill for the afternoon.
Jason Snell, writing about how Apple may frame the announcements next week at WWDC:
Apple has the chance to depict itself as the adult in the room, a company committed to using AI for features that make its customers’ lives better–not competing to do the best unreproducible magic trick on stage.
It's probably a safe bet that Apple will do all the obvious things with AI: on-device models for developers to use, integration with iWork apps, something with Photos. But it's anyone's guess how far they will actually go, especially with Siri, or potentially even brand new apps.
Honda Element: window rain guards
This is a pretty simple one, another addition to my blog post series about upgrading my Honda Element. These rain guards attach above each window, so you can crack the window while it’s raining and not get water in the car. They just stick on, but they seem surprisingly sturdy.
Catching up on a bunch of random online things as I wait for the day to cool down enough to feel like cooking dinner. Just spent way too much time drafting replies to posts and then not sending them. 🤪 Need to step away from the computer and read a book.
Redis has ballooned in size again, after sizable reductions in memory usage last year to make things more stable. Going to have to tackle this tomorrow. Hopefully the primary server will be fine through the night. We do have better redundancy now, but it's not being utilized like it could be.
Great post on TechCrunch by Sarah Perez about Bridgy Fed's support for Bluesky. The social web can be multiple protocols. We need more platforms that embrace the web wherever users are, shifting away from monoculture.
Hiking in the sun yesterday was a bit of a wake-up call, especially with the looming heat wave. I had planned to camp at Death Valley tonight, but gonna skip it and detour to Vegas to catch up on work. It's also NBA finals game 1, which might be a dangerous time to be in the gambling capital of the world.
I don't ask Siri hard questions. Today I threw her a softball while driving down I-15, "What time zone is Las Vegas in?" She has no clue. These are the kind of simple problems an LLM will solve.
NBA finals time. Feels like Boston is the more complete team but I have no real idea how this series is going to go. Also congrats to Doris Burke! First woman to be a game analyst in the NBA finals and in fact in any men's championship series in the major US sports. 🏀
Just posted our preview for next week's WWDC, Core Int episode 602. Lots more about AI and other thoughts leading up to the conference.
Whenever I'm anywhere near Los Angeles, I think, "I should stop at Disneyland for the afternoon." It never works out, but I always check maps to be sure. If the Splash Mountain redesign was ready, I'd be especially tempted today. 🏰
Listening to Pivot today, Scott Galloway goes on a tangent about how Hillary Clinton would've been a great president, and how so many things would be different today if as a country we hadn't screwed up the 2016 election. We'll be paying the price for years. Not sure I'll ever fully get over it. 🇺🇸
Made it to the coast, settling in at Point Mugu State Park. Way too crowded, almost left, but I like my tree.
Only just skimmed through Mark Gurman's comprehensive WWDC article. Most of the AI rumors sound about right, but I glossed over a few details so there still might be minor surprises. My most pressing question: can I wait to get a new phone? I like my iPhone 14 Pro.
Each time I take an extended solo road trip, I learn something new about how to make life a little easier. Something that has worked well for me the last couple trips: when it's time to do laundry, drop clothes off at a cleaners or wash-and-fold place. More pricey but saves so much time.
Four hostages freed in assault in Gaza. I wonder how this affects feelings about the war from Israelis. To be honest, and I'm sorry this is dark, but it is a war… I was assuming that most of the remaining hostages were dead.
Nick Heer on the eroding trust in tech as companies plow ahead with new features:
These product introductions all look like hubris. Arrogance, really — recognition of the significant power these corporations wield and the lack of competition they face.
Clearly a big part of WWDC's keynote is going to be about trust and privacy. I don't think things are as bad as many people think — OpenAI's API, for example, doesn't use user data to train models, but I bet most people assume it does. But Apple has built the trust and they should emphasize it.
Working this morning at Honey Cup, a little coffee shop in Thousand Oaks. Finally have uninterrupted time to solve the increasingly slow posting in Micro.blog. Turns out a database index wasn't good enough and recently hit some kind of tipping point, pushing things from "fast enough" to "slow". All good now! ☕️
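For anyone curious what "a database index wasn't good enough" can look like in practice, here is a small self-contained sketch. It uses SQLite from Python's standard library as a stand-in, and the `posts` table, its columns, and the index name are invented for the example. It is not Micro.blog's actual schema or database. It compares the query plan before and after adding an index that matches both the filter and the sort.

```python
import sqlite3

# In-memory database with a hypothetical posts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (user_id, created_at, body) VALUES (?, ?, ?)",
    [(i % 100, f"2024-06-{(i % 28) + 1:02d}", "hello") for i in range(1000)],
)

QUERY = "SELECT id, body FROM posts WHERE user_id = ? ORDER BY created_at DESC LIMIT 20"


def query_plan(connection, sql, params):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row is the human-readable detail.
    return " | ".join(row[3] for row in rows)


# Without an index, SQLite has to scan the table and sort the matches.
plan_before = query_plan(conn, QUERY, (42,))

# A composite index covering the filter column and the sort column.
conn.execute("CREATE INDEX idx_posts_user_created ON posts (user_id, created_at)")
plan_after = query_plan(conn, QUERY, (42,))

print(plan_before)
print(plan_after)
```

A full scan is fine on a small table, which is how a query can feel fast for years and then tip over once the table grows past some point.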
I'm experimenting with an OpenAI-based assistant powered by my own writing. It is wild. See this transcript of me testing it with a couple questions. It does hallucinate sometimes, but as a powerful search or just for fun, this could be something.
Santa Barbara from the pier.
Santa Ynez Valley.
Bad Boys opens with $21 million box office Friday, on track for $50 million for the weekend. I've been curious about whether Will Smith gets a second chance. No legal charges were filed after the Oscars, so the punishment was a 10-year Academy ban, which we're a couple of years into. Sometimes we don't forgive even after someone has served the time, and I think that's problematic.
Field of Light in Paso Robles. I was expecting something different and beautiful, and it was that. But it also made me wonder if we’re in an era of synthetic art, where ideas matter more than craftsmanship. (This is about AI too.) Glad I stopped here to see it.
I'm always surprised when coffee shop wi-fi is way slower than LTE tethering. It's great the cell networks are usually so fast now, but it feels all backwards (and in a way, kind of wasteful).
Finished reading: Crown of Midnight by Sarah J. Maas. Fast-paced. This was a stronger book than the first one. 📚