Push Button, Generate Audiobook: The AI Storyteller

Pulp Sci-Fi for Bedtime

I like reading pulp sci-fi before I sleep. There's a Ballmer Peak equivalent of interestingness I enjoy for that: a certain middling level of sophistication for bedtime stories. If they're too poorly written, I'm taken out of the story. If they're too interesting, they keep me from falling asleep.

So, every so often, I'll find an author who suits, and I'll pick up all of their works until I run out, and then I'll have to find someone else. Rinse, repeat. What I really want is a Bach Faucet of pulp sci-fi to help me nod off. An audiobook would be even better! So, an experiment:

Can I create a tool where I can push a button and get a decent sci-fi short story audiobook out?

Let’s find out!

ChatGPT is a Naturally Bad Writer

If you're reading this, you probably already know that (as of December 2023) ChatGPT is generally a poor storyteller. If you ask it, point-blank, to write you a short story, it'll produce something trite and saccharine, with an AI smell to it. It even knows that it writes junk. I asked it to write a story and then critique it, and it was pretty spot-on:

Lack of Originality and Depth in Concept: Failing to provide a unique or fresh perspective on familiar sci-fi themes, and not exploring the deeper implications or ethical questions raised by the story's premise.
Underdeveloped Characters: Creating characters, including protagonists and supporting roles, that lack depth, personality, or relatability, making them forgettable or mere plot devices rather than integral parts of the story.
Weak World-Building: Constructing a generic or poorly detailed setting that lacks imagination and specificity, failing to immerse the reader in a believable and intriguing world.
Predictable Plot and Uneven Pacing: Crafting a story that is predictable, lacks conflict or tension, and does not effectively manage the pacing to keep the reader engaged or invested in the outcome.
Heavy-Handed Theming and Lack of Subtlety: Over-explaining or overtly stating themes and messages, thereby undermining the story’s ability to provoke thought or allow readers to draw their own conclusions. This often includes telling rather than showing, which can diminish the reader's experience and engagement.

ChatGPT knows it can’t write well!

So, how can we use its own knowledge to improve it?

Making ChatGPT a Better Storyteller

I ultimately want to use GPT-4 via the API, but it's often handy to prototype with ChatGPT first. It's well known that these tools produce better output when you first ask them to think through the whole process before embarking on the actual creation.

So, for any kind of thing X that you want it to generate, you can improve its output by doing this:

Ask it how to structure an X in general.
Ask it to outline the specific X you're looking for.
Have it use that info to then flesh out the X.
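As a rough sketch, those three steps can be captured as a reusable prompt sequence. The function name `scaffold_prompts` and the exact wording are illustrative assumptions, not the prompts used later in this post.

```python
def scaffold_prompts(thing: str, specifics: str) -> list[str]:
    """Return the three-step prompt sequence for generating `thing`.

    `thing` is the general kind of artifact (e.g. "sci-fi short story");
    `specifics` describes the particular one you want.
    """
    return [
        # 1. Ask how to structure this kind of thing in general.
        f"Please outline for me how to structure a great {thing}.",
        # 2. Ask for an outline of the specific one, using that structure.
        f"Using that structure, please outline a {thing} about {specifics}.",
        # 3. Ask it to flesh the outline out into the finished piece.
        "Now please flesh out that outline into the finished piece.",
    ]


for prompt in scaffold_prompts("sci-fi short story", "a lighthouse keeper on Europa"):
    print(prompt)
```

Each prompt would be sent in order within the same conversation, so the later steps can lean on what the model produced in the earlier ones.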

For a short story that's good enough to fall asleep to, I might first ask it this:

Please outline for me the tenets of an award-winning sci-fi short story.

I could provide the rules myself, but I’m a lazy human, and ChatGPT already knows the answer. It’s not even important that I actually read the results; I just want it to generate a set of steps it can refer back to. LLMs know a ton of stuff, but they often need a nudge to bring it into their working memory. Following that, I ask it to ideate:

Can you please list for me about 10 INCREASINGLY INTERESTING ideas for a story in that vein? Make each one MORE INTERESTING AND UNUSUAL than the previous. AVOID OVERUSED SCI-FI TROPES.

The first idea might be a dud, but by the 10th, it may have generated something interesting. Since it's often better at identifying good and bad things than writing them, I can ask it to pick one of the ideas (again, without my direct involvement) and outline a story based on that:

Pick the most interesting of those, then outline a short sci-fi story that will be in 5 parts, according to all the tenets you've come up with so far. Put a twist into part 5, and make sure it's earned and transformative.

And then I just ask it to create each part:

Please write Part 1. This should be about 1,000 words long. Don't include titles or headers.

Rinse, repeat, and it generates a story.

Again, if you're familiar with this stuff, you’ll notice there's room for optimization here: instead of asking it to "outline for me the tenets of an award-winning sci-fi short story," I could provide the rules explicitly. The neat thing about this approach, though, is that I can specify a new genre ("outline for me the tenets of a rollicking 19th century adventure"), and it’ll build everything without my having to do anything else.

What Narrator?

I often use Amazon Polly, Eleven Labs, and OpenAI's text-to-speech. I did a deep dive into all of their voices in an earlier post (50 AI Voices: Which Reigns Supreme?), but TL;DR:

Amazon Polly is quick and affordable, but lacks expressive depth:

Eleven Labs offers richer voices and a bunch of emotion. Here are some of my favorites:

However, it tends to be significantly more expensive than Polly.

In true Goldilocks fashion, OpenAI's Fable voice balances natural expressiveness, speed, and cost:

I’m not sure what OpenAI was going for here, but the Fable voice strikes me as a robot Daniel Radcliffe, and I’m down with that.

Push Button, Automate Everything

So, let's automate everything! GPT-4 for rules/writing, OpenAI TTS for narration, and Python to connect it all. Since this is a prototype, we're just going to simulate the back-and-forth of the above ChatGPT conversation by glomming user inputs and GPT responses together into an increasingly large prompt katamari. This is suboptimal (for speed, cost, and context-length reasons), but it's a quick way to start and debug things. Building up the conversation is pretty straightforward, with something like this each time:

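# Append the next canned prompt, get GPT-4's reply, then append that too,
# so every request carries the whole conversation so far.
messages += [{"role": "user", "content": prompt}]
response = gpt_4_generate(messages, temperature=1.0, cache=True)
messages += [{"role": "assistant", "content": response}]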

A higher temperature (ideally) yields more varied results, and I typically wrap GPT-4 generation in a function that caches responses in case something goes wrong mid-process. The whole thing is then just a bunch of canned queries:
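The post doesn't show `gpt_4_generate` itself. Here is one way such a cache might work, assuming replies are stored as JSON files on disk keyed by a hash of the request. `cached_generate`, the `gpt_cache` directory, and the `call_model` argument (a stand-in for whatever function actually hits the OpenAI API) are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_generate(
    messages: list[dict],
    call_model: Callable[[list[dict], float], str],
    temperature: float = 1.0,
    cache: bool = True,
    cache_dir: Path = Path("gpt_cache"),  # hypothetical location
) -> str:
    """Return the model's reply, reusing a saved reply for an identical request."""
    # Key on the exact conversation plus temperature: any change is a cache miss.
    key_src = json.dumps(
        {"messages": messages, "temperature": temperature}, sort_keys=True
    )
    digest = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
    path = cache_dir / f"{digest}.json"

    if cache and path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["response"]

    response = call_model(messages, temperature)  # the real (paid) API call
    if cache:
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"response": response}), encoding="utf-8")
    return response
```

Because the whole conversation is part of the key, rerunning a script that crashed partway through replays the earlier turns from disk and only pays for the calls that never completed. The flip side is that a cached reply comes back unchanged even at temperature 1.0, so getting a fresh story means clearing the cache or passing `cache=False`.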

Hello! I hope you are doing well. Please outline for me the tenets of an award-winning {STORY_TYPE}.
Fabulous. Next, can you outline what's great about a {STORY_TYPE} with a {TWIST_TYPE}?
Great, thank you. Please outline for me the structure of a {ACTS_DESCRIPTION} short story.
Perfect. Can you please list for me about 10 INCREASINGLY INTERESTING ideas for a story about {STORY_SUBJECT} in that vein? Make each one MORE INTERESTING AND UNUSUAL than the previous. AVOID OVERUSED {STORY_TYPE} TROPES.
Fantastic! Pick one of the last 3 of those, then please outline a {STORY_TYPE} that will be about {STORY_SUBJECT}, in 5 parts, according to all the tenets you've come up with so far. Put a {TWIST_TYPE} into part 5, and make sure it's earned and transformative.
Okay. Next, please come up with a title for this. Make it distinct and interesting. Please just reply with the title and nothing else.
Thank you. Please write Part 1. This should be about 1,000 words long. Don't include titles or headers.
Thank you. Please write Part 2. Don't include titles or headers.
Thank you. Please write Part 3. Don't include titles or headers.
Thank you. Please write Part 4. Don't include titles or headers.
Thank you! Please conclude with Part 5. Make this succinct and impactful so that it doesn't feel drawn-out. Don't include titles or headers.
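Put together, the canned conversation might be driven by a loop like the one below. This is a sketch: `run_conversation` is a hypothetical name, `generate` stands in for a wrapper such as `gpt_4_generate`, the template list is abbreviated to three of the prompts above, and the placeholder values are made up for illustration.

```python
from typing import Callable

# A few of the canned prompts above; {NAME} placeholders are filled per run.
TEMPLATES = [
    "Hello! I hope you are doing well. Please outline for me the tenets of an "
    "award-winning {STORY_TYPE}.",
    "Fabulous. Next, can you outline what's great about a {STORY_TYPE} "
    "with a {TWIST_TYPE}?",
    "Thank you. Please write Part 1. This should be about 1,000 words long. "
    "Don't include titles or headers.",
]


def run_conversation(
    templates: list[str],
    settings: dict[str, str],
    generate: Callable[[list[dict]], str],
) -> list[dict]:
    """Send each filled-in template in turn, keeping the full history."""
    messages: list[dict] = []
    for template in templates:
        prompt = template.format(**settings)  # KeyError if a placeholder is unset
        messages += [{"role": "user", "content": prompt}]
        response = generate(messages)  # the whole history goes out every time
        messages += [{"role": "assistant", "content": response}]
    return messages


settings = {"STORY_TYPE": "sci-fi short story", "TWIST_TYPE": "twist ending"}
```

Calling `run_conversation(TEMPLATES, settings, generate=...)` returns the full message history. With the complete list of prompts above, the five story parts would be the last five assistant messages, and the title the one just before them.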

There's been some research suggesting that speaking to GPT-4 nicely yields better results, ergo the "please"s and "thank-you"s above. I’d like to do some testing to make sure I’m not doing cargo-cult prompt engineering, but it’ll do for the prototype.

The nice thing about this canned conversation is that I can simply change something like STORY_TYPE (which will typically be a genre), and it'll come up with a structure for that genre on its own, then follow that structure to generate the narrative.

TTS APIs are incredibly straightforward these days. Here's the call using OpenAI's tts-1-hd model and the fable voice:

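from openai import OpenAI

with open("my_credentials_here.txt", "rt") as f:
    client = OpenAI(api_key=f.read().strip())  # strip any trailing newline

# `whatever` holds the text to narrate.
response = client.audio.speech.create(
    model="tts-1-hd",
    voice="fable",
    input=whatever,
)
with open("story.mp3", "wb") as f:  # hypothetical output file
    f.write(response.content)  # MP3 is the default response format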
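One practical wrinkle the snippet doesn't show: OpenAI's speech endpoint accepts at most 4,096 characters of `input` per request, and a 1,000-word part will usually run past that. A minimal workaround, assuming you're happy to narrate each chunk separately and stitch the audio together afterwards, is to split the text on paragraph boundaries. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, max_chars: int = 4096) -> list[str]:
    """Split text into chunks of at most max_chars, breaking between paragraphs.

    A single paragraph longer than max_chars is split at the last space
    before the limit (or hard-cut if there is no usable space).
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs from runs of blank lines
        # Break oversized paragraphs into pieces that fit on their own.
        while len(para) > max_chars:
            cut = para.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            piece, para = para[:cut], para[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Add the paragraph to the current chunk if it still fits.
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `input` to `client.audio.speech.create(...)`, with the resulting MP3 files joined in an audio tool of your choice. Splitting between paragraphs keeps the seams at natural pauses in the narration.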

On average, generating a complete story currently costs about $0.45, and narration adds another $0.25, for a grand total of...

70 cents for a narrated audiobook short
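The arithmetic is simple enough to script, which helps when comparing voices or story lengths. The function names are hypothetical, and the per-character rate is left as a parameter rather than a quoted price: OpenAI bills TTS by input character, but rates change, so look up the current one before relying on an estimate.

```python
def tts_cost(text: str, usd_per_1k_chars: float) -> float:
    """Estimate narration cost for a TTS service billed per character."""
    return len(text) / 1000 * usd_per_1k_chars


def total_cost(story_usd: float, narration_usd: float) -> float:
    """Story generation plus narration, rounded to the cent."""
    return round(story_usd + narration_usd, 2)


# The averages quoted above: $0.45 for the story plus $0.25 for narration.
print(f"${total_cost(0.45, 0.25):.2f}")  # prints $0.70
```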

The Results

After a few minutes (and seventy cents), the tool creates this:

This is perfect! While it isn’t a great narrative delivered by an expert VO actor, both the writing and the narration are miles better than what was possible even a year ago.

And for my purpose, that’s just fine—I just want something to help me get to sleep.