
My OpenAI DevDay 2023 Wishlist

OpenAI's DevDay is coming up on November 6, 2023, and I'm excited to see what they announce.

Apparently, they are not announcing new foundation models (e.g., GPT-5) and are instead going to focus on new features and tools for their APIs.

This is my wishlist, sorted into three buckets: Likely Suspects, Big Moves, and Moonshots.

Likely Suspects

General Performance Improvements

This one feels obvious to me: a faster gpt-4 is a no-brainer. Faster time-to-first-token, faster inference, faster everything. GPT-4 is very slow. Streaming makes it usable, but it's still too slow.

I'm not sure if this will be in the form of a "distilled" version of existing models, a completely new model, or just general infrastructure improvements across the board. Will gpt-4 just be faster or will we see a gpt-4-turbo?

Maybe they will also release an even faster gpt-3.5-turbo?
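Time-to-first-token and total inference time are two different numbers, and streaming only helps with how the first one feels. Here's a minimal sketch that times a simulated token stream. `fake_stream` and its delays are hypothetical stand-ins for a streamed API response, not real OpenAI latencies.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_stream(tokens: list, first_delay: float, per_token: float) -> Iterator[str]:
    """Hypothetical stand-in for a streamed completion; the delays are made up."""
    time.sleep(first_delay)  # queueing + prompt processing before the first token
    for i, token in enumerate(tokens):
        if i > 0:
            time.sleep(per_token)  # decoding time for each subsequent token
        yield token


def measure(stream: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Return (time-to-first-token, total time, full text) for a token stream."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    parts = []
    for token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token has arrived
        parts.append(token)
    return ttft, time.perf_counter() - start, "".join(parts)


ttft, total, text = measure(fake_stream(["Hello", ",", " world"], first_delay=0.05, per_token=0.02))
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, text: {text!r}")
```

A faster model would shrink both numbers; better infrastructure alone might mostly shrink the first one.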

GPT-4 Fine-Tuning API

This one is also a no-brainer. We already have the fine-tuning API for gpt-3.5-turbo, so it's only natural that we get one for gpt-4. Right? Right!?

I'm interested in what the economics will look like, but the general idea is that you can fine-tune gpt-4 for your specific domain or use case and get better results with fewer tokens, which should make it cheaper for those specific cases.
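Whether fine-tuning is actually cheaper depends on how many prompt tokens it saves versus how much more each token costs, plus the one-time training cost. A minimal back-of-the-envelope sketch follows; every price, token count, and the training cost below are hypothetical placeholders, not OpenAI's pricing.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    prompt_tokens: int      # tokens sent per request
    completion_tokens: int  # tokens generated per request
    input_price: float      # USD per 1K prompt tokens (hypothetical)
    output_price: float     # USD per 1K completion tokens (hypothetical)

    def cost_per_request(self) -> float:
        return (self.prompt_tokens * self.input_price
                + self.completion_tokens * self.output_price) / 1000


def break_even_requests(base: Deployment, tuned: Deployment, training_cost: float) -> Optional[int]:
    """Requests needed before per-request savings cover the training cost."""
    savings = base.cost_per_request() - tuned.cost_per_request()
    if savings <= 0:
        return None  # the fine-tuned model never pays for itself
    return math.ceil(training_cost / savings)


# Hypothetical: the base model needs a long few-shot prompt...
base = Deployment(prompt_tokens=2000, completion_tokens=200, input_price=0.03, output_price=0.06)
# ...while the fine-tuned model needs a short prompt but costs 2x per token.
tuned = Deployment(prompt_tokens=300, completion_tokens=200, input_price=0.06, output_price=0.12)

print(base.cost_per_request(), tuned.cost_per_request())
print(break_even_requests(base, tuned, training_cost=50.0))
```

With these made-up numbers the fine-tuned model wins per request despite the higher per-token price, because most of the base model's cost is the long prompt.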

Public DALL·E 3 API

We already have an API for DALL·E 2, so of course they're working on making DALL·E 3 public! In fact, when they publicly announced it, they said it would be released in late fall.

What I'm really wondering is the pricing, though. Will it be the same as DALL·E 2? Will it be more expensive? Will it be cheaper because they achieved some sort of economies-of-scale win? I'm not sure, but I'm excited to find out.

Multimodal API

GPT-4V, as OpenAI has been marketing it, is already making its way into ChatGPT and Bing, so it's plausible they'll bring it to the API.

I'm not sure if it will be a completely new model or just a modification to the API that allows sending images and text together.

I'm curious what the API would look like and how the pricing would work. Would images be encoded as base64 strings? Sent as URLs? Sent as binary blobs? Some combination of all of the above?
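One concrete trade-off between those options: base64 inflates the payload by about a third (every 3 bytes become 4 characters), while a URL keeps the request small but makes the server fetch the image. The sketch below builds a base64 data URL (RFC 2397); the message shape with `"type"` and `"data_url"` fields is purely hypothetical, not OpenAI's API.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (RFC 2397)."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def build_message(text: str, image_bytes: bytes) -> dict:
    """Hypothetical request message mixing text and an image (field names are made up)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image", "data_url": to_data_url(image_bytes)},
        ],
    }


fake_png = bytes(300)  # 300 zero bytes standing in for a real image
encoded = base64.b64encode(fake_png)
print(len(fake_png), len(encoded))  # 300 400
```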

Playground Improvements

I'm hoping for some improvements to the playground, especially if multimodality comes through. I'd like to see a token counter, support for function calling, and maybe even templating?

Big Moves

Ability to Pause, Insert Into, and Resume Inferences

I don't know a clever name for this, but I would like to have longer-lived inferences that can span multiple requests.

Edit: You might be thinking, "Oh, so you want a stateful API?" The short answer is yes, but that's an implementation detail. I just want longer-lived inferences; I don't care how that's achieved, only that it's possible.

The general idea is that a special token would pause the inference and freeze its state in place; you could then provide more context to bias the rest of the inference and resume it.

If possible, maybe even save in token costs along the way?

llama.cpp has a similar feature: you can pause the inference at any time, inject some text, and then resume the inference as if the model had produced the text you injected.

This would open the door to single-inference ReAct loops: fewer tokens, but a longer inference lifespan. I think it'd be worth it.

We'd be able to do things like:

    Sure, I'll unsubscribe you from the
    /unsubscribe(User)⏸️Successfully unsubscribed User⏩️Done!

The ability to pause, insert into, and resume the inference.

The ability to pause... are you getting it? These aren't three separate inferences, it's one single inference that can be paused, inserted into, and resumed.
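Python generators happen to model this nicely: a generator is one long-lived computation that can pause at a `yield`, receive injected text via `send()`, and resume with its state intact. Here's a minimal sketch under that analogy. `toy_model` is a scripted stand-in for a real model (its `context` list plays the role of frozen inference state), and the `/name(arg)` call syntax and `tools` dictionary are hypothetical.

```python
from typing import Callable, Dict, Generator, Optional

PAUSE, RESUME = "⏸️", "⏩️"


def toy_model() -> Generator[str, Optional[str], None]:
    """Scripted stand-in for ONE long-lived inference (not a real LLM)."""
    context = []  # survives the pause, like frozen inference state
    for token in ["Sure, I'll unsubscribe you.\n", "/unsubscribe(User)", PAUSE]:
        context.append(token)
        injected = yield token
        if injected is not None:
            context.append(injected)  # injected text becomes part of the context
    # After resuming, the next token is conditioned on what was injected.
    yield "Done!" if "Successfully" in context[-1] else "Something went wrong."


def run_with_tools(model: Generator[str, Optional[str], None],
                   tools: Dict[str, Callable[[str], str]]) -> str:
    transcript = []
    token = next(model)
    while True:
        transcript.append(token)
        injected = None
        if token == PAUSE:
            # Parse the call preceding the pause, e.g. "/unsubscribe(User)".
            name, arg = transcript[-2].lstrip("/").rstrip(")").split("(", 1)
            injected = tools[name](arg) + RESUME
            transcript.append(injected)
        try:
            token = model.send(injected)  # resume the same inference
        except StopIteration:
            return "".join(transcript)


tools = {"unsubscribe": lambda user: f"Successfully unsubscribed {user}"}
print(run_with_tools(toy_model(), tools))
```

The point of the analogy is that nothing is re-sent: the tool result is spliced into a computation that never ended, rather than replayed as a fresh prompt.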


Output Templates

It would be great to be able to provide a template to the API and have the LLM sample only tokens that keep the full output conforming to it. This would be amazing for constraining the model to output valid code, JSON, HTML, etc.

A good way to express such templates would be a grammar file. llama.cpp, for example, already supports this through its GBNF grammars.
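The core trick is masking: at each step, drop every candidate token that would make the output impossible to complete under the template, then pick from what's left. The sketch below simplifies the "grammar" to a small finite set of allowed outputs (a real grammar engine like llama.cpp's tracks parser state instead), and the vocabulary and scores are hypothetical.

```python
ALLOWED = ['{"answer": "yes"}', '{"answer": "no"}']  # toy template, fully expanded
VOCAB = ['{"answer": ', '"yes"', '"no"', '"maybe"', '}', 'Sure! ']


def toy_scores() -> dict:
    """Hypothetical model preferences: it 'wants' to chat and to hedge."""
    return {'Sure! ': 3.0, '"maybe"': 2.0, '"no"': 1.5, '"yes"': 1.0, '{"answer": ': 0.5, '}': 0.1}


def is_valid_prefix(text: str) -> bool:
    """True if `text` can still be completed into an allowed output."""
    return any(allowed.startswith(text) for allowed in ALLOWED)


def constrained_greedy(max_steps: int = 10) -> str:
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:
            break  # the output fully conforms to the template
        scores = toy_scores()
        # Mask out tokens that would break the template.
        candidates = [t for t in VOCAB if is_valid_prefix(out + t)]
        if not candidates:
            raise ValueError("no token can continue the template")
        out += max(candidates, key=scores.__getitem__)  # greedy among valid tokens
    return out


print(constrained_greedy())  # {"answer": "no"}
```

Unconstrained, this toy model would start with `Sure! `; with the mask, its preferences only decide between the valid options.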

Regular Updates to Training Data

What if OpenAI had a regular cadence of releasing new models with updated information about the outside world (e.g. current events, new people, new places, new things, etc.)?

More Deterministic Models

I know, I know... LLMs are inherently stochastic, but hear me out. I have a more practical solution to this problem.

I am under the impression that OpenAI's models are not frozen in time, that they are constantly being updated and retrained. This means that the same input prompt can produce different outputs over time.

OpenAI kinda, sorta versions their models, but it's widely suspected that they're making changes under the hood all the time.

Here's a non-exhaustive list of things that could be altering the output of the model:

  • Adding new training data
  • Changing the model architecture
  • Modifying the hyperparameters
  • Altering the training process
  • Applying learnings from RLHF
  • Updating the tokenization process
  • Scaling the API depending on load
  • Changing how the API is deployed

It must be happening, no?

I'd like to see them release frozen models that are guaranteed to produce similar output for the same input prompt (as much as its stochastic nature allows).

Or, if they already guarantee this inherently, I would like to see them make it more clear.
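Here's why frozen models matter even when you control the sampling. In this minimal sketch (toy logits, not a real model), temperature 0 means greedy argmax, and sampling with a fixed seed is reproducible. But if the weights change under the hood, the logits change, and even greedy decoding can flip its answer.

```python
import math
import random


def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        return max(logits, key=logits.__getitem__)  # greedy: always the top token
    # Softmax with temperature (subtract the max for numerical stability).
    top = max(logits.values())
    tokens = list(logits)
    weights = [math.exp((logits[t] - top) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(logits: dict, n: int, temperature: float, seed: int) -> list:
    rng = random.Random(seed)  # fixed seed -> reproducible sampling
    return [sample_token(logits, temperature, rng) for _ in range(n)]


frozen = {"cat": 2.0, "dog": 1.5, "fish": 0.2}   # hypothetical model output
updated = {"cat": 1.4, "dog": 1.5, "fish": 0.2}  # same prompt after a silent update

print(generate(frozen, 5, 0.0, seed=1))
print(generate(updated, 5, 0.0, seed=1))
```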

Built-In Code Interpreter API

What if the API had a built-in runtime to execute code for popular programming languages (Python, JavaScript, TypeScript)? It would be amazing to have the model run the code for you in a sandboxed environment and use the output.

Here's an imaginary transcript of the prompt and response:

    [
      { "role": "user", "content": "Give me a random number from 1 to 10" },
      { "role": "assistant", "to": "javascript", "content": "Math.floor(Math.random() * 10) + 1" },
      { "role": "interpreter", "name": "javascript", "content": "3" },
      { "role": "assistant", "content": "How about 3?" }
    ]

How cool would that be?
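The loop in the imaginary transcript could be sketched client-side like this. When the last assistant message is addressed to a runtime via `"to"`, run its code and append an `"interpreter"` message with the output. The `RUNTIMES` registry and the `step` helper are hypothetical. The sketch uses Python instead of JavaScript so it runs without Node. A child process with a timeout is only crude isolation, not a real sandbox.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout: float = 5.0) -> str:
    """Run Python code in a separate interpreter process.

    NOTE: this is NOT a sandbox; there are no filesystem or network limits.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip() if result.returncode == 0 else result.stderr.strip()


RUNTIMES = {"python": run_in_subprocess}  # hypothetical runtime registry


def step(messages: list) -> list:
    """Execute code addressed to a runtime and append the interpreter's reply."""
    last = messages[-1]
    if last["role"] == "assistant" and last.get("to") in RUNTIMES:
        output = RUNTIMES[last["to"]](last["content"])
        messages.append({"role": "interpreter", "name": last["to"], "content": output})
    return messages


messages = [
    {"role": "user", "content": "Give me a random number from 1 to 10"},
    {"role": "assistant", "to": "python", "content": "import random; print(random.randint(1, 10))"},
]
step(messages)
print(messages[-1])
```

The wish is for the API to do this step server-side, in a real sandbox, so the model can read the interpreter output and write its final answer within the same request.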