QQ is my proposal for a new editing modality that lets the user make quick consultations with a chat-tuned language model directly within the editor. It offers a frictionless way to ask for explanations, examples, and documentation without interrupting the user's workflow.
Here is a brief demo of QQ in action:
How to Trigger QQ
When the user types
When the user presses enter (or return on macOS), the question is sent to a language model – in the case of the demo above, OpenAI's gpt-3.5-turbo – which streams back a response to the user's question. The response is displayed directly in the tooltip, so the user doesn't have to leave the editor to see the answer.
The user can then choose to either close the conversation by pressing esc or continue sending replies to the model to iterate on the generated responses. When the user closes the tooltip, the edited line is cleared and the editor returns to its regular state.
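The streaming step can be sketched in TypeScript as a small parser for the server-sent-events format that OpenAI's chat completions endpoint uses when streaming (`data: {...}` lines, terminated by `data: [DONE]`). The transport (fetch, EventSource, and so on) is left out; this only shows how the tooltip might accumulate deltas as they arrive:

```typescript
// Minimal sketch of extracting streamed response deltas from the
// SSE payload format used by OpenAI's streaming chat completions.
// The JSON chunk shape matches the gpt-3.5-turbo streaming API;
// how the bytes arrive over the network is omitted here.

interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

// Parse one raw SSE line into the text delta it carries, or null
// for non-data lines and the final "[DONE]" sentinel.
function parseSseLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload) as StreamChunk;
  return chunk.choices[0]?.delta.content ?? null;
}

// Accumulate the full response text from a sequence of SSE lines,
// as the tooltip would while rendering the answer incrementally.
function accumulate(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    const delta = parseSseLine(line);
    if (delta !== null) text += delta;
  }
  return text;
}
```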
Why Modal Editing?
Modal editing is a way to preserve flow by reducing the number of context switches. It's a way to "stay in the zone" and keep your hands on the keyboard while you're working. With modal editing there's no need to reach for the mouse or to switch to a different pane to look something up: you can type out your question and ask for help without leaving the editor, and in many cases without even moving focus away from the text you're working on.
Presenting the Conversation
In screen readers, the tooltip is announced as a live region, and the user is informed that they are in a conversation with a language model. I still need to do some research on how best to present the conversation to screen readers, but live regions seem to work better than, say, a modal dialog.
The content of the conversation is presented as a list of message bubbles, analogous to a two-way chat conversation between the user and another individual. Like in an instant-messaging application, the user's questions are shown as accent-colored message bubbles on the right, and the responses from the language model are shown as grayscale message bubbles on the left.
When the conversation is empty, the user is presented with an empty state that shows "No messages".
As the user types their question, the dialog shows the question as a message bubble, but the bubble is presented in an indeterminate state, communicating that the question has not yet been sent to the language model.
When the user presses enter, the question is sent and the message bubble's visual state is updated to reflect it. A live announcement is also dispatched to the screen reader through ARIA live regions, communicating that the message has been sent.
While the response from the language model streams in, it is shown as a message bubble in a similar indeterminate state. Once the response is complete, the message bubble is updated to show the final content.
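The bubble states described above can be modeled as a small message lifecycle. This is only a sketch: the state names are mine, not taken from QQ's actual internals.

```typescript
// Illustrative message lifecycle for the chat tooltip. A user
// question goes pending -> sent; an assistant response goes
// streaming -> complete. The names are assumptions, not QQ's API.

type MessageRole = "user" | "assistant";
type MessageStatus = "pending" | "sent" | "streaming" | "complete";

interface Message {
  role: MessageRole;
  text: string;
  status: MessageStatus;
}

// A user question starts out indeterminate until enter is pressed.
function draftQuestion(text: string): Message {
  return { role: "user", text, status: "pending" };
}

// Pressing enter marks the bubble as sent (this is also where the
// ARIA live announcement would be dispatched).
function markSent(msg: Message): Message {
  return { ...msg, status: "sent" };
}

// Streamed deltas extend an assistant bubble in the indeterminate
// streaming state...
function appendDelta(msg: Message, delta: string): Message {
  return { ...msg, text: msg.text + delta, status: "streaming" };
}

// ...and the bubble is finalized once the stream ends.
function finish(msg: Message): Message {
  return { ...msg, status: "complete" };
}
```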
Responses from the language model are expected to be richer in content and format than the user's questions, so the message bubbles are styled differently to communicate that. Responses from the LLM also support markdown formatting and syntax highlighting of code blocks to make the content more digestible and readable for the user.
The user can close the conversation by pressing enter on an empty line. When the conversation is closed, the tooltip is hidden and the editor returns to its regular state, but the conversation is still available in the editor's state. The user can trigger QQ again on the same line to resume the conversation.
Conversations are grouped by line number, so the user can have multiple conversations open at once. This is useful when the user is working on several parts of the code simultaneously and wants to ask questions about each of them.
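The grouping can be sketched as a map keyed by line number. The types and method names here are illustrative, not QQ's real data model:

```typescript
// Sketch of grouping conversations by line number, so several can
// be open in one document at once. Types are assumptions made for
// illustration, not QQ's actual implementation.

interface Turn {
  role: "user" | "assistant";
  text: string;
}

class ConversationStore {
  private byLine = new Map<number, Turn[]>();

  // Append a turn to the conversation anchored at the given line,
  // creating the conversation on first use.
  add(line: number, turn: Turn): void {
    const turns = this.byLine.get(line) ?? [];
    turns.push(turn);
    this.byLine.set(line, turns);
  }

  // Closing the tooltip keeps the conversation in editor state, so
  // reopening QQ on the same line can restore it from here.
  get(line: number): Turn[] {
    return this.byLine.get(line) ?? [];
  }

  // Lines that currently have a conversation attached.
  openLines(): number[] {
    return [...this.byLine.keys()].sort((a, b) => a - b);
  }
}
```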
The contents of the active document are sent alongside the user's question to the language model. This provides context to the model and helps it generate more relevant responses for the user.
Here is the prompt that is sent to the language model:
As "Loop", you're a programming assistant driven by a language model.
Respect user's instructions strictly, but avoid discussing opinions, existence, sentience, or engage in arguments.
Provide accurate, logical, brief, and Markdown-formatted responses with named language in code blocks.
Respond to technical inquiries with code suggestions only within code blocks.
Account for the user's language in code blocks when using the text editor, Printloop.
Limit responses to one per interaction turn.
editing document: printloop.md
# A Simple SQLite Database
Here is a simple SQLite database:
CREATE TABLE messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
<- user's cursor is here
INSERT INTO messages (text) VALUES ('Hi!'), ('How are you?');
UPDATE messages SET text = 'Hi there!' WHERE id = 1;
SELECT * FROM messages;
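Assembling that context can be sketched as follows. The "editing document:" header and the cursor marker mirror the example above; everything else (types, delimiters, function names) is an assumption on my part, not QQ's actual implementation:

```typescript
// Sketch of combining the system prompt, the active document, and
// the cursor position into the model's context. The header and
// cursor-marker format follow the prompt example; the rest is an
// illustrative assumption.

interface DocumentContext {
  filename: string;
  lines: string[];
  cursorLine: number; // zero-based index of the user's cursor
}

function buildContext(systemRules: string, doc: DocumentContext): string {
  // Annotate the cursor's line so the model knows where the user is.
  const body = doc.lines
    .map((line, i) =>
      i === doc.cursorLine ? `${line} <- user's cursor is here` : line
    )
    .join("\n");
  return `${systemRules}\n\nediting document: ${doc.filename}\n${body}`;
}
```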
I've been using this prompt for a while now, and it's been working well. I'm sure there are ways to improve it. I'm also thinking of adding a way for the user to customize the prompt, or to add their own prompts.
I'm still working on improving the experience of using QQ, and I'm looking forward to hearing feedback from users. My main focus right now is accessibility: making sure the conversation is presented in a way that is easy to follow and understand for screen-reader users, as well as for users with cognitive or physical disabilities.
I also want to improve the triggering mechanism. Right now, the user has to type