In search of GDK keycodes

Recently I was working on a remote console project that allows managing HPE servers through their iLO module. It’s a wonderful technology: you can see the server’s screen and use the mouse and keyboard, just as if you were in front of the KVM.

I decided to use GTK3 to create the client, and eventually got to the part where I had to capture keyboard input and send it to the iLO.

Now, since iLO is a module attached to the server, it makes our input appear as if it came from a USB mouse & keyboard plugged in to it. This means iLO doesn’t expect characters (as SSH or telnet would), but instead low-level messages ―the kind that a USB keyboard would send.

Specifically, it needs us to know when physical keys are pressed or released. This is where keycodes come in.

Keycodes, symbols and characters

Keycodes are numbers that identify a physical key on a keyboard. They are sometimes called scan codes, because they were originally codes that the keyboard’s microcontroller sent when scanning keys to signal press or release events.

There is a keycode for Left Shift, another for Right Shift, every distinct piece you can press should have one. Referring to physical keys can be tricky: the Z letter on a German keyboard layout is located at the same key than the Y in an English layout. They’re the same keycode. And this is just one example.

German keyboard layout "T1" (according to DIN 2137-1:2012-06) United States ANSI keyboard layout

There’s also separate keycodes for keys on the numeric keypad, and special keys present on some keyboards like a “Volume Up” key, an “Internet Browser” key and so on.

Keycodes are low-level, and thus are the only thing that the keyboard electronics need to worry about. The rest generally happens in software. First, the keycodes will be matched to key symbols depending on:

For instance, a keycode for the “1” key could result in the ! symbol with a Spanish layout and a Shift modifier present.

Key symbols capture the logical meaning of the keypress, but some more steps are (usually) needed to get characters, such as handling composition (pressing many symbols, such as a grave + e, to get è), keyboard modes (used on some languages such as hebrew), and so on.

The whole set of tasks that translate keycodes to text are usually performed by the input method.

Taboos

If it’s already tricky enough to refer to a physical key, the absence of a common standard for keycodes only makes it worse. Unlike characters, where we’ve all (finally) agreed on Unicode, every thing that has to deal with keycodes usually has their own mapping of physical keys to numbers.

There’s a long chain of players from keyboard to the final application that receives the input, and trouble comes when one of them doesn’t really tell you what their keycodes are, just how to map them to symbols. The chain breaks and ‘keycode’ suddenly becomes opaque to you, even though in practice it has well known values.

Thus, despite final APIs generally do usually give you access to the keycode, it will often be implementation dependent or (with some clear exceptions) hardly mentioned in the docs. Keycode values seem to be a necessity that no one formally acknowledges ―and often can’t acknowledge, because upstream doesn’t either― but at the same time many rely on, and we secretely keep stable. We don’t talk about it, yet we seem to act like it’s common knowledge.

To reveal what the ‘opaque’ value actually holds in practice, we need to follow the chain. Let’s look at (some of) these players, in order.

Hardware

First we have the keyboard itself. Most keyboards today are USB, and fortunately the USB HID spec defines a table of keycode mappings, which can be found here at chapter 10:

Start of the USB HID keycode table

So we’re good to go, I won’t spend time looking into old hardware (here).

By looking at the header and the amount of footnotes of the table, you can already get an idea of how tricky keycode definitions can be. Also note that, while the table mentions values up to 0xFFFF, only values up to 0xFF can be used in practice, since the keycodes are 8 bits long on the wire (at least in the boot protocol). This limitation exists in more places and poses a problem.

If you need a handier version of the table and/or its data in JSON format, I have uploaded it here.

Linux

Then we have the kernel. Kernels are expected to offer a normalized interface to the hardware, i.e. it should be independent of the specific brand / model and the bus used. So it shouldn’t come as a surprise that the kernel normalizes the received keycodes into its own keycode definitions.

Here we’ll focus on Linux. Linux has several interfaces for userspace applications to choose from. The most modern one, evdev, does almost no ‘cooking’ (keymap substitution, etc.) to the data and just passes it along. Always use this interface; keymaps and further processing belong in userspace. A (somewhat dated) overview of the input subsystem can be found here.

Some digging through its source code reveals the table of evdev keycodes, in linux/input-event-codes.h (previously at linux/input.h). Even though kernels come from the PS/2 days, evdev is more modern and according to the comments, the codes seem to be mostly based around USB HID:

/*
 * Keys and buttons
 *
 * Most of the keys/buttons are modeled after USB HUT 1.12
 * (see http://www.usb.org/developers/hidpage).
 * Abbreviations in the comments:
 * AC - Application Control
 * AL - Application Launch Button
 * SC - System Control
 */

#define KEY_RESERVED		0
#define KEY_ESC			1
#define KEY_1			2
#define KEY_2			3
#define KEY_3			4
#define KEY_4			5
#define KEY_5			6
[...]

Note that there are also keycodes allocated above 0xFF, and the codes are shared with other things (i.e. mouse / touchscreen buttons).

We also find the driver for USB HID keyboards. And sure enough, there’s a translation table from USB HID keycodes into these Linux keycodes:

static const unsigned char usb_kbd_keycode[256] = {
      0,  0,  0,  0, 30, 48, 46, 32, 18, 33, 34, 35, 23, 36, 37, 38,
     50, 49, 24, 25, 16, 19, 31, 20, 22, 47, 17, 45, 21, 44,  2,  3,
      4,  5,  6,  7,  8,  9, 10, 11, 28,  1, 14, 15, 57, 12, 13, 26,
     27, 43, 43, 39, 40, 41, 51, 52, 53, 58, 59, 60, 61, 62, 63, 64,
     65, 66, 67, 68, 87, 88, 99, 70,119,110,102,104,111,107,109,106,
    105,108,103, 69, 98, 55, 74, 78, 96, 79, 80, 81, 75, 76, 77, 71,
     72, 73, 82, 83, 86,127,116,117,183,184,185,186,187,188,189,190,
    191,192,193,194,134,138,130,132,128,129,131,137,133,135,136,113,
    115,114,  0,  0,  0,121,  0, 89, 93,124, 92, 94, 95,  0,  0,  0,
    122,123, 90, 91, 85,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
     29, 42, 56,125, 97, 54,100,126,164,166,165,163,161,115,114,113,
    150,158,159,128,136,177,178,176,142,152,173,140
};

Some HID keycodes aren’t mapped to anything (0). But is this mapping reversible? Well, almost. Both 0x31 (Keyboard \ and |) and 0x32 (Keyboard Non-US # and ˜) map to Linux keycode 43. Indeed, 0x32 comes with a footnote that says:

Typical language mappings: US: \| Belg: µ`£ French Canadian: <}> […]

The rest of the duplicate mappings are because of reserved HID keycodes: both 0x78 (Keyboard Stop) and 0xF3 (reserved) map into Linux keycode 128. I suppose the standard keycodes were assigned later, when some keyboards had started using non-standard ones, but we should prefer the standard ones.

So yes, we can mostly reverse the mapping; I’ve uploaded it here.

Drivers for older keyboards, like AT or PS/2, also have their own tables.

X11

But if you’re using a graphical application, it probably won’t talk to the kernel directly but to a graphical server that arbitrates access to the screen and input between multiple apps. In Linux this is (or used to be) almost always X11.

X11 didn’t have an in-depth understanding of keyboard events; the server barely worked with keycode values and just passed them around. The core protocol basically tells client apps to treat keycode values as opaque and server-dependent, with the server providing a keycode-to-symbol mapping for clients to use:

A KEYCODE represents a physical (or logical) key. Keycodes lie in the inclusive range [8,255]. A keycode value carries no intrinsic information, although server implementors may attempt to encode geometry information (for example, matrix) to be interpreted in a server-dependent fashion.

A KEYSYM is an encoding of a symbol on the cap of a key. […] A list of KEYSYMs is associated with each KEYCODE. The list is intended to convey the set of symbols on the corresponding key. […]

The mapping between KEYCODEs and KEYSYMs is not used directly by the server; it is merely stored for reading and writing by clients. […]

Not all keycodes in this range are required to have corresponding keys.

People don’t seem to know the reason for that 8 offset; I speculate that it might have to do with the eight keycodes for mouse buttons (Linux allocates these starting at 0x100, past the 8-bit limit).

To get something more flexible and closer to a proper input method, the Xinput (Xi) and X keyboard (Xkb) extensions were introduced. Today, Xkb or the more modern Xinput 2 (Xi2) are used in almost all applications. Using the xev tool we can see how X events look like, in particular key events:

KeyPress event, serial 37, synthetic NO, window 0x2200001,
    root 0x2a8, subw 0x0, time 811738218, (168,103), root:(218,217),
    state 0x0, keycode 62 (keysym 0xffe2, Shift_R), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False

The X11 server has pluggable input drivers, and the driver you’re using seems to be what decides the keycodes and their mappings to keysyms. The modern driver for Linux is xorg-xf86-input-evdev, which uses (you guessed it) the evdev interface. We can see that, right from the first committed version of the driver, it just takes the Linux keycode and adds MIN_KEYCODE to it, which is 8 as we saw earlier:

xf86PostKeyboardEvent(pInfo->dev, ev.code + MIN_KEYCODE, value);

Sure enough, X keycode 62 is Linux keycode 54, which corresponds to Right Shift.

X11 (continued)

However if we look at the older keyboard driver xorg-xf86-input-keyboard, we see a table of keycodes based around PC/AT scancodes. PC/AT keyboards were by far the most common kind in the PS/2 days (and the PC/AT keyboard standard was still popular until not so long ago). These are the codes that the microcontroller in the keyboard sends to the PC (and are different on press & release, but we only care about the first ones).

PC/AT scancodes are also used by the old (pre-evdev) kernel interface, and by other kernels. Again, the driver adds MIN_KEYCODE before sending them to X11. You can read more about the scancodes and the old Linux interface here.

This driver is for non-Linux platforms, but when evdev wasn’t around, it was used on Linux as well. Around late 2019 they removed Linux support to ensure people moved to evdev or libinput.

Wayland

X11 is now getting old and things are moving to Wayland. So how does the Wayland protocol handle things compared to X? Pretty much the same way: the server passes keycodes around and it’s the client responsibility to map them to symbols using the server-provided mapping, implement key repeat and such. The Wayland book recommends using xkbcommon (a library containing the client-side logic of the Xkb extension) to do the conversion.

What does the protocol say about the keycodes, though? Are they still opaque? Well, the protocol definition says:

The key is a platform-specific key code that can be interpreted by feeding it to the keyboard mapping (see the keymap event).

Yeah, keycodes are still opaque values to the client, and the compositor you’re using could (in theory) place any values there. However the book mentions:

Important: the scancode from this event is the Linux evdev scancode. To translate this to an XKB scancode, you must add 8 to the evdev scancode.

So that’s something.

libinput

There’s another component here. The input handling logic from Weston (Wayland’s reference compositor) was moved to a separate library, libinput, which can also be used as an X11 input driver (a bit more modern replacement for xorg-xf86-input-evdev). While you technically could write a Wayland compositor that doesn’t use libinput, all I know do.

As a surprise to no one, libinput’s documentation seems to make no mention at all on what keycodes are, just:

The keycode that triggered this key event

libinput passes keycodes straight from evdev, and while theoretically it could have more backends for keyboards in the future, I think it’s safe to assume libinput keycodes will always be Linux keycodes.

The X11 driver, xorg-xf86-input-libinput, just adds 8 to the libinput keycodes as usual.

evdev is from 2001. Keeping all of this in mind, the rule is: if you are talking to a Linux graphical server, then (unless it is a special, ancient or misconfigured X server) you can assume Linux/evdev keycodes (offset by 8 in the case of X).

GUI toolkit

But your app will likely not want to paint the components itself, or handle accessibility, or integration with the desktop environment, so it will typically use a toolkit. Popular toolkits are Qt, GTK and wxWidgets. These are cross-platform, but some OSes have their own (MacOS, Windows, Android). Some of them, like GTK, are cross platform but designed for a desktop environment (GNOME) where they integrate better.

GUI toolkits usually have several modules to handle text input (integrate with the input method, in order to draw visual hints and other requirements). This doesn’t concern us much, since we only care about keycodes, but it’s nice to keep in mind.

GTK / GDK

This is my case. GTK3 (and GDK3, which is an abstraction layer for the graphical server, and thus where events originate) expose keycodes through the hardware_keycode field in GdkKeyEvent, which is described as:

the raw code of the key that was pressed or released

As usual, no mention of what is actually inside the field. Is it opaque, or just poorly documented? Who knows. Before we can begin searching, we need to know that hardware_keycode was renamed to just keycode in GDK4. Another change is that fields can now only be accessed through methods, in this case gdk_key_event_get_keycode. Some other unrelated changes were also made to key events.

A quick glance through the code for the X11 implementation of GDK shows that the received keycode is passed untouched. The Wayland implementation adds 8 to the keycode to mimic X11:

deliver_key_event (data, time, key + 8, state_w, FALSE);

In fact this was there from the very initial version, so it seems hardware_keycode is not opaque in practice. That’s nice of them.

Putting everything together again: when your GTK/GDK application runs on Linux, then (unless it is a special or misconfigured X server) you can assume Linux/evdev keycodes plus 8.

HTML

HTML has good support for keyboard input at various levels. Physical key events can be handled through the keydown and keyup events, but key repeats from the input method generate synthetic keydown events with the repeat property set.

These events used to have a keyCode attribute which was implementation dependent, and was later removed and replaced with the code attribute. The symbol approximation of the key can be read through the key attribute. Both hold string identifiers that are normalized across platforms, which is really nice.