At its core, a TSR is a program that implements a new interrupt handler in a machine running MS-DOS by hooking a target interrupt or interrupts, installing a new interrupt service routine for that target interrupt, optionally adding an entry to an unused slot in the interrupt vector table, redirecting the control flow of that original interrupt (or those original interrupts) to the newly created TSR (more specifically to that TSR’s ISR), and then concluding with a jump to the original interrupt’s ISR. It is similar to the process of hooking an interrupt in modern systems (e.g. hooking a syscall in Linux, or hooking a low-level WinAPI function in Windows).
The definition above was rather verbose and convoluted. I assure you, the process of creating a functioning TSR is significantly less so.
In fact, the process is fairly straightforward.
In order to install a TSR, one had to modify one or more entries of the Interrupt Vector Table (IVT), the precursor to the Interrupt Descriptor Table. The IVT holds the handler addresses for all 256 interrupts in 8086 real mode.
The basic formula went as follows: find the address of a desired interrupt in the IVT, save that address to a specific location (e.g. a variable in the data segment or an unused area of memory), install the new interrupt handler, which includes ensuring that the IVT entry points to the newly installed interrupt service routine, and then conclude the new routine with a final jump back to the original ISR of the target interrupt.
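If it helps to see that formula as moving parts before we touch any real-mode code, here is a minimal Python sketch. It is a toy model and nothing more: the IVT is a dict, handlers are plain functions, and every name in it (`ivt`, `saved_int21`, `tsr_isr`, `raise_interrupt`) is made up for illustration.

```python
# Toy model of the hook-and-chain formula; this is NOT real-mode code.
# The IVT is modeled as a dict: interrupt number -> handler (a callable).

trace = []

def original_int21(regs):
    """Stand-in for the ISR that DOS installed for INT 21h."""
    trace.append("original ISR")

ivt = {0x21: original_int21}

# Steps 1 and 2: find the current handler and save it somewhere we control.
saved_int21 = ivt[0x21]

def tsr_isr(regs):
    """Our new handler: do our own work, then hand off to the saved ISR."""
    trace.append("TSR ISR")
    saved_int21(regs)  # the final "jump" back to the original ISR

# Step 3: point the IVT entry at the new handler.
ivt[0x21] = tsr_isr

def raise_interrupt(number, regs=None):
    """Simulate `INT number`: look up the vector and run whatever it points to."""
    ivt[number](regs or {})

raise_interrupt(0x21)
print(trace)  # ['TSR ISR', 'original ISR']
```

The important detail is the order: the old handler is saved before the table entry is overwritten, otherwise there is nothing left to chain to.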
Let’s break this down:
Refer to the diagram below for an illustrated guide to the IVT:
There were several notable MS-DOS functions under INT 21h designed for the express purpose of implementing a TSR, but I’ve noted the most commonly used and well-known:
→ Get Interrupt Vector (35h), used to retrieve the address of the existing interrupt handler, to be restored after the TSR has finished executing (or when it is no longer resident in memory)
→ Set Interrupt Vector (25h), used to install the new TSR-specific interrupt handlers
→ Keep Program (31h), used to terminate the program while leaving it resident in memory
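To make the register conventions of the first two functions concrete, here is a small Python model. It assumes the usual DOS calling convention (AL holds the interrupt number; 35h returns the vector in ES:BX; 25h takes the new vector in DS:DX). The `dos_int21` function, the register dict, and the handler address `0019:40EBh` are all invented for the sketch.

```python
import struct

def dos_int21(regs, memory):
    """Toy model of two INT 21h services operating on an in-memory IVT."""
    entry = regs["al"] * 4          # each IVT entry is 4 bytes wide
    if regs["ah"] == 0x35:          # Get Interrupt Vector: returns ES:BX
        regs["bx"], regs["es"] = struct.unpack_from("<HH", memory, entry)
    elif regs["ah"] == 0x25:        # Set Interrupt Vector: takes DS:DX
        struct.pack_into("<HH", memory, entry, regs["dx"], regs["ds"])
    else:
        raise NotImplementedError("only 25h and 35h are modeled")
    return regs

memory = bytearray(1024)            # models only the IVT
# Install a made-up handler address, then read it back.
dos_int21({"ah": 0x25, "al": 0x21, "ds": 0x0019, "dx": 0x40EB}, memory)
result = dos_int21({"ah": 0x35, "al": 0x21}, memory)
print(f"ES:BX = {result['es']:04X}:{result['bx']:04X}")  # ES:BX = 0019:40EB
```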
***
We can now start to put this together to implement a basic TSR.
Baby’s first TSR! This is an exciting moment. Get the camera. Get the scrapbook. Let’s record and remember this historic occasion.
First we retrieve the address of the ISR for our target interrupt
Amazing.
We’ve retrieved the address (segment and offset) of our target interrupt… now what do we do?
***
One technique for installing a TSR (particularly a TSR that hooks an existing interrupt, ideally an interrupt that is frequently triggered during normal system operation) is to essentially cause the system to take a detour during its usual code execution:
Let’s pretend that we want to install a TSR that hooks INT 21h (MS-DOS functions bb ♥).
Normally, when this interrupt is triggered, the code flow looks like this:
System invokes INT 21h > go to entry of INT 21h in IVT > retrieve the segment:offset address stored in the IVT[21h] (which corresponds to the location of the ISR in memory), jump to the address of ISR for IVT[21h], execute code at that address
INT 21h > IVT[21h] > ISR[21h] > return to calling function after execution of INT 21h ISR
What we want to do is create a detour between those last two nodes so that the target interrupt is hooked and is able to continue performing its normal operation after the newly added interrupt has finished executing its ISR.
There are a few ways that we could go about this, but let’s start with the most basic:
We create a new Interrupt and install it in an unused slot in the IVT.
It’s important to note here that interrupts above 0x80 are frequently undefined. Which means that there is a plethora of space (by MS-DOS standards) for adding new entries to the IVT that point to our *not malicious rly sweet and innocent doing nothing bad at all no way no how* TSR.
Normally you’d want to implement some type of basic check to ensure that the IVT entry you’ve chosen isn’t already occupied. I leave that as an exercise for the reader. For now, we are going to be operating under the assumption that our chosen IVT entry (0x133, because it was the closest available to 1337) does not already point to an associated ISR and is a free slot in memory for us to add a pointer to our own ISR.
As noted above, one critical part of the functionality of the newly created interrupt’s ISR will be that, upon the conclusion of its own routine, it jumps to the address of the original hooked interrupt. This becomes especially important when we hook important interrupts for proper OS functionality, like the system timer, so as to prevent the system from crashing.
So our modified execution flow will look like this:
INT 21h > IVT[21h] > IVT[133h] > ISR[133h] > ISR[21h] > return to calling function after execution of INT 21h ISR
Now, we can abstract this a bit to achieve a model with a bit more generality.
The flow is essentially:
system call to target interrupt > IVT[{target_interrupt}] > jump to location of TSR, whose address is stored in IVT[{target_interrupt}] > execute TSR routine (with optional checks for executing based on specific subfunction calls) > jump back to saved address of original ISR of the target interrupt > return to calling function that made the original system call
I made you this pretty diagram to clarify the process.
We essentially want to use a free area of memory that is accessible to our TSR (e.g. a free slot in the IVT, or a defined variable in the "data" segment of our TSR) as a conduit between these two pieces. (A note on that phrasing: COM programs don't really care about segments, because a COM program views everything as belonging to the same segment, that segment being CS, or the Code Segment. One can explicitly define different segments in a COM program, but the file format itself doesn't necessitate it, and often it's a stylistic choice. It's still super heckin' valid uwu, as long as you don't mess it up. So, for the sake of using some of the vocabulary of asm programming that is still applicable here, I am using that phrasing: store the ISR address segment and offset in two variables in the .data segment!)
That way, we can bridge the gap between our newly inserted functionality of our new Interrupt, and the original functionality of the hooked interrupt. This introduces a delay between the invocation of our “desired” interrupt’s ISR and its execution, because the “desired” ISR is now being executed after our malicious Interrupt’s ISR.
Maybe significant lag could be an IOC…. who knows
Now, this is the standard technique for installing a TSR.
If you wanted to write a memory-resident DOS virus however, then you would need to get creative.
Let’s recall that the IVT is structured as a table of entries (key:value pairs, one could say, where the key is the interrupt number and the value is the address of the routine that will be invoked when that interrupt is called; this routine is known as the ISR or Interrupt Service Routine, and each entry in the IVT is a pointer to the interrupt’s ISR).
To retrieve the address of a specific interrupt’s ISR from the IVT, we could use a built-in DOS function (INT 21h, 35h), but we want to avoid making unnecessary calls to INT 21h functions whenever possible. This is because many common AV programs will monitor a system for these calls.
So what are the other options?
Well, for starters we can note that the addresses of all 256 interrupts in the IVT are reserved in the first 1024 bytes of memory. Yeah but 1024 bytes where, that could be anywhere? What page is it?
Oh honey, remember this is real mode. This is the physical address range we’re talking about.
And it’s in the first 1024 bytes of memory, every time.
Each entry is allocated 4 bytes. So interrupt 0 is at address 0x0 (technically 0000:0000h), and interrupt 1 is at address 0x4 (again, technically 0000:0004h), and so on.
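That arithmetic is simple enough to check by hand, but here it is as a quick Python sketch anyway. The function names are mine, and `physical_address` assumes plain 8086 behaviour (20 address lines, so the result wraps at 1 MiB).

```python
IVT_ENTRY_SIZE = 4    # bytes per vector: 2-byte offset + 2-byte segment
IVT_ENTRIES = 256     # 256 * 4 = 1024 bytes, the first 1 KiB of memory

def ivt_entry_address(interrupt):
    """Physical address of the IVT entry for an interrupt number (00h-FFh)."""
    if not 0 <= interrupt < IVT_ENTRIES:
        raise ValueError("interrupt number must fit in one byte")
    return interrupt * IVT_ENTRY_SIZE

def physical_address(segment, offset):
    """Real-mode segment:offset -> 20-bit physical address (8086 wrap-around)."""
    return ((segment << 4) + offset) & 0xFFFFF

for n in (0x00, 0x01, 0x1C, 0x21):
    print(f"INT {n:02X}h -> 0000:{ivt_entry_address(n):04X}h")
# INT 00h -> 0000:0000h
# INT 01h -> 0000:0004h
# INT 1Ch -> 0000:0070h
# INT 21h -> 0000:0084h
```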
So if you wanted to retrieve the address of, say, Interrupt 21h (don’t worry about why, that is of no importance at this moment), then you could do the classic maneuver: call INT 21h with AH set to 35h and AL set to 21h.
Or you could simply retrieve the value that is at address 0x84 (0x21 × 4; remember that 4-byte step between consecutive interrupt entries).
So this function:
becomes:
In the above snippet, we can see that the author of Tequila retrieved the addresses of two interrupts: INT 21h (at 0000:0084h) and INT 1Ch (at 0000:0070h). These retrieved addresses are then stored to variables within the Tequila program, located at offsets 0x09B0, 0x09B2, 0x09B4, and 0x09B6 from the start of the program. Basically, this part of Tequila saves the addresses of the two target interrupts to four local variables (two per address, one for the segment and one for the offset), to be referenced later.
Now, using this technique, we can just retrieve these values of the segment:offset pair of the target interrupt’s ISR, and save that address (or address components, isn’t segmented memory architecture fun?) to a designated space in memory for calling/jumping to later.
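Here is a rough Python model of that direct read, in case the byte layout is not obvious. It assumes the standard real-mode layout of an IVT entry (two little-endian 16-bit words, offset first, then segment). The `memory` buffer, the `read_vector` helper, and the handler address `0019:40EBh` are placeholders I made up.

```python
import struct

def read_vector(memory, interrupt):
    """Return (segment, offset) stored in the IVT entry for `interrupt`.

    Each entry is two little-endian 16-bit words: offset first, then segment.
    """
    offset, segment = struct.unpack_from("<HH", memory, interrupt * 4)
    return segment, offset

# Hypothetical memory image: only the first 1024 bytes (the IVT) are modeled.
memory = bytearray(1024)
# Pretend DOS put its INT 21h handler at 0019:40EBh (a made-up address).
struct.pack_into("<HH", memory, 0x84, 0x40EB, 0x0019)

# No INT 21h call involved: just read four bytes at 0000:0084h and keep them.
saved_segment, saved_offset = read_vector(memory, 0x21)
print(f"{saved_segment:04X}:{saved_offset:04X}")  # 0019:40EB
```

Note that the offset word comes first in memory, which is easy to get backwards when you are used to writing addresses as segment:offset.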
***
The next logical step is to write a routine for our TSR — this is the actual payload which will become the ISR for our target interrupt.
You can fill this in with whatever you’d like. I’ve used a graphical payload because *I went to art school* but you can print a string to the console or whatever else your heart desires.
For now, I’ll call this routine tsr_hook_int
Next, we have to set the entry in the interrupt vector table of our target interrupt (again, INT 21h) to point to our new TSR. This is effectively ensuring that we change from IVT[21h] == address of INT 21h ISR to IVT[21h] == address of TSR
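As a sanity check on what "IVT[21h] == address of TSR" means at the byte level, here is a small Python model of the overwrite. The helpers, the "original" address `0019:40EBh`, and the location `1234:0150h` for `tsr_hook_int` are all hypothetical values chosen for the sketch.

```python
import struct

def write_vector(memory, interrupt, segment, offset):
    """Store segment:offset in the IVT entry (offset word first, then segment)."""
    struct.pack_into("<HH", memory, interrupt * 4, offset, segment)

def read_vector(memory, interrupt):
    """Return (segment, offset) from the IVT entry for `interrupt`."""
    offset, segment = struct.unpack_from("<HH", memory, interrupt * 4)
    return segment, offset

memory = bytearray(1024)                     # models only the IVT
write_vector(memory, 0x21, 0x0019, 0x40EB)   # made-up "original" DOS handler

original = read_vector(memory, 0x21)         # save before overwriting
# Hypothetical location of tsr_hook_int: segment 1234h, offset 0150h.
write_vector(memory, 0x21, 0x1234, 0x0150)

print("saved   : %04X:%04X" % original)                    # saved   : 0019:40EB
print("IVT[21h]: %04X:%04X" % read_vector(memory, 0x21))   # IVT[21h]: 1234:0150
```

On real hardware a direct write like this is normally wrapped in CLI/STI so that an interrupt cannot fire while the entry is half-written; the model ignores that entirely.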
***
And then finally, we make our call to INT 21h, 31h to install the TSR:
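One detail worth spelling out: function 31h wants the amount of memory to keep resident in DX, measured in 16-byte paragraphs, not bytes. A minimal Python sketch of that rounding, assuming a COM program (so the 256-byte PSP in front of the code has to be counted too); `resident_paragraphs` is a name I invented.

```python
PARAGRAPH = 16     # bytes per paragraph
PSP_SIZE = 0x100   # a COM program is loaded right after its 256-byte PSP

def resident_paragraphs(resident_bytes, psp_size=PSP_SIZE):
    """Paragraphs to request in DX: PSP plus resident code, rounded up."""
    total = psp_size + resident_bytes
    return (total + PARAGRAPH - 1) // PARAGRAPH

print(resident_paragraphs(0x200))  # 48
print(resident_paragraphs(1))      # 17
```

Rounding up matters here: ask for too few paragraphs and DOS will happily hand the tail end of your handler to the next program.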
***
Let’s call it a day for implementing our 1337 TSR — by which I mean our 1337 bb TSR.
We've covered the fundamentals of writing a TSR and learned a couple tricks for making the TSR a bit more stealthy. I've gone ahead and finished up this demo TSR for us (this is my Ina Garten moment, where I am retrieving a second pear clafoutis from the fridge, that I made before the cameras even started rolling. Ina Garten -- stealth queen, TSR inspo, culinary icon.)
There’s a lot more to discuss in terms of vx stealth and persistence techniques for TSRs, but this will provide us a veritable amuse-bouche for the feast of 1337 h4x on the menu.
[How many more times do you think she’s going to use 1337 in this blog post? Sounds like a fun math problem for you to solve, dear reader. lmk if you need to borrow my calculator.]
***
Below is a complete working example of a demo TSR I wrote that hooks INT 21h and only triggers when a call is made to the EXEC function (INT 21h, AH = 4Bh), i.e. when a user launches a program from COMMAND.COM. If a call to INT 21h is made with any other sub-function, then the TSR just redirects back to the original saved INT 21h ISR.
Otherwise, the user is greeted with a screen in 256-colour VGA graphics mode with a modified colour palette, which results in a terminal aesthetic reminiscent of a Commodore 64.
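The filtering logic described above can be modeled in a few lines of Python. This is a sketch of the dispatch only, not the demo itself: it assumes the handler checks AH against 4Bh (the INT 21h function number for EXEC) and that it chains to the saved ISR in both cases. The `events` list and the `payload` stand-in are invented for illustration.

```python
EXEC = 0x4B  # INT 21h function number for EXEC (load and execute program)

events = []

def original_int21(regs):
    """Stand-in for the saved INT 21h ISR."""
    events.append(f"DOS handles AH={regs['ah']:02X}h")

def payload():
    """Stand-in for the graphical payload."""
    events.append("payload runs")

def tsr_hook_int(regs):
    """Run the payload only for EXEC calls, then always chain to DOS."""
    if regs["ah"] == EXEC:
        payload()
    original_int21(regs)

tsr_hook_int({"ah": 0x09})  # print string: passes straight through
tsr_hook_int({"ah": 0x4B})  # EXEC: payload first, then DOS
print(events)
# ['DOS handles AH=09h', 'payload runs', 'DOS handles AH=4Bh']
```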
This demo TSR doesn’t implement file infection routines or really anything fun. It provides a template for understanding how to modify control flow of critical system calls and how to use a TSR as a means of more persistent storage of a payload.
It is relatively harmless and it is certainly annoying. It is v on brand in terms of the programming paradigms of a large portion of simple TSR vx samples of the era.
***
Here is a demo video of that TSR in action.
Okay, so installing an interrupt in the IVT is great, but by now, dear reader, you may be asking yourself “that’s all well and good, but of course this type of technique wouldn’t be useful or applicable in subsequent OSes, e.g. Windows 9x, Windows NT, or hell, even modern Windows 11, right?”
Here is where the plot thickens.
You’re right in noticing that the IVT is an irrelevant data structure on 32-bit systems. As I noted in the intro, the IVT was the precursor to the IDT (Interrupt Descriptor Table, for those of you who feel so inclined as to ask for a refresher on the acronym). The IDT uses 8-byte blocks for each of its 256 entries. Like many things on subsequent 32-bit systems, I would argue that there is a lot of fluff added. However, the underlying structure and format remain relatively the same. And, more importantly, the use of the IDT in subsequent versions of Windows is also very similar.
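For comparison with the 4-byte IVT entry, here is a Python sketch of how one of those 8-byte IDT entries is laid out, assuming the 32-bit protected-mode gate descriptor format from the Intel manuals. The handler address `0x80123456` and selector `0x0008` are arbitrary example values, and the helper names are mine.

```python
import struct

def pack_idt_gate(offset, selector, type_attr):
    """Pack a 32-bit protected-mode IDT gate descriptor (8 bytes)."""
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,          # handler offset, bits 0-15
        selector,                 # code segment selector
        0,                        # reserved byte, always zero
        type_attr,                # present bit, DPL, gate type
        (offset >> 16) & 0xFFFF,  # handler offset, bits 16-31
    )

def unpack_idt_gate(raw):
    """Return (offset, selector, type_attr) from an 8-byte gate descriptor."""
    low, selector, _zero, type_attr, high = struct.unpack("<HHBBH", raw)
    return (high << 16) | low, selector, type_attr

# 0x8E = present, ring 0, 32-bit interrupt gate.
gate = pack_idt_gate(0x80123456, 0x0008, 0x8E)
print(len(gate), gate.hex())  # 8 56340800008e1280
```

Squint and it is still "a pointer to a handler per interrupt number"; the extra bytes carry the selector and the attribute bits, and the handler offset gets split in two.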
So would the IDT even be a reasonable candidate region of memory for storing a malicious payload?
Just ask Rovnix, a cute lil bootkit from 2011 that hides its payload in the upper region of the IDT (upper being defined as higher memory addresses, specifically in the block of memory that spans Interrupts 0x80 and above). Not only is this technique useful for the bootkit in terms of stealth and persistence, but leveraging the IDT allows it to persist during the processor switch from real mode to protected mode on Win32 during the boot process. What a baddie. What a legend.
[I won’t be digging into Rovnix in this blog post, but it is in the pipeline for future posts. In the meantime, if you’re interested in learning more about this bootkit, I refer you to the excellent analysis of it covered in "Rootkits and Bootkits: Reversing Modern Malware and Next Generation Threats”.]
In the same way that the IVT was a data structure leveraged for persistent storage of malicious code in DOS-era viruses, so too was the IDT leveraged in the later Windows threat landscape.
“Different names for the same thing” isn’t a very robust anti-virus strategy— it’s a Death Cab for Cutie track and if we’re using the discography of early mid-2000s emo ballads as an EDR solution, we might as well just call this revitalized old-school vx technique by a more apt descriptor: “I will follow you into the dark.”
"Advanced MS-DOS Programming,” Ray Duncan, Microsoft Press, 1986
“Microsoft MS-DOS Programmer’s Reference,” Microsoft Corporation, 2nd ed.: version 6.0., Microsoft Press, 1993
“A Look Back at Memory Models in 16-bit MS-DOS,” Raymond Chen, The Old New Thing, Microsoft Blogs, July 28, 2020
“On Memory Allocations Larger Than 64KB on 16-Bit Windows,” Raymond Chen, The Old New Thing, Microsoft Blogs
“Rootkits and Bootkits: Reversing Modern Malware and Next Generation Threats,” Alex Matrosov, Eugene Rodionov, and Sergey Bratus, No Starch Press, 2019
“The Giant Black Book of Computer Viruses,” Mark Ludwig, 2nd ed., American Eagle Books, 1998
Tequila virus, published on VX-Underground GitHub
I won’t be covering all the details of how to write an interrupt service routine here.
This is predicated on the fact that writing an ISR requires some background knowledge of OS internals (not an insurmountable quantity, but a non-negligible amount). If you’re interested in diving into this part of OS hacking more deeply, I’ve listed a few basic resources below to get you started.
There is a pretty solid series that was published relatively recently (with respect to publications on MS-DOS virus techniques) on the Interrupt Vector Table, and writing a TSR for MS-DOS. It is a series of articles by Dejan Lukan, published in 2013 on the Infosec Institute’s website.
It’s not the best — notably the author makes a big point of literally using the TSR routine from a book and not writing his own. This gives off some hardcore skiddie vibes. Apart from that faux pas, the content and overview is decent.
“MSDOS and the Interrupt Vector Table (IVT)”
Dejan Lukan
Infosec Institute
March 14, 2013
“Logging Keystrokes with MSDOS: Part 1”
Dejan Lukan
Infosec Institute
March 18, 2013
“Logging Keystrokes with MSDOS: Part 2”
Dejan Lukan
Infosec Institute
March 19, 2013
“Interrupt Service Routines”
OSDev Wiki
One of a few key takeaways from the above article:
An interrupt has to end with an iret opcode (per: “Interrupt Service Routines”, OSDev)
80386 Programmer’s Reference Manual - Chapter 17
80386 Programmer’s Reference Manual
Opcode CLI - Clear Interrupt Flag
80386 Programmer’s Reference Manual
Opcode STI - Set Interrupt Flag
[Not as relevant for ISRs, just a good opcode family to get to know]
80386 Programmer’s Reference Manual
Opcode SCAS/SCASB/SCASW/SCASD - Compare String Data