                     "How does it work" file for U7WIN9X
                               August 2000
                             Gilbert Rouqui
                        <e-mail address to be filled> 
                          <dragon name to be filled>

                               Version 1.10


For the curious and technically minded : How U7WIN9X Works
----------------------------------------------------------

-- What Flat Real Mode is --
----------------------------
In this discussion 386 stands for the Intel 386 Architecture, whose CPU
representatives are the 80386, 80486 and Pentium (including MMX, Pro, II,
Xeon and III) and compatible CPUs such as the AMD K5, K6, Athlon ...

VOODOO, as named by Origin Inc, and as present in Ultima 7 Part I and Part II
is the use of 32 bit wide memory addresses to read and write from and to any 
memory location of the 386 Addressing Space (4 GB), and this from a DOS
program, which inherently uses 16 bit wide addresses.

32 bit wide addresses can always be used from a 16 bit native program, a DOS
program such as Ultima 7, because the 386 allows an instruction to contain
the address size override byte 0x67 (ADR32), which tells the processor to
compute the memory address of the current instruction in 32 bit mode.
(Strictly speaking, this is a toggle. When used while the machine is in
32 bit mode, it instructs the processor to address in 16 bit mode instead.)
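
To make the toggle concrete, here is a minimal sketch in Python (not code
from U7WIN9X) that scans the prefix bytes of one instruction and reports
the address and operand size it ends up with. The function name and the
return layout are made up for this illustration. 0x66 is the operand size
override, called USE32 further down in this file.

```python
# Legacy x86 prefix bytes that may precede the opcode.
OPERAND_SIZE = 0x66   # operand size override (USE32 in this file)
ADDRESS_SIZE = 0x67   # address size override (ADR32 in this file)
# Segment overrides, LOCK, REPNE and REP: skipped, they do not change sizes.
OTHER_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0xF0, 0xF2, 0xF3}


def effective_sizes(code: bytes, default_bits: int = 16):
    """Return (address_bits, operand_bits, opcode_offset) for one instruction.

    default_bits is the mode the code segment runs in: 16 for a DOS
    program, 32 for a VXD.
    """
    addr_bits = op_bits = default_bits
    other = 48 - default_bits          # 16 becomes 32, 32 becomes 16
    i = 0
    while i < len(code):
        b = code[i]
        if b == ADDRESS_SIZE:
            addr_bits = other
        elif b == OPERAND_SIZE:
            op_bits = other
        elif b not in OTHER_PREFIXES:
            break                      # first non-prefix byte is the opcode
        i += 1
    return addr_bits, op_bits, i


# 67 8A 03 is "mov al, [ebx]" when found in 16 bit code.
sizes_in_dos = effective_sizes(bytes([0x67, 0x8A, 0x03]))
# The same bytes in 32 bit code address with 16 bit registers instead.
sizes_in_vxd = effective_sizes(bytes([0x67, 0x8A, 0x03]), default_bits=32)
```

Here sizes_in_dos is (32, 16, 1) and sizes_in_vxd is (16, 32, 1), which
is the toggle described above.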

BUT even an access to a 32 bit address within the current DS segment must
pass the DS segment Limit check, which in DOS real mode and Virtual 86 mode
is 64k. So simply said, you may be free to use 32 bit addresses but this will
not help you in any way because any address at or above 64k (more than
16 bits) will trap with a General Protection Fault. 

This is where Flat Real Mode, also named Unreal Mode, comes in. It looks like
a bug in the 386, but so many programs know about it and use (I would say
abuse) it, that Intel cannot fix it. Here it is. 

The DS or ES segment limit of 64k is stored into the segment descriptor cache
at power-on time, when the CPU starts in Real Mode. But if a program switches
from Real Mode to Protected Mode, creates a segment descriptor with a limit
larger than 64k, for example 4G, sets that descriptor into, say, DS and ES,
and quickly reverts to Real Mode, then the segment limit in the descriptor
cache, set by the setting of DS and ES in protected mode, is NOT REVERTED
back to 64k when the processor returns to Real Mode. Since no Real Mode
instruction ever modifies the segment limit in the descriptor cache, it
remains set to the value it received from the descriptor in Protected Mode.
Thus, in the example with DS and ES set to a 4G limit, the CPU allows
free access with DS and ES to any location in the range of 4G.
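
The behaviour of the descriptor cache can be pictured with a toy model.
This is only a sketch in Python, assuming a cache that holds just a base
and a limit (the real one holds more). The class and method names are
invented for the illustration.

```python
class GeneralProtectionFault(Exception):
    """Raised by the model when an offset is beyond the segment limit."""


class SegmentRegister:
    """Toy model of one segment register and its hidden descriptor cache."""

    def __init__(self):
        self.base = 0
        self.limit = 0xFFFF            # 64k limit stored at power-on

    def load_real_mode(self, selector: int) -> None:
        # A Real Mode load recomputes the base only. The limit is untouched.
        self.base = selector << 4

    def load_protected_mode(self, base: int, limit: int) -> None:
        # A Protected Mode load copies base and limit from the descriptor.
        self.base = base
        self.limit = limit

    def linear(self, offset: int, size: int = 1) -> int:
        """Check the limit, then return the linear address of the access."""
        if offset + size - 1 > self.limit:
            raise GeneralProtectionFault(hex(offset))
        return self.base + offset


ds = SegmentRegister()
ds.load_real_mode(0x0000)
try:
    ds.linear(0x000A0000)
    faulted = False
except GeneralProtectionFault:
    faulted = True                     # plain Real Mode: beyond the 64k limit

ds.load_protected_mode(base=0, limit=0xFFFFFFFF)   # short trip to Protected Mode
ds.load_real_mode(0x0000)                          # back in Real Mode
vga = ds.linear(0x000A0000)                        # now passes the limit check
```

The first access faults, the second one returns 0x000A0000, and the only
difference between the two is the limit left behind by the Protected Mode
load.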

Intel could not plug this hole in Real Mode. But Virtual 86 mode has no such
hole. The 64k limit of Virtual 86 addresses is strictly enforced. Since 
Virtual 86 mode is what your DOS programs are using whenever they run under 
Windows, Flat Real Mode is out, and so is VOODOO and so is Ultima 7.

-- How U7WIN9X overcomes this --
--------------------------------
So how does this work in U7WIN9X ? Basically U7VXD.VXD, a 32 bit protected
mode piece of code, normally sits waiting. When Ultima 7 attempts a Flat
Real instruction, the game is interrupted, and the VXD gains control. It
obtains all the registers from the game at the point of interrupt, builds
a conformant image of the forbidden Flat Real instruction and executes
it on behalf of the game.

Since the VXD is 32 bit protected mode, it has proper access to the whole
memory range of the game. So the memory access attempted and failed as Flat 
Real Mode will succeed when rerun by the VXD. When done, the VXD stores 
the registers and the condition flags back and resumes the game. 
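
The round trip of the registers can be sketched as follows. This is a
Python model and not the VXD code: the register dictionary, the function
name and the choice of "mov al, [ebx]" as the replayed instruction are all
assumptions made for the example. Advancing IP past the instruction is
also an assumption about how the game is resumed in this simplified
picture.

```python
def replay_load_byte(guest: dict, flat_memory: bytearray, length: int) -> None:
    """Replay 'mov al, [ebx]' for the game, using a segment with Base=0."""
    address = guest["ebx"]                  # registers taken from the game
    value = flat_memory[address]            # the access that failed in the game
    # Only AL changes. MOV leaves the condition flags alone.
    guest["eax"] = (guest["eax"] & 0xFFFFFF00) | value
    # Resume the game after the instruction that was replayed.
    guest["ip"] = (guest["ip"] + length) & 0xFFFF


memory = bytearray(2 * 1024 * 1024)         # stand-in for the game memory
memory[0x000A0000] = 0x5A                   # one byte in the VGA buffer
game = {"eax": 0x12345678, "ebx": 0x000A0000, "ip": 0x0100}
replay_load_byte(game, memory, length=3)    # 67 8A 03 is 3 bytes long
```

After the call, game["eax"] is 0x1234565A and game["ip"] is 0x0103.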

VOODOO in Ultima 7 is simple to handle because when used, all Flat Real Mode 
instructions have DS and ES set to 0000. This fits nicely with the DS and
ES within a VXD, because they are Base=0, Limit=4G.
On the other hand, performance is crucial because VOODOO is used not only
to access XMS memory allocated above 1 M (called VOODOO memory by Ultima 7)
but also far memory between the end of the EXE and 1M, and this includes
the 64 k of the Video VGA buffer at 0x000A0000. 
To give a feeling, while the game starts, it displays a shifting mosaic
pattern, red in Part I, blue in Part II, for a few seconds. During
this time, VOODOO executes about 5 million Flat Real Mode instructions !!!

This is however a bit too simplified. This is how I hoped initially to make it
work. In reality, the VXD does not catch the General Protection Fault from 
the game. Instead it catches BreakPoint Faults from BreakPoint Interrupt
instructions set to replace all Flat Real Mode instructions in the game.

So the VXD has the additional task, when an EXE of the game has been loaded
by U7RUN, to patch it in memory at all locations where Flat Real Mode 
instructions are. It stores in their place a byte 0xCC (Breakpoint interrupt)
followed by a coding byte which describes the kind of Real Mode instruction
at that location. 

Then the VXD hooks on the Breakpoint Fault on behalf of the game. When the
game runs over the byte 0xCC, it is interrupted and the VXD takes control.
It then uses the coding byte and if needed the remainder of the instruction
bytes to regenerate the Flat Real Mode instruction and execute it. 

The patching strategy works because every Flat Real mode instruction is
2 bytes or more, so that the VXD has the space to store the 0xCC and
the coding byte. Every Flat Real Mode instruction is 2 bytes or more
because it needs at least one byte for the opcode and its arguments
AND one byte for the ADR32=0x67 override.
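
The patch itself is a two byte overwrite. A minimal sketch in Python,
working on a bytearray that stands for the EXE image in memory. The
function name and the code byte value 0x01 are invented here. The real
table of locations and code bytes lives inside U7VXD and is not shown in
this file.

```python
BREAKPOINT = 0xCC   # one byte Breakpoint interrupt instruction


def patch_site(image: bytearray, offset: int, code_byte: int) -> bytes:
    """Overwrite the first two bytes of a Flat Real instruction.

    Returns the two original bytes. The remainder of the instruction is
    left in place after the code byte.
    """
    if not 1 <= code_byte <= 0xFF:
        raise ValueError("the code byte must fit in a byte and not be 0x00")
    original = bytes(image[offset:offset + 2])
    if len(original) < 2:
        raise ValueError("no room for 0xCC and the code byte")
    image[offset] = BREAKPOINT
    image[offset + 1] = code_byte
    return original


# NOP, then 67 8A 03 (mov al, [ebx]), then NOP.
image = bytearray([0x90, 0x67, 0x8A, 0x03, 0x90])
saved = patch_site(image, offset=1, code_byte=0x01)
```

The image is now 90 CC 01 03 90: the 0x67 override and the opcode are
gone, and the argument byte 03 is still there for the VXD to reuse.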

The reason I did not directly hook the General Protection Fault that would 
occur if the game were left running the original Flat Real instructions is
that the VXD would then have a harder time determining whether the GPF
was a genuine VOODOO event from the game or not, and decoding and
regenerating the Flat Real Mode instruction.

-- In Action --
---------------
Well this is the overall principle of the heart of U7VXD.VXD. Now for
the specifics :
    U7RUN.COM determines that this is Ultima 7 Part II when it finds SI.EXE
              in its own directory. It assumes Ultima 7 Part I otherwise.
    U7RUN.COM loads U7VXD.VXD : U7RUN uses VXDLDR to dynamically load U7VXD.
              Just before returning to DOS, U7RUN similarly uses VXDLDR to 
              unload U7VXD. VXDLDR.VXD is a statically loaded, thus always
              available, Windows provided VXD with DeviceID=0027 whose aim is
              to service requests to dynamically load or unload VXDs.
    U7RUN.COM schedules MAINMENU.EXE, INTRO.EXE, SI.EXE/U7.EXE and ENDGAME.EXE
              according to the return code each one supplies to determine which 
              one should follow. This is how ULTIMA7.COM/SERPENT.COM work.
    U7RUN.COM hooks software interrupt 0x21 (DOS calls). When a DOS call comes
              in and this is an Open (0x3D) and the file to be opened is
              EMMXXXX0, then it rejects it with code 2 and carry set. This
              is Ultima 7 inquiring about EMS. Should Ultima 7 find EMS
              active, it would then stop with a message complaining about
              protected mode, please remove the offending program.
              All other DOS commands are passed through.
    U7RUN.COM hooks software interrupt 0x2f (Win calls). This is done to
              catch XMS calls. When an XMS Allocate command (0x09) comes in,
              U7RUN reflects it to U7VXD which uses _PageAllocate to acquire
              memory within the 32 bit wide address space of the DOS Virtual
              Machine, on behalf of the game. It _PageFree(s) it when the
              corresponding XMS Deallocate command (0x0A) comes in. 
              Ultima 7 also uses an XMS Lock request (0x0C) to fix the XMS
              memory block in memory and obtain its 32 bit address.
              When it receives it, U7RUN issues to U7VXD the request to patch
              the current program along with the identity of the EXE to be
              patched. When completed, U7RUN returns to the game the location
              of the _PageAllocate(d) memory.
              U7RUN handles XMS Unlock (0x0D) as a no-op. 
              U7RUN always answers XMS Query Available Memory (0x08) with
              1 MByte of available XMS memory. This value seems to satisfy
              Ultima 7. Less memory would make it slower, more memory would
              induce it to privately create and manage a Virtual Disk. This
              is useless since Windows handles its own Disk Cache, better
              than Ultima 7 can do. 
              All other Win calls are passed through.
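
The two hooks of U7RUN described above can be summed up by a small model.
This is a Python sketch, not the code of U7RUN.COM: the names, the
handle numbering and the linear address 0x00200000 are invented, and the
real memory comes from _PageAllocate inside U7VXD. The 1 MByte answer is
expressed as 1024, assuming the XMS convention of sizes in KBytes.

```python
def filter_dos_call(ah: int, filename=None):
    """Return (carry, ax) to reject the call, or None to pass it through."""
    if ah == 0x3D and filename is not None and filename.upper() == "EMMXXXX0":
        return (True, 2)              # carry set, error code 2
    return None


class XmsShim:
    """Toy stand-in for the XMS handling split between U7RUN and U7VXD."""

    REPORTED_FREE_KB = 1024           # always 1 MByte

    def __init__(self):
        self.blocks = {}              # handle -> (made-up address, size in KB)
        self.next_handle = 1
        self.next_address = 0x00200000
        self.patch_requests = 0

    def call(self, function: int, arg: int = 0):
        if function == 0x08:          # Query Available Memory
            return self.REPORTED_FREE_KB
        if function == 0x09:          # Allocate, arg is the size in KB
            handle = self.next_handle
            self.next_handle += 1
            self.blocks[handle] = (self.next_address, arg)
            self.next_address += arg * 1024
            return handle
        if function == 0x0A:          # Deallocate, arg is the handle
            del self.blocks[arg]
            return 1
        if function == 0x0C:          # Lock, arg is the handle
            self.patch_requests += 1  # this is when the EXE gets patched
            return self.blocks[arg][0]
        if function == 0x0D:          # Unlock is a no-op
            return 1
        return None                   # everything else is passed through


xms = XmsShim()
handle = xms.call(0x09, 512)          # the game allocates 512 KB
address = xms.call(0x0C, handle)      # and locks it to get its address
```

In this run handle is 1 and address is 0x00200000, and one patch request
has been counted.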

    U7VXD.VXD when loaded, registers the identity of the Virtual Machine that
              started it. It will then react only to requests coming from the
              same virtual machine. It also hooks the Breakpoint interrupt
              from the Virtual 86 mode. When unloaded, it unhooks the Virtual
              86 Breakpoint interrupt.
    U7VXD.VXD handles Memory allocate and free requests as the first two cases
              of Breakpoint interrupt 0xCC, followed by coding byte 0x00.
    U7VXD.VXD handles EXE patch requests as the last case of Breakpoint 
              interrupt 0xCC, followed by code 0x00. U7VXD has a table with
              the location of all Flat Real instructions for the seven EXEs
              in Ultima 7 that use VOODOO : 
                    MAINMENU.EXE    (Part I and Part II)
                    ENDGAME.EXE     (Part I and Part II)
                    INTRO.EXE       (Part II only, Part I does not use VOODOO)
                    U7.EXE (Part I) and SI.EXE (Part II)
              It patches the EXE in memory and returns. Each patched 
              instruction is 0xCC, followed by a code byte that cannot be 
              0x00, followed by the original remainder of the Flat Real Mode
              instruction.

    U7VXD.VXD receives control on a BreakPoint interrupt. This is the true
              heart of U7WIN9X. 
              Based on the coding byte and the remainder of the instruction
              it rebuilds the original instruction locally. Well, not exactly.
              Since the VXD is 32 bit mode whereas the game is 16 bit mode,
              the 0x67 ADR32 byte is useless so it is not copied or generated
              and similarly the 0x66 USE32 byte is toggled, that is, it is
              removed if it was present, added if it was not present. 
              The Virtual Machine data registers EAX, EBX, ECX, EDX, ESI, EDI
              and the user part of the condition flags are loaded. Fortunately
              VOODOO does not use either (E)SP or (E)BP in any of its Flat 
              Real instructions. Since also VOODOO sets the ES and DS segment
              registers to 0000 on all Flat Real instructions, U7VXD leaves
              ES and DS to their VXD default.
              The rebuilt instruction is executed, then the registers and
              the flags (for the comparison instructions) are stored back. 
              U7VXD has a fast path for the family of string instructions, 
              because VOODOO uses them very frequently (LODS, MOVS, STOS 
              and the like).
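
The handling of the 0x67 and 0x66 bytes described above can be sketched
like this. It is a Python illustration only and it makes one
simplification: it starts from the original instruction bytes, whereas
U7VXD rebuilds them from the coding byte and the remainder left in the
patched EXE. The function name is invented.

```python
# Prefix bytes that may come before the opcode.
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67, 0xF0, 0xF2, 0xF3}


def rebuild_for_32bit(original: bytes) -> bytes:
    """Re-encode a 16 bit mode Flat Real instruction for a 32 bit segment.

    The 0x67 ADR32 byte is dropped, because 32 bit addressing is already
    the default in the VXD. The 0x66 USE32 byte is toggled, so that the
    operand size stays what it was in the game.
    """
    kept = bytearray()
    had_operand_override = False
    i = 0
    while i < len(original) and original[i] in PREFIXES:
        b = original[i]
        if b == 0x67:
            pass                          # useless in 32 bit mode
        elif b == 0x66:
            had_operand_override = True   # present: remove it
        else:
            kept.append(b)                # REP and the others are kept
        i += 1
    if not had_operand_override:
        kept.append(0x66)                 # absent: add it
    return bytes(kept) + original[i:]


load_byte = rebuild_for_32bit(bytes([0x67, 0x8A, 0x03]))         # mov al,[ebx]
load_dword = rebuild_for_32bit(bytes([0x67, 0x66, 0x8B, 0x03]))  # mov eax,[ebx]
copy_bytes = rebuild_for_32bit(bytes([0xF3, 0x67, 0xA4]))        # rep movsb
```

The results are 66 8A 03, 8B 03 and F3 66 A4. Each one, run in 32 bit
mode, does what the original did in the game.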
