Decoding the ARM7TDMI Instruction Set (Game Boy Advance)

Learn how to decode the entire ARM7TDMI instruction set to develop your own Game Boy Advance emulator.

Written by Gregory Gaines
Gameboy advance teardown
Photo by Patrick Gallagher on Artstation

My favorite console is the Game Boy Advance. I remember spending hours playing Pokemon in the dead of night then quickly hiding the console under my pillow when my mom barged into my room. In short, the console holds a special place in my heart. After discovering emulation, I knew I had to make a Game Boy Advance emulator someday. Unfortunately, writing an emulator is extremely difficult 😢.

I got stuck on decoding the ARM7TDMI instructions. The CPU is arguably one of the most crucial components of an emulator; if it fails, the emulator fails. My first attempts were ugly switch statements within nested if-else statements.

I endlessly googled for tutorials, but none seemed clear outside of copying decoding charts from other people's code. After some trial and error, I figured out a simple pattern, and today, I'll try to explain how to easily decode the ARM7TDMI instruction set without creating a confusing string bit table.

Prerequisites

  • Some emulator experience
  • Basic knowledge of the ARM7TDMI CPU
  • Optional - ARM7TDMI reference manual: A simple Google search should yield multiple results, but you didn't hear that from me.

Quick Bitwise AND (&) Operator Lesson

Let me quickly explain how to use the bitwise AND (&) operator, since you'll be using it a lot. The operator compares its two inputs bit by bit and sets each bit of the result to 1 only if the corresponding bits of both inputs are 1. Otherwise, it clears the bit to 0.

01101
10111
-----
00101

11011
00111
-----
00011
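
If you want to try the two examples above yourself, here is a minimal Go sketch. The helper name and5 is my own and only exists for this illustration.

```go
package main

import "fmt"

// and5 is a hypothetical helper: it ANDs two values and formats
// the result as a 5-digit binary string.
func and5(a, b uint8) string {
	return fmt.Sprintf("%05b", a&b)
}

func main() {
	fmt.Println(and5(0b01101, 0b10111)) // 00101
	fmt.Println(and5(0b11011, 0b00111)) // 00011
}
```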

The operator is also handy for determining if certain bits are set or cleared. We do this by first creating a mask, which is an integer with the bits we want checked set to 1. Bitwise AND the mask against some input and you get an integer containing only the extracted bits.

Suppose we have the format 1101_1010 and want to check if an integer contains the format. Define a mask of 1111_1111 for extracting the bits we care about, then bitwise AND the mask against the input and compare the result to the format. If they are equal, we found a match!

Format: 1101_1010
Mask:   1111_1111
----

Input: 250

  1111_1010
& 1111_1111
-----------
  1111_1010

Output: Doesn't match

Input: 218

  1101_1010
& 1111_1111
-----------
  1101_1010

Output: Matches
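
The same check can be sketched as a tiny Go function. The name matchesFormat is an assumption for this example, and the last call uses a made-up partial mask to show how ignored bits drop out of the comparison.

```go
package main

import "fmt"

// matchesFormat reports whether the bits of input selected by mask
// equal format. The name is illustrative only.
func matchesFormat(input, mask, format uint8) bool {
	return input&mask == format
}

func main() {
	const format = 0b1101_1010
	const mask = 0b1111_1111

	fmt.Println(matchesFormat(250, mask, format)) // false
	fmt.Println(matchesFormat(218, mask, format)) // true

	// Hypothetical partial mask: only the upper 4 bits are checked,
	// so the lower 4 bits of the input can be anything.
	fmt.Println(matchesFormat(0b1101_0101, 0b1111_0000, 0b1101_0000)) // true
}
```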

Decoding the Instruction Format

Each ARM7TDMI instruction has a distinct format within its 32 bits (numbered left to right from 31 down to 0). The upper 4 bits (31-28) aren't important for decoding since they hold the condition field, which tells the CPU which flags must be set in order to execute the instruction.
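
As a rough sketch, this is how the condition field could be split off from the rest of the opcode in Go. Both helper names are my own assumptions, not part of any library.

```go
package main

import "fmt"

// conditionField returns bits 31-28 of an ARM opcode (hypothetical helper).
func conditionField(opcode uint32) uint32 {
	return opcode >> 28
}

// withoutCondition clears bits 31-28 so only the bits relevant
// for decoding remain (hypothetical helper).
func withoutCondition(opcode uint32) uint32 {
	return opcode & 0x0FFF_FFFF
}

func main() {
	const opcode = 0xE12F_FF15

	fmt.Printf("%04b\n", conditionField(opcode))     // 1110
	fmt.Printf("0x%08X\n", withoutCondition(opcode)) // 0x012FFF15
}
```

The decoding functions in this article never need to call something like withoutCondition, because every format mask already has bits 31-28 set to 0.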

To identify an instruction, first identify which bits need to be set or cleared. This is the instruction format. Next, define a mask to extract the bits we want to check. Then bitwise AND the mask against the opcode to check if the opcode matches the instruction format.

To clarify, take the Branch and Branch Exchange instruction. Its instruction format is 0000_0001_0010_1111_1111_1111_0001_0000, which is unique: no other instruction shares it.

Next, define a mask of 0000_1111_1111_1111_1111_1111_1111_0000 to extract the bits we care about. Note that the mask ignores the Cond and Rn fields. Fields like Cond, Rn, Rd, and Offset vary from opcode to opcode and aren't needed to identify the instruction, so zero them out in both the format and the mask. Finally, bitwise AND the mask against the opcode and compare the result against the instruction format. If they are equal, the opcode is the instruction!

Branch and Branch Exchange format = [0000_0001_0010_1111_1111_1111_0001_0000]
Format mask                       = [0000_1111_1111_1111_1111_1111_1111_0000]
----

Opcode = 0xE12F_FF15 or [1110_0001_0010_1111_1111_1111_0001_0101]

  1110_0001_0010_1111_1111_1111_0001_0101 - Opcode
& 0000_1111_1111_1111_1111_1111_1111_0000 - Format mask
-----------------------------------------
  0000_0001_0010_1111_1111_1111_0001_0000 == Branch and Branch Exchange format

Result: Opcode matches instruction format

Opcode = 0xE92F_FFD5 or [1110_1001_0010_1111_1111_1111_1101_0101]

  1110_1001_0010_1111_1111_1111_1101_0101 - Opcode
& 0000_1111_1111_1111_1111_1111_1111_0000 - Format mask
-----------------------------------------
  0000_1001_0010_1111_1111_1111_1101_0000 != Branch and Branch Exchange format

Result: Opcode doesn't match instruction

Here is the above example in code:

Go
func IsBranchAndBranchExchange(opcode uint32) bool {
	// Define the unique format to identify the instruction
	const branchAndExchangeFormat = 0b0000_0001_0010_1111_1111_1111_0001_0000

	// Mask to extract the bits of the above format
	const formatMask = 0b0000_1111_1111_1111_1111_1111_1111_0000

	// Extract the format from the opcode
	var extractedFormat = opcode & formatMask

	// Check if the extracted format matches the instruction format
	return extractedFormat == branchAndExchangeFormat
}
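
To sanity check the function, you could run it against the two opcodes from the worked example. This is only a small test harness sketch; the function body is a condensed copy so the snippet compiles on its own.

```go
package main

import "fmt"

// Condensed copy of the check above so this snippet is self-contained.
func IsBranchAndBranchExchange(opcode uint32) bool {
	const branchAndExchangeFormat = 0b0000_0001_0010_1111_1111_1111_0001_0000
	const formatMask = 0b0000_1111_1111_1111_1111_1111_1111_0000
	return opcode&formatMask == branchAndExchangeFormat
}

func main() {
	// The two opcodes from the worked example.
	for _, opcode := range []uint32{0xE12F_FF15, 0xE92F_FFD5} {
		fmt.Printf("%#x -> %t\n", opcode, IsBranchAndBranchExchange(opcode))
	}
	// 0xe12fff15 -> true
	// 0xe92fffd5 -> false
}
```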

ARM Instructions

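Note that the decoding order matters: several formats overlap, so the more specific formats have to be checked before the more general ones.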

Go
func DecodeARMInstruction(opcode uint32) Instruction {
	// Decode opcode in order
	switch {
	// 1. Branch and Branch Exchange
	case IsBranchAndBranchExchange(opcode):
		return BranchAndBranchExchange

	// 2. Block data transfer
	case IsBlockDataTransfer(opcode):
		return BlockDataTransfer

	// ... remaining checks follow, in the order of the sections below
	}

	return UnimplementedARMInstruction
}
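
To see why the order matters, here is a small sketch where a single opcode satisfies two checks at once. The two decode functions and their string return values are made up for this demonstration; only the masks and formats come from the article.

```go
package main

import "fmt"

func isBranchAndBranchExchange(opcode uint32) bool {
	return opcode&0b0000_1111_1111_1111_1111_1111_1111_0000 ==
		0b0000_0001_0010_1111_1111_1111_0001_0000
}

// Data Processing only pins bits 27-26 to 00, so it is very permissive.
func isDataProcessing(opcode uint32) bool {
	return opcode&0b0000_1100_0000_0000_0000_0000_0000_0000 == 0
}

// decodeSpecificFirst checks the narrow format before the broad one.
func decodeSpecificFirst(opcode uint32) string {
	switch {
	case isBranchAndBranchExchange(opcode):
		return "BranchAndBranchExchange"
	case isDataProcessing(opcode):
		return "DataProcessing"
	}
	return "Unimplemented"
}

// decodeBroadFirst shows the mistake: the broad check shadows the narrow one.
func decodeBroadFirst(opcode uint32) string {
	switch {
	case isDataProcessing(opcode):
		return "DataProcessing"
	case isBranchAndBranchExchange(opcode):
		return "BranchAndBranchExchange"
	}
	return "Unimplemented"
}

func main() {
	const opcode = 0xE12F_FF15
	fmt.Println(decodeSpecificFirst(opcode)) // BranchAndBranchExchange
	fmt.Println(decodeBroadFirst(opcode))    // DataProcessing
}
```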

1. Branch and Branch Exchange

  • The unique format is bits 27-4, 0001_0010_1111_1111_1111_0001, written here as the full 32-bit pattern 0000_0001_0010_1111_1111_1111_0001_0000.
Go
func IsBranchAndBranchExchange(opcode uint32) bool {
	const branchAndExchangeFormat = 0b0000_0001_0010_1111_1111_1111_0001_0000

	const formatMask = 0b0000_1111_1111_1111_1111_1111_1111_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == branchAndExchangeFormat
}

2. Block Data Transfer

  • The unique format is bits 27 - 25 100.
Go
func IsBlockDataTransfer(opcode uint32) bool {
	const blockDataTransferFormat = 0b0000_1000_0000_0000_0000_0000_0000_0000

	const formatMask = 0b0000_1110_0000_0000_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == blockDataTransferFormat
}

3. Branch and Branch with Link

  • The unique format is bits 27 - 25 101.
  • The Link version of the instruction is the same as above but with bit 24 set.
Go
func IsBranchAndBranchWithLink(opcode uint32) bool {
	const branchFormat = 0b0000_1010_0000_0000_0000_0000_0000_0000
	const branchWithLinkFormat = 0b0000_1011_0000_0000_0000_0000_0000_0000

	const formatMask = 0b0000_1111_0000_0000_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == branchFormat || extractedFormat == branchWithLinkFormat
}

4. Software Interrupt

  • The unique format is bits 27 - 24 1111.
Go
func IsSoftwareInterrupt(opcode uint32) bool {
	const softwareInterruptFormat = 0b0000_1111_0000_0000_0000_0000_0000_0000

	const formatMask = 0b0000_1111_0000_0000_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == softwareInterruptFormat
}

5. Undefined

  • The unique format is bits 27 - 25 011, which is shared by the Single Data Transfer instruction and disambiguated by bit 4 being set to 1.
Go
func IsUndefined(opcode uint32) bool {
	const undefinedFormat = 0b0000_0110_0000_0000_0000_0000_0001_0000

	const formatMask = 0b0000_1110_0000_0000_0000_0000_0001_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == undefinedFormat
}

6. Single Data Transfer

  • The unique format is bits 27 - 26 01.
  • The format is shared with the Undefined instruction which shouldn't be used.
Go
func IsSingleDataTransfer(opcode uint32) bool {
	const singleDataTransferFormat = 0b0000_0100_0000_0000_0000_0000_0000_0000

	const formatMask = 0b0000_1100_0000_0000_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == singleDataTransferFormat
}

7. Single Data Swap

  • The unique format is bits 27 - 23 00010 and bits 11-4 0000_1001.
  • Matching this instruction early clears ambiguity with remaining instructions with a similar format (Multiply, Halfword Data Transfer).
Go
func IsSingleDataSwap(opcode uint32) bool {
	const singleDataSwapFormat = 0b0000_0001_0000_0000_0000_0000_1001_0000

	const formatMask = 0b0000_1111_1000_0000_0000_1111_1111_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == singleDataSwapFormat
}

8. Multiply and Multiply Long

  • Multiply's unique format is bits 27 - 23 00000 and bits 7-4 1001.

  • Multiply Long's unique format is bits 27 - 23 00001 and bits 7-4 1001.
Go
func IsMultiply(opcode uint32) bool {
	const multiplyFormat = 0b0000_0000_0000_0000_0000_0000_1001_0000
	const multiplyLongFormat = 0b0000_0000_1000_0000_0000_0000_1001_0000

	const formatMask = 0b0000_1111_1000_0000_0000_0000_1111_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == multiplyFormat || extractedFormat == multiplyLongFormat
}

9. Halfword Data Transfer Register / Immediate

  • Halfword data transfer register's unique format is bits 27-25 000, bit 22 0, and bits 11-4 0000_1SH1.
  • SH can be 01, 10, or 11 but never 00 as that would make it a Single Data Swap instruction.

  • Halfword data transfer immediate's unique format is bits 27-25 000, bit 22 1, and bits 7-4 1SH1.
  • SH can be 01, 10, or 11 but never 00 as that would make it a Single Data Swap instruction.
Go
func IsHalfwordDataTransferRegister(opcode uint32) bool {
	const halfwordDataTransferRegisterFormat = 0b0000_0000_0000_0000_0000_0000_1001_0000

	const formatMask = 0b0000_1110_0100_0000_0000_1111_1001_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == halfwordDataTransferRegisterFormat
}

func IsHalfwordDataTransferImmediate(opcode uint32) bool {
	const halfwordDataTransferImmediateFormat = 0b0000_0000_0100_0000_0000_0000_1001_0000

	const formatMask = 0b0000_1110_0100_0000_0000_0000_1001_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == halfwordDataTransferImmediateFormat
}
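
Notice both masks skip the S and H bits (bits 6-5). If you want to look at them yourself, a sketch like the one below should do. The helper name shBits is my own, and the two opcodes are hand-assembled examples that I believe encode LDRH r0, [r1] and SWP r0, r0, [r1], so treat them as assumptions.

```go
package main

import "fmt"

// shBits returns the S and H bits (bits 6-5) of an ARM opcode.
// Hypothetical helper, not part of the decoder above.
func shBits(opcode uint32) uint32 {
	return (opcode >> 5) & 0b11
}

func main() {
	// Assumed to be LDRH r0, [r1]: SH is 01, a valid halfword transfer.
	fmt.Printf("%02b\n", shBits(0xE1D1_00B0)) // 01

	// Assumed to be SWP r0, r0, [r1]: SH is 00, which is why
	// Single Data Swap has to be matched before the halfword checks.
	fmt.Printf("%02b\n", shBits(0xE101_0090)) // 00
}
```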

10. PSR Transfer

  • MRS's unique format is bits 27-23 00010 and bits 21-16 001111.

  • MSR's (transfer register contents) unique format is bits 27-26 00, 24-23 10, and bits 21-12 101001_1111.

  • MSR's (transfer register contents or immediate value to PSR flag) unique format is bits 27-26 00, 24-23 10, and bits 21-20 10, and bits 15-12 1111.

  • Based on the documented MSR format, you would expect to check bits 19-16 as well. However, according to the GBATEK PSR docs, those bits can vary rather than being fixed as the format assumes, so let's ignore them just to be safe.

Go
func IsPSRTransferMRS(opcode uint32) bool {
	const mrsFormat = 0b0000_0001_0000_1111_0000_0000_0000_0000

	const formatMask = 0b0000_1111_1011_1111_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == mrsFormat
}

func IsPSRTransferMSR(opcode uint32) bool {
	const msrFormat = 0b0000_0001_0010_0000_1111_0000_0000_0000

	const formatMask = 0b0000_1101_1011_0000_1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == msrFormat
}

11. Data Processing

  • The unique format is bits 27-26 00.
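  • With a register operand, bits 7-4 of Operand 2 can't be 1001, since that pattern belongs to instructions matched earlier (Multiply, Single Data Swap).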
  • Saved for last because it overlaps with almost half of the instructions.
Go
func IsDataProcessing(opcode uint32) bool {
	const dataProcessingFormat = 0b0000_0000_0000_0000_0000_0000_0000_0000

	const formatMask = 0b0000_1100_0000_0000_0000_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == dataProcessingFormat
}
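
If you'd rather not lean entirely on ordering, one possible variation is to reject the Operand 2 pattern explicitly. This is my own sketch, not something the decoder above needs: it assumes the restriction only applies when bit 25 (the immediate flag) is clear, and it treats bits 7 and 4 both being set as the tell-tale sign. The PSR Transfer formats still overlap, so those would still have to be matched first.

```go
package main

import "fmt"

// overlapsMultiplyFamily is a hypothetical helper: it reports whether
// bit 25 is clear while bits 7 and 4 are both set.
func overlapsMultiplyFamily(opcode uint32) bool {
	const mask = 0b0000_0010_0000_0000_0000_0000_1001_0000
	const format = 0b0000_0000_0000_0000_0000_0000_1001_0000
	return opcode&mask == format
}

// isDataProcessingStrict is a stricter, assumed variant of IsDataProcessing.
func isDataProcessingStrict(opcode uint32) bool {
	const formatMask = 0b0000_1100_0000_0000_0000_0000_0000_0000
	return opcode&formatMask == 0 && !overlapsMultiplyFamily(opcode)
}

func main() {
	fmt.Println(isDataProcessingStrict(0xE3A0_0001)) // true  (assumed: MOV r0, #1)
	fmt.Println(isDataProcessingStrict(0xE081_0002)) // true  (assumed: ADD r0, r1, r2)
	fmt.Println(isDataProcessingStrict(0xE001_0092)) // false (assumed: MUL r1, r2, r0)
	fmt.Println(isDataProcessingStrict(0xE3A0_0090)) // true  (immediate form, bit 25 set)
}
```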

ARM Decoding Pseudocode

Go
func DecodeARMInstruction(opcode uint32) Instruction {
	switch {
	case IsBranchAndBranchExchange(opcode):
		return BranchAndBranchExchange

	case IsBlockDataTransfer(opcode):
		return BlockDataTransfer

	case IsBranchAndBranchWithLink(opcode):
		return BranchAndBranchWithLink

	case IsSoftwareInterrupt(opcode):
		return SoftwareInterrupt

	case IsUndefined(opcode):
		return Undefined

	case IsSingleDataTransfer(opcode):
		return SingleDataTransfer

	case IsSingleDataSwap(opcode):
		return SingleDataSwap

	case IsMultiply(opcode):
		return Multiply

	case IsHalfwordDataTransferRegister(opcode):
		return HalfwordDataTransferRegister

	case IsHalfwordDataTransferImmediate(opcode):
		return HalfwordDataTransferImmediate

	case IsPSRTransferMRS(opcode):
		return PSRTransferMRS

	case IsPSRTransferMSR(opcode):
		return PSRTransferMSR

	case IsDataProcessing(opcode):
		return DataProcessing
	}

	return UnimplementedARMInstruction
}
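
Since every check has the same shape, an alternative is to keep the masks and formats in an ordered table and loop over it. The sketch below is one way that could look. The pattern type, the armPatterns table, and the string names are my own assumptions; the masks and formats are copied from the functions above, with Branch collapsed to its bits 27-25 form. The test opcodes are hand-assembled, so treat the mnemonics in the comments as assumptions too.

```go
package main

import "fmt"

// pattern pairs a mask and format with a label (hypothetical type).
type pattern struct {
	name   string
	mask   uint32
	format uint32
}

// armPatterns lists the checks in the same order as the switch above.
var armPatterns = []pattern{
	{"BranchAndBranchExchange", 0b0000_1111_1111_1111_1111_1111_1111_0000, 0b0000_0001_0010_1111_1111_1111_0001_0000},
	{"BlockDataTransfer", 0b0000_1110_0000_0000_0000_0000_0000_0000, 0b0000_1000_0000_0000_0000_0000_0000_0000},
	{"BranchAndBranchWithLink", 0b0000_1110_0000_0000_0000_0000_0000_0000, 0b0000_1010_0000_0000_0000_0000_0000_0000},
	{"SoftwareInterrupt", 0b0000_1111_0000_0000_0000_0000_0000_0000, 0b0000_1111_0000_0000_0000_0000_0000_0000},
	{"Undefined", 0b0000_1110_0000_0000_0000_0000_0001_0000, 0b0000_0110_0000_0000_0000_0000_0001_0000},
	{"SingleDataTransfer", 0b0000_1100_0000_0000_0000_0000_0000_0000, 0b0000_0100_0000_0000_0000_0000_0000_0000},
	{"SingleDataSwap", 0b0000_1111_1000_0000_0000_1111_1111_0000, 0b0000_0001_0000_0000_0000_0000_1001_0000},
	{"Multiply", 0b0000_1111_1000_0000_0000_0000_1111_0000, 0b0000_0000_0000_0000_0000_0000_1001_0000},
	{"Multiply", 0b0000_1111_1000_0000_0000_0000_1111_0000, 0b0000_0000_1000_0000_0000_0000_1001_0000},
	{"HalfwordDataTransferRegister", 0b0000_1110_0100_0000_0000_1111_1001_0000, 0b0000_0000_0000_0000_0000_0000_1001_0000},
	{"HalfwordDataTransferImmediate", 0b0000_1110_0100_0000_0000_0000_1001_0000, 0b0000_0000_0100_0000_0000_0000_1001_0000},
	{"PSRTransferMRS", 0b0000_1111_1011_1111_0000_0000_0000_0000, 0b0000_0001_0000_1111_0000_0000_0000_0000},
	{"PSRTransferMSR", 0b0000_1101_1011_0000_1111_0000_0000_0000, 0b0000_0001_0010_0000_1111_0000_0000_0000},
	{"DataProcessing", 0b0000_1100_0000_0000_0000_0000_0000_0000, 0},
}

// decodeARM returns the name of the first pattern the opcode matches.
func decodeARM(opcode uint32) string {
	for _, p := range armPatterns {
		if opcode&p.mask == p.format {
			return p.name
		}
	}
	return "UnimplementedARMInstruction"
}

func main() {
	fmt.Println(decodeARM(0xE12F_FF15)) // BranchAndBranchExchange
	fmt.Println(decodeARM(0xEF00_0000)) // SoftwareInterrupt (assumed: SWI 0)
	fmt.Println(decodeARM(0xE001_0092)) // Multiply (assumed: MUL r1, r2, r0)
	fmt.Println(decodeARM(0xE1D1_00B0)) // HalfwordDataTransferImmediate (assumed: LDRH r0, [r1])
	fmt.Println(decodeARM(0xE3A0_0001)) // DataProcessing (assumed: MOV r0, #1)
}
```

The table keeps the ordering visible in one place, at the cost of a linear scan per opcode. Whether that trade is worth it over the switch is a judgment call.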

THUMB Instructions

As with the ARM instructions, note that the decoding order matters because several THUMB formats overlap.

1. Software Interrupt

  • The unique format is bits 15-8 1101_1111.
Go
func IsTHUMBSoftwareInterrupt(opcode uint16) bool {
	const softwareInterruptFormat = 0b1101_1111_0000_0000

	const formatMask = 0b1111_1111_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == softwareInterruptFormat
}

2. Unconditional Branch

  • The unique format is bits 15-11 11100.
Go
func IsUnconditionalBranch(opcode uint16) bool {
	const unconditionalBranchFormat = 0b1110_0000_0000_0000

	const formatMask = 0b1111_1000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == unconditionalBranchFormat
}

3. Conditional Branch

  • The unique format is bits 15-12 1101.
Go
func IsConditionalBranch(opcode uint16) bool {
	const conditionalBranchFormat = 0b1101_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == conditionalBranchFormat
}
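
Here is a quick sketch of the overlap that makes Software Interrupt come before Conditional Branch. The lowercase function names are condensed copies for this demo only, and the opcodes are hand-assembled examples I believe encode SWI 5 and BEQ.

```go
package main

import "fmt"

func isTHUMBSoftwareInterrupt(opcode uint16) bool {
	return opcode&0b1111_1111_0000_0000 == 0b1101_1111_0000_0000
}

func isConditionalBranch(opcode uint16) bool {
	return opcode&0b1111_0000_0000_0000 == 0b1101_0000_0000_0000
}

func main() {
	// Assumed to be SWI 5: it satisfies both checks, so the
	// Software Interrupt check has to run first.
	fmt.Println(isTHUMBSoftwareInterrupt(0xDF05), isConditionalBranch(0xDF05)) // true true

	// Assumed to be a BEQ: only the Conditional Branch check matches.
	fmt.Println(isTHUMBSoftwareInterrupt(0xD005), isConditionalBranch(0xD005)) // false true
}
```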

4. Multiple Load / Store

  • The unique format is bits 15-12 1100.
Go
func IsMultipleLoadstore(opcode uint16) bool {
	const multipleLoadStoreFormat = 0b1100_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == multipleLoadStoreFormat
}

5. Long Branch with Link

  • The unique format is bits 15-12 1111.
Go
func IsLongBranchWithLink(opcode uint16) bool {
	const longBranchWithLinkFormat = 0b1111_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == longBranchWithLinkFormat
}

6. Add Offset to Stack Pointer

  • The unique format is bits 15-8 1011_0000.
Go
func IsAddOffsetToStackPointer(opcode uint16) bool {
	const addOffsetToStackPointerFormat = 0b1011_0000_0000_0000

	const formatMask = 0b1111_1111_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == addOffsetToStackPointerFormat
}

7. Push / Pop Registers

  • The unique format is bits 15-12 1011 and bits 10-9 10.
Go
func IsPushPopRegisters(opcode uint16) bool {
	const pushPopRegistersFormat = 0b1011_0100_0000_0000

	const formatMask = 0b1111_0110_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == pushPopRegistersFormat
}

8. Load / Store Halfword

  • The unique format is bits 15-12 1000.
Go
func IsLoadStoreHalfword(opcode uint16) bool {
	const loadStoreHalfwordFormat = 0b1000_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == loadStoreHalfwordFormat
}

9. SP Relative Load / Store

  • The unique format is bits 15-12 1001.
Go
func IsSPRelativeLoadStore(opcode uint16) bool {
	const spRelativeLoadStoreFormat = 0b1001_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == spRelativeLoadStoreFormat
}

10. Load Address

  • The unique format is bits 15-12 1010.
Go
func IsLoadAddress(opcode uint16) bool {
	const loadAddressFormat = 0b1010_0000_0000_0000

	const formatMask = 0b1111_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == loadAddressFormat
}

11. Load / Store with Immediate Offset

  • The unique format is bits 15-13 011.
Go
func IsLoadStoreWithImmediateOffset(opcode uint16) bool {
	const loadStoreImmediateOffsetFormat = 0b0110_0000_0000_0000

	const formatMask = 0b1110_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == loadStoreImmediateOffsetFormat
}

12. Load / Store with Register Offset

  • The unique format is bits 15-12 0101 and bit 9 0.
Go
func IsLoadStoreWithRegisterOffset(opcode uint16) bool {
	const loadStoreRegisterOffsetFormat = 0b0101_0000_0000_0000

	const formatMask = 0b1111_0010_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == loadStoreRegisterOffsetFormat
}

13. Load / Store Sign-Extended Byte / Halfword

  • The unique format is bits 15-12 0101 and bit 9 1.
Go
func IsLoadStoreSignExtendedByteHalfword(opcode uint16) bool {
	const loadStoreSignExtendedByteHalfwordFormat = 0b0101_0010_0000_0000

	const formatMask = 0b1111_0010_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == loadStoreSignExtendedByteHalfwordFormat
}

14. PC Relative Load

  • The unique format is bits 15-11 01001.
Go
func IsPCRelativeLoad(opcode uint16) bool {
	const pcRelativeLoadFormat = 0b0100_1000_0000_0000

	const formatMask = 0b1111_1000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == pcRelativeLoadFormat
}

15. Hi Register Operations / Branch Exchange

  • The unique format is bits 15-10 010001.
Go
func IsHiRegisterOperationsBranchExchange(opcode uint16) bool {
	const hiRegisterOperationsBranchExchangeFormat = 0b0100_0100_0000_0000

	const formatMask = 0b1111_1100_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == hiRegisterOperationsBranchExchangeFormat
}

16. ALU Operations

  • The unique format is bits 15-10 010000.
Go
func IsALUOperations(opcode uint16) bool {
	const aluOperationsFormat = 0b0100_0000_0000_0000

	const formatMask = 0b1111_1100_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == aluOperationsFormat
}

17. Move / Compare / Add / Subtract Immediate

  • The unique format is bits 15-13 001.
Go
func IsMoveCompareAddSubtractImmediate(opcode uint16) bool {
	const moveCompareAddSubtractImmediateFormat = 0b0010_0000_0000_0000

	const formatMask = 0b1110_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == moveCompareAddSubtractImmediateFormat
}

18. Add / Subtract

  • The unique format is bits 15-11 00011.
Go
func IsAddSubtract(opcode uint16) bool {
	const addSubtractFormat = 0b0001_1000_0000_0000

	const formatMask = 0b1111_1000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == addSubtractFormat
}

19. Move Shifted Register

  • The unique format is bits 15-13 000.
Go
func IsMoveShiftedRegister(opcode uint16) bool {
	const moveShiftedRegistersFormat = 0b0000_0000_0000_0000

	const formatMask = 0b1110_0000_0000_0000

	var extractedFormat = opcode & formatMask

	return extractedFormat == moveShiftedRegistersFormat
}

THUMB Decoding Pseudocode

Go
func DecodeTHUMBInstruction(opcode uint16) Instruction {
	switch {
	case IsTHUMBSoftwareInterrupt(opcode):
		return THUMBSoftwareInterrupt

	case IsUnconditionalBranch(opcode):
		return UnconditionalBranch

	case IsConditionalBranch(opcode):
		return ConditionalBranch

	case IsMultipleLoadstore(opcode):
		return MultipleLoadstore

	case IsLongBranchWithLink(opcode):
		return LongBranchWithLink

	case IsAddOffsetToStackPointer(opcode):
		return AddOffsetToStackPointer

	case IsPushPopRegisters(opcode):
		return PushPopRegisters

	case IsLoadStoreHalfword(opcode):
		return LoadStoreHalfword

	case IsSPRelativeLoadStore(opcode):
		return SPRelativeLoadStore

	case IsLoadAddress(opcode):
		return LoadAddress

	case IsLoadStoreWithImmediateOffset(opcode):
		return LoadStoreWithImmediateOffset

	case IsLoadStoreWithRegisterOffset(opcode):
		return LoadStoreWithRegisterOffset

	case IsLoadStoreSignExtendedByteHalfword(opcode):
		return LoadStoreSignExtendedByteHalfword

	case IsPCRelativeLoad(opcode):
		return PCRelativeLoad

	case IsHiRegisterOperationsBranchExchange(opcode):
		return HiRegisterOperationsBranchExchange

	case IsALUOperations(opcode):
		return ALUOperations

	case IsMoveCompareAddSubtractImmediate(opcode):
		return MoveCompareAddSubtractImmediate

	case IsAddSubtract(opcode):
		return AddSubtract

	case IsMoveShiftedRegister(opcode):
		return MoveShiftedRegister
	}

	return UnimplementedTHUMBInstruction
}

Conclusion

This is how to decode the entire ARM7TDMI instruction set. I hope you develop some amazing emulators. If you have one, send me a link using my contacts on my about page. Maybe I'll stop slacking and finish my GBA emulator someday...

Consider signing up for my newsletter or supporting me if you enjoyed the article.

Thanks for reading!

About the author.

I'm Gregory Gaines, a software engineer that loves blogging, studying computer science, and reverse engineering.

I'm currently employed at Google; all opinions are my own.
