SQLite is an embedded SQL database engine purportedly in use by a formidable list of high-profile users and deployed in countless applications. In this first article exploring SQLite's internal workings, we look at SQLite's Virtual Database Engine (VDBE), a virtual machine which executes code generated from SQL commands to store and retrieve data.
First, a word on SQLite's architecture. The figure below shows three major components: Core, SQL Compiler, and Backend. When an SQL command arrives through the interface, it is passed to the SQL Compiler, which turns SQL statements into VDBE opcode programs.
Once this VDBE opcode program has been generated, it is passed to the Virtual Machine which is responsible for evaluating the VDBE opcode program and interacting with the backend. The backend contains our actual data and interacts directly with the host operating system.
Interested in quality programming articles? Sign up to our weekly email:
Your email (along with over 4000 other subscribers) will be kept safe.
The VDBE is described in the SQLite architecture documentation as:
[A] virtual computer that runs a program in its virtual machine language. The goal of each program is to interrogate or change the database. Toward this end, the machine language that the VDBE implements is specifically designed to search, read, and modify databases. SQLite VDBE Documentation
The VDBE is thus a virtual machine running programs written in a specific machine language. This machine language is made up of specialized instructions for storing and retrieving data.
Note that the machine language we're referring to here is not composed of SQL statements. Instead, SQL statements are used to generate opcodes (the code generation box in our architecture diagram), and these opcodes are what the VDBE executes. For a list of opcodes, see the full documentation.
Now, on to the implementation. To show the attributes of the VDBE structure in detail, the code listing below is an extract from vdbeInt.h where the Vdbe struct is defined.
As we can see, a VDBE instance holds pointers to the program it is running (*aOp), memory locations (and the number allocated), an array of program results, as well as a pointer to the program counter and current instruction return value. This fits well the notion that the VDBE is a virtual machine instance — it's the classic Harvard architecture we expect.
So, what does a generated VDBE program look like? Here's our simple example schema, a table called names with Int and String columns.
Now let's look at the VDBE opcodes generated to execute this statement.
As shown, after the Trace instruction (which is used for tracing program execution for debugging/profiling purposes), the first instruction is to jump to line 10 of our program. Here, a transaction is opened, and VerifyCookie is executed. VerifyCookie checks whether the database schema version is the same as argument P2. If not, the schema has been changed by another thread and must be re-read. After this, a TableLock is obtained for the appropriate table. We then jump to instruction 2 of our program, where a write cursor is opened with OpenWrite. A new record is constructed from our inputted data, and the record is inserted with the Insert instruction. After closing the write cursor, Halt is executed which ends the transaction and exits our program.
Having seen a simple insert statement in action, let's look at a more complex VDBE program — adding an index.
As we can see, this is a far longer program than the one generated by an INSERT statement. The sqlite_master table holds a record for every named index, and updating this table is dealt with in lines 2-12. It's similar to the INSERT program above, so we won't cover it in detail.
Instructions 13 and 14 (OpenWrite and SorterOpen) open a writeable index for the table. Instruction 15 opens the table to be indexed for reading, and instructions 16-21 inclusive loop over every row in the table, adding each index record to the sorter. The sorter allows us to sort records with Merge Sort.
In the following instructions, the SorterSort results in the data in the sorter structure being sorted. The index is then created from this sorted data with SorterData and IdxInsert. Various cursors are then closed, and our program exits.
For more detail, I'd really recommend digging through the implementation in vdbesort.c, which should shed more light on the associated sorting and indexing structures.
In summary, we've had a detailed look at how SQLite's virtual database engine executes programs generated from SQL statements. For a look at even more opcode programs, take a look at the (outdated, but still useful) VDBE tutorial. Don't forget to check the up to date opcode reference afterwards, though.
There are two more articles to be published in this series: next, we'll take a closer look at how SQL statements are tokenized, parsed, and assembled into the opcode instructions we saw above. After that we'll explore SQLite's btree implementation and how data is persisted to disk.
Lastly, I should point out that I'm certainly no expert when it comes to SQLite's implementation, so if you notice any inaccuracies, it would be great to hear them. Contact details in the footer.
Thanks to Chris Porter for reviewing a draft of this article.
Read this far? Sign up for the weekly programming email!
Your email (along with over 4000 other subscribers) will be kept safe.
Created by @mattrcottingham
This site uses cookies. If you continue to use the site, we'll assume this is ok. For details, see our cookie policy.