Like many applications, Google Chrome uses an SQLite database (or rather a number of SQLite databases) to store information relating to pages visited. One of these databases is the history DB which uses a set of normalised tables which, amongst other things, holds a table showing the date and time of every page visited, where appropriate the previous page (the page on which the user clicked a link/referring page) and an internal ID number of an entry in the URL table that contains the text of the page itself.
The purpose of this article is to show how we can generate a list of the pages that a user has followed while browsing the internet, or more correctly showing the chain of webpages that led to a specific page. To achieve this we will use a feature of SQL known as Recursive Common Table Expressions (RCTE), don’t worry it sounds worse than it is!
But first let’s look at a simple RCTE (from the SQLite web site) and see how it all works:
Code:
WITH RECURSIVE cte(x) AS (
SELECT 1
UNION ALL
SELECT cte.x + 1
FROM cte
LIMIT 10)
SELECT *
FROM cte
The essential feature of any RCTE is that there is a UNION between two tables, the first part of the UNION (SELECT 1) sets up the starting conditions of the query and the second half of the UNION ( SELECT cte.x + 1 FROM cte) refers back to the first half (this is the recursive element).
If that’s not clear, then hopefully things will become clearer as we work through this explanation.
But first, a bit more background on the tables we are interested in.
The first table we want to look at from the Chrome history is the visits table, this table contains a number of columns, we are just interested in the first four (the purpose of the remainder is outside of the scope of this article).
These columns are:
- Id – a unique identification number of this record (this is the tables primary key)
- URL – the id of the entry in the URLs table that contains the text of the URL visited
- visit_time – the time of the visit encoded as a Chrome date (we will decode this later)
- from_visit – the id in this table of the previous page.
The keen-eyed amongst you will have noted that number in the last entry in the from_visit column is sometimes the same as the id of a previous row, this is our link of pages visited and we can follow this list back until the from_visit entry is 0 which indicates the start of a viewing chain.
- The row with id = 27585 has a from_visit = 27584
- The row with id = 27584 has a from_visit = 27583
- The row with id = 27583 has a from_visit = 27582
- The row with id = 27582 has a from_visit = 0
Clearly, from an investigation standpoint, it would be useful to have a report of each of the pages in a chain displayed in order, showing the route through the internet history to a given page.
The second table is the URLs table. The URLs table contains a number of columns that would interest a forensic investigator, for this article we will concern ourselves with just the first two columns.
- id – again the primary key and a unique identifier for this row in this table
- url – the URL of the page visited.
The query that we use to create the results table above is:
Code:
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
For the sake of this article, we will follow the chain of web sites back from entry ID 27585 and as mentioned above we will use a Recursive Common Table Expression (RCTE). But before we do that we will digress a little and look a little at what a Common Table Expression (CTE) actually is.
A CTE is analogous, in simple terms, to a temporary table (or to a view), but unlike a temporary table or view, it exists just for the life of the query. As discussed above the query to select the rows we are interested in, is:
Code:
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
Expressed as a Common Table Expression this would be:
Code:
WITH cte AS (SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
)
SELECT *
FROM cte
The first five lines simply set up the Common Table and the final two lines are the query, basically put, cte becomes an alias for the select expression. Another way of thinking of the above expression is the Common Table Expression “cte” is an alias for the SELECT query and this alias can then be referred to in place of the query.
Recursion
As beautifully shown in the tongue in cheek dictionary definition of Recursion “Recursion – def: see Recursion”
a Recursive Common Table Expression is an expression that references itself. The basic format of a Recursive CTE (RCTE) is
Code:
WITH RECURSIVE (alias_name) AS (
Initial-select
UNION ALL
Recursive-select)
SELECT * from alias_name
Or from the SQLite web site:
We can now work with our RCTE skeleton above and replace a section at a time to build up our final query.
Start by giving the RCTE an alias-name, we’ll call this (Imaginatively) rcte:
Code:
WITH RECURSIVE rcte AS (
Initial-select
UNION ALL
Recursive-select)
SELECT * from rcte
Now we need to replace the initial-select portion of the query (the first half of the UNION), as discussed before, the basic query we will be using to create the table is reproduced here:
Code:
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
We need however to take this one step further and decide which record we want to start with, as mentioned above we want ID = 27585. So the initial-select becomes:
Code:
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
WHERE visits.id = 27585
We can copy and paste this directly into our ‘in-progress’ query which now becomes as follows:
Code:
WITH RECURSIVE rcte AS (
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
WHERE visits.id = 27585
UNION ALL
Recursive-select)
SELECT * from rcte
Now the second, recursive part of the UNION; again we use the starting query but with a slight modification, the query needs to be recursive, i.e. it needs to refer to itself (I have highlighted the changed element of the query):
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM rcte
LEFT JOIN urls ON visits.url = urls.id
This is not quite enough though, we also need to tell the query how to follow our chain of pointers, we can do this using an INNER JOIN (see my previous article on JOINS for more information) and we JOIN the two tables on the from_visit and the id fields.
Code:
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM rcte
LEFT JOIN urls ON visits.url = urls.id
INNER JOIN visits ON CTE.from_visit = visits.id
Now we can copy and paste this subquery to our in-progress query:
Code:
WITH RECURSIVE rcte AS (
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
WHERE visits.id = 27585
UNION ALL
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM rcte
LEFT JOIN urls ON visits.url = urls.id
INNER JOIN visits ON rcte.from_visit = visits.id
)
SELECT * from rcte
Although this query works (you can open an SQLite command window and try it – but make sure you use a valid visits.id) I want to make two very small changes.
The first is easy and simply orders the results by ID. The second limits the number of iterations through the recursive loop, while not essential if you don’t do this and create a query incorrectly, or the table you are querying loops back on itself then a recursive query can just hang. The final query is below with the two changes highlighted:
Code:
WITH RECURSIVE rcte AS (
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM visits
LEFT JOIN urls ON visits.url = urls.id
WHERE visits.id = 27585
UNION ALL
SELECT visits.id,
visits.url,
visits.visit_time,
visits.from_visit,
urls.url AS URL1
FROM rcte
LEFT JOIN urls ON visits.url = urls.id
INNER JOIN visits ON rcte.from_visit = visits.id
LIMIT 100
)
SELECT * from rcte
ORDER BY rcte.id
By way of explanation, the recursive-select part of the UNION references back to the initial-select part and the query will continue through each row until rcte.from_visit does not equal visits.id. INNER JOIN will only match rows from the visits table that are also in the cte table – see my previous article on joins for more information.
In English, it works like this:
- The initializing select runs and selects the one row from the visits table that has ID 27585. This row is inserted into our results table
- The recursing select uses this value and selects the rows (in our case a single row) which has a from_visit equal to the ID in step 1
- The row from step 2 becomes our new step 1 and the process repeats until step 2 fails to return a row, or we have been through this loop 100 times
In The Forensic Browser for SQLite this looks as follows:
We can, of course, display the Chrome date in a more meaningful format using the built-in options in the Forensic Browser for SQLite and the report produced can be exported to PDF, HTML, XLSX or csv.