Search This Blog

19 April 2011

Traversing larger graphs with OQGRAPH 3. Time for Bacon?

Using the IMDB data, I have loaded the actor and actress database into a few simple InnoDB tables and have performed the following test:

MariaDB [imdb]> show status like "Innodb_rows_read";
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| Innodb_rows_read | 1237459 |
+------------------+---------+
1 row in set (0.00 sec)

MariaDB [imdb]> select x.* from entity as orig, entity as dest, imdb_graph, entity as x where dest.type='ACTOR' and dest.name='Kevin Bacon' and imdb_graph.latch=2 and imdb_graph.origid=orig.id and imdb_graph.destid=dest.id and x.id=imdb_graph.linkid and orig.type='ACTOR' and orig.name='James Earl Jones';
+---------+---------+------------------------------------------------------------------+
| id      | type    | name                                                             |
+---------+---------+------------------------------------------------------------------+
|  780890 | ACTOR   | James Earl Jones                                                 |
| 1922997 | ARCHIVE | "20 to 1" (2005) {Our All Time Favourite Films (#9.11)}  (voice) |
|  131965 | ACTOR   | Antonio Banderas                                                 |
| 1945521 | ARCHIVE | Oscar, que empiece el espect?culo (2008)                         |
| 1003551 | ACTOR   | Kevin Bacon                                                      |
+---------+---------+------------------------------------------------------------------+
5 rows in set (2.80 sec)

MariaDB [imdb]> show status like "Innodb_rows_read";
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| Innodb_rows_read | 2102482 |
+------------------+---------+
1 row in set (0.00 sec)

MariaDB [imdb]> select (2102482 - 1237459)/2.8;
+-------------------------+
| (2102482 - 1237459)/2.8 |
+-------------------------+
|             308936.7857 |
+-------------------------+
1 row in set (0.09 sec)


Over 300,000 edges examined per second... Nifty.

The entity table has approximately 3 million entries and the link table has approximately 30 million edges.

Note - this performance is with a warmed up buffer pool. With a cold buffer pool, the query takes nearly 30 seconds.

2 comments:

erkules said...

I don't get it. OQgraph used to be memory only? So what is this InnoDB-thing?

x said...

The "Innodb thing" is that it supports persistant graphs. The edge traversal data is still held in memory but in a more efficient way to support large graphs. In order to support persistant graphs, the edge data is stored in a donor table, which may be Innodb or any other storage engine which supports ordered index.