Sunday, August 03, 2008

Is framework-level SQL query caching dangerous?

I was in a bookshop a few months ago and picked up a book about Ruby on Rails, and though I sadly didn't buy it (having already bought more books than I wanted to carry) and I've forgotten it's name, there was an interesting gem in there that stuck in my head.

Ruby on Rails' main method of performing SQL queries (ActiveRecord) since 2.0, by default, caches the results of SELECT queries (though it does string-level matching of queries, so they need to be completely identical, rather than functionaly identical) untill a relevant update query updates the table.

I haven't had a chance to delve into the source to see how granularly the cache is updated (i.e. if a row in the cache was not updated in an update, is the cache still invalidated sinc ethe table was updated?), but in any case, it still seems dangerous.

Caching data that close to the application means that unless you know about the caching and either disable it or ensure that anything else that possibly updates the data also uses the same cache, you may end up with an inconsistent cache.

Assuming that flushing the cache is fairly granular operation (or there is very little activity on the table or users are stored as separate tables, or something similar), it seems feasible that if there is any other technology interacting with the same database, it would be possible to ensure that dodgy data is cached in the Rails app.

A possible example would be a banking site where an SQL query like
SELECT * FROM accounts WHERE acct_id = ?
Is executed when you view the account details or attempt to transfer money, then if there was another interface to withdraw money (lets say an ATM even; doesn't have to be web-based) that bypassed Rails' SQL cache it is theoretically possible that if we would be able to withdraw the money, but the Rails app would still think that we had money left.

At this point of course things would become very implementation specific, though there are two main ways I could see this going (maybe there are more):

1. We could be lucky and the application itself subtracts the amount we're trying to transfer and then puts that value directly into an SQL query, so it's as if we never withdrew any money.

2. We could be unlucky and an SQL query of this variety is used:
UPDATE accounts SET balance = balance - ? WHERE acct_id = ?
In which case the only thing we gain is the ability to overdraw from an account where we would not be able to usually; not particularly useful unless you've stolen someone else's account already and want to get the most about it, but it does still allow us to bypass a security check.

Does anyone have any thoughts on this? Is this too unlikely? Is no-one ever going to use Rails for anything sensitive? Should I go and read the source next time before posting crap? etc
Or knowledge of the Rails caching mechanism? I'll probably take a look soon, but given I've been meaning to do this for months...


Ambush Commander said...

I agree that framework level SQL query caching is dangerous, but only because I was personally bitten by it by another framework (CakePHP). The danger lies in two things, I think:

1. External access to the database, as you mention. This doesn't have to be another application; it can be introspective SQL that doesn't map from one input to one output (random functions, time functions , etc).

2. Reliance on the cache being "perfect". If a query cache is implemented by default, it is probably meant to be completely transparent, and when the cache handles something it shouldn't handle, you get a mysterious bug unless you know about the query cache.

When the data isn't that important, query caches can be a useful way of quickly improving the performance of SQL-inefficient code. The correct way to fix this, however, is transactional support and caching implemented on a database mapper level.

Anonymous said...

Nice thoughts Kuzz55. Definitely research worthy.

Overall, I never liked caching at any level including not not limited to browsers. But you're right a race condition seems plausible right there. It's who owns the most connections will win the cache. Said that, even a small cache TTL will not save them if you own the most connections racing against someone else. Caching queries or requests is bound to fail.