Types of Transactional Operations
This is a series of blog posts to build a transaction manager in C. In this installment, we’re going to take a closer look at the different types of functions that exist and what we have to do to handle them in a transaction.
We’ve already implemented support for several interfaces from the C Standard Library in our transaction manager. We’re now going to sort and formalize this a bit.
If you missed earlier entries in the series, you might want to go back and read the installments so far.
Overview
Initially we only had support for memory operations in our transaction
manager. On top of that, we added memory and string helper functions, such
as memcpy()
. With the introduction of the transaction log, we have also
added support for malloc()
and free()
, which required either reversal
during roll back or delay to commit.
We can distiguish 4 different classes of transactional functions. A transactional function is either
- pure,
- deferable,
- revocable, or
- irrevocable.
Sorted roughly by their impact on the transaction system, these classes differ in the kind of support they require from the transaction manager and consequently the amount of work they require for an implementation.
Pure Functions
Pure functions don’t have any dependencies or side effects that interfere with the transaction.
One example on Linux systems is pthread_self()
,
which returns the calling thread’s id. The function doesn’t depend on any
parameters or mutable state. The returned id is constant for each thread.
In a transaction, we can call pthread_self()
without further considerations.
Another example is fmin()
, a function that returns the
smaller of two values of type double
. The parameters it receives are the
two values to compare. Again, no pointers or mutable state that this function
depends on. A call to fmin
doesn’t even generate errors. There’s a result
for all possible input values, even NaN.
A load()
operation on transactional memory is also pure. While its result
depends on the memory’s content, load()
doesn’t change non-transactional
mutable state.
Handling pure functions is easy. We can call them directly from within the transaction.
Deferable Functions
The next class after pure functions are deferable functions. A deferable function does not have an immediate effect on the transaction and can be defered until commit time.
A common deferable function that we’ve already seen is free()
.
Free releases a block of dynamically allocated memory. It accepts a pointer
to the memory block as its only argument. After free()
returns, the
content of the memory block is invalid and accessing it is an undefined
operation.
If we’d free the memory during the transaction’s execution, there’d be no way of rolling back the transaction later, as free operations cannot be reverted.
However, there’s no need to actually free the memory at this point. Instead
we defer the free operation until commit time. In the case of a roll back
of the transaction, we simply ignore the call to free()
.
Another example of a deferable function is the store()
operation we
implemented for transactional memory. Stores in write-back mode buffer
the written data and defer it until commit time.
The main criteria for a deferable function is that it has an effect on non-transactional mutable state, but the effect is either not immeditely visible, or can be simulated within the transaction.
Revocable Functions
In some way, a revocable function is the opposite of a deferable function. A revocable function is executed during the transaction, but it’s effects are revoked (or reverted) during the transaction’s roll back.
In our case, we’ve already seen malloc()
as an example of
a revokable function. A call to malloc()
returns a dynamically allocated
block of memory. This operation has to be executed immediately during the
transaction as the memory might be required during subsequent operations
within the transaction.
To support malloc()
, the transaction executes the operation immediately
and returns the allocated memory block. If it rolls back the transaction
later, the transaction manager frees the allocated block automatically.
Another example is a read()
operation on a file. Whatever
gets read might have an effect on the transaction, so the data returned by
the read operation is immediately required. But reading has no side-effects
on the data itself. It mostly changes the file offset at which the next read
operation takes place. The change to the offset can be revoked by restoring
the old offset during a roll back. So reading from a file is a revocable
operation.
As a third example, again store()
can act like a revocable function. When
executed in write-through mode, store()
immediately writes the data to
memory. It keeps a copy of the old data, which is restored during a roll
back.
Revocable functions have an effect on the transaction’s execution and therefore have to be performed immediately. The effects on non-transactional mutable state can be revoked during a roll back.
Irrevocable Functions
Finally we have irrevocable functions. The result of an irrevocable function is immeditely required, but there’s no way of revoking it’s effects on non-transactional mutable state.
An example of an irrevocable function is a read()
operation on a Unix pipe,
which is often used to transfer data from one program to another program.
Similar to a file, invoking read()
on a pipe returns the next block of
data. Unlike a file, a pipe doesn’t provide any means of reverting the effect
of the read operation. Once it has read data from the pipe, the transaction
has no way of putting back the data into a pipe during a roll back. The
effect of calling read()
on a pipe is therefore not revocable. The function
is thus irrevocable.
The only way to support irrevocable functions is to guarantee that a transaction is not rolled back after it executed an irrevocable function. This has a number of consequences for the transactional system as a whole.
-
A transaction has to explicitly request irrevocability from the transaction manager. This requires support by the transaction manager.
-
Once it a transaction is irrevocable, it wins any conflict with concurrent transactions.
-
At any time, there can only be one irrevocable transaction. Imagine there were two irrevocable transactions at the same time. If there’s a conflict between these two transactions, it would not be possible to pick a loser without violating irrevocability.
The easiest way of supporting irrevocability is to run the irrevocable transaction exclusively. When a transaction requests irrevocability, all other transactions are stopped, rolled back, and suspended until the irrevocable transaction committed. So there are either multiple revocable transactions running, or at most one irrevocable transaction. This is the same semantics as for reader/writer lock and that’s how it’s implemented.
A more elaborated implementation would use a lock manager. This component of the transaction manager observes the state of resource locks, and resolves conflicts between transactions. When it has to resolve a conflict between the irrevocable transaction and a revocable transaction, it would always pick the irrevocable transaction as winner.
To summarize, irrevocable functions have an effect on the transaction’s execution and therefore have to be performed immeditely. The effects on non-transactional mutable state cannot be revoked during a roll back, so the transaction may not perform a roll back at all.
Special Cases
Aside from the 4 basic types, there are a number of special cases.
First of all, there are functions that are both deferable and recovable. One
such function is realloc()
, which changes the size of an
already allocated block of memory. A call to realloc()
has to be performed
immediately as the allocated memory has to be returned into the transaction.
This memory has to be released during a roll back. The memory block of the
old size also has to be released, but only at commit time. An implementing
of a transactional realloc()
is a combination of malloc()
and free()
.
Another special case happens when functions change their type depending
on some internal state. We’ve already seen store()
and read()
. These
functions’ type depends on parameters and internal state.
Finally, there are operations that simple don’t make sense in the context
of a transaction. These usually concern program state as a whole. Calling
functions like exit()
, exec()
, or
abort()
will end the current program in some way. Calling
them within a transaction would probably lead to an inconsistent state of
the system.
Summary
In this blog post, we’ve discussed the different types of transactional functions.
- Pure functions have no effect on non-transactional state.
- Deferable functions are defered until commit time. Their effects on non-transactional state are simulated to the transaction.
- Revocable functions are executed immediately during the transaction, but their effects on non-tranactional state are reverted during a roll back.
- Irrevocable functions are executed immediately, but their effects on non-transactional state cannot be reverted. A transaction may not have to roll back after executing an irrevocable function.
- Functions can change their type, depending on the parameters or the transaction’s state.
- Some functions have no meaningful semantics in the context of transactions.
If you like this series about writing a transaction manager in C, please subscribe to the RSS feed, follow on Twitter or share on social networks.
Post by: Thomas Zimmermann