Craig Barratt
2013-06-16 20:27:46 UTC
Over the last 2 months I've made some significant progress on 4.0
development.
Unfortunately (for BackupPC development at least), I'm starting a new job
in a week, but I'll try to get an alpha release out before then. Hopefully
I'll still find the time to finish testing and bug fixing to produce an
official release at some point.
I'm quite excited about the new architecture, features and performance.
While the pool and storage layouts have completely changed, 4.0 remains
backward compatible with existing installations (although more testing is
needed there).
I'll be traveling for a few days this week, but hopefully I'll get some
time to explain some of the new features in more detail. I've attached a
very short summary below.
There are a couple of areas where it would be helpful to get people's
suggestions too.
Craig
- No use of hardlinks (except temporarily to do atomic renames).
  Reference counting is handled at the application level.
- Backups are stored as "reverse deltas" - the most recent backup is
  always filled and older backups are reconstituted by merging all the
  deltas starting with the nearest future filled backup and working
  backwards.
  This is the opposite of V3 where incrementals are stored as "forward
  deltas" to a prior backup (typically the last full backup or prior
  lower-level incremental backup, or the last full in the case of rsync).
- Since the most recent backup is filled, viewing/restoring that backup
  (which is the most common backup used) doesn't require merging any
  deltas from other backups.
- The concepts of incr/full backups and unfilled/filled storage are
  decoupled.
- Uses full-file MD5 digests, which are stored in the directory attrib
  files. Each backup directory only contains an attrib file.
- The pool layout still supports chains to handle MD5 collisions. While
  collisions can be constructed and are now well-known, they are highly
  unlikely in the wild.
  Pool files are never renamed or moved, unlike V3.
- Any backup can be deleted (deltas are merged into the next older backup
  if it is not filled).
- The reverse deltas allow "infinite incrementals" - no need for a full
  backup if you are willing to accept, in exchange for speed, the risk
  that a file change will not be detected when neither its mtime nor its
  size changes.
- An rsync "full" backup now uses --checksum (instead of --ignore-times),
  which is much more efficient on the server side - the server just needs
  to check the full-file checksum computed by the client, together with
  the mtime, nlinks and size attributes, to see if the file has changed.
  If you want a more conservative approach, you can change it back to
  --ignore-times, which requires the server to send block checksums to
  the client.
- The use of rsync --checksum allows BackupPC to guess a potential match
  anywhere in the pool, even on a first-time backup. In that case, the
  usual rsync block checksums are still exchanged to make sure the
  complete file is identical.
- Uses rsync-3.0.9 on the server side (in place of File::RsyncP), with a
  C code layer to interface to the BackupPC storage. So the whole data
  path for rsync is now in compiled C code, which is much faster than Perl.
- Due to the use of rsync-3.X, ACLs and xattrs are supported, along with
  many (but not all) other useful options. Rsync protocol 30 supports
  the efficient incremental file list, which significantly improves memory
  usage and startup time. It also supports MD5 full-file checksums, which
  match BackupPC's new digest. That allows a full-file digest to be
  checked as easily as an mtime on the server side.
- Significant portions of the BackupPC code are now compiled C code in a
  new module called BackupPC::XS that is dynamically linked to Perl.
- FTP Xfer method isn't supported yet.
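
To make the "reverse deltas" item in the summary above more concrete,
here is a rough Python sketch of the idea. It is only an illustration
under simplifying assumptions, not the actual 4.0 code or on-disk format:
a backup is modelled as an in-memory map from path to digest, and the
names Backup, add_backup and reconstruct are made up for this example.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model (not BackupPC's real storage format).
# A filled backup maps every path to its digest. An unfilled backup is a
# reverse delta: it holds only the entries that differ from the next newer
# backup, and None marks a path that did not exist in this older backup.
Tree = Dict[str, Optional[str]]


class Backup:
    def __init__(self, num: int, filled: bool, entries: Tree):
        self.num = num
        self.filled = filled
        self.entries = entries


def add_backup(history: List[Backup], new_tree: Tree) -> None:
    """Append a new filled backup; demote the previous one to a reverse delta."""
    if history:
        prev = history[-1]            # in this model the newest is always filled
        old_tree = prev.entries
        delta: Tree = {}
        for path, digest in old_tree.items():
            if new_tree.get(path) != digest:
                delta[path] = digest  # changed or removed in the new backup
        for path in new_tree:
            if path not in old_tree:
                delta[path] = None    # created later, so absent in prev
        prev.filled = False
        prev.entries = delta
    history.append(Backup(len(history), True, dict(new_tree)))


def reconstruct(history: List[Backup], num: int) -> Tree:
    """Rebuild backup `num` from the nearest future filled backup."""
    start = next(i for i in range(num, len(history)) if history[i].filled)
    tree = dict(history[start].entries)
    # Work backwards, applying each reverse delta in turn.
    for i in range(start - 1, num - 1, -1):
        for path, digest in history[i].entries.items():
            if digest is None:
                tree.pop(path, None)
            else:
                tree[path] = digest
    return tree
```

In this model, viewing the newest backup is a plain lookup with no
merging, while an older backup costs one delta merge per backup between
it and the nearest filled one.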