[BackupPC-devel] BackupPC 4.0 features - attribute file and backup storage
Fresel Michal - hi competence e.U.
2011-03-23 14:52:31 UTC
hi

I just signed up to this list :)
Conversely, I would like to raise a suggestion I mentioned a while
back with reference to 3.x. I think it would be great to have the
ability to mark a backup to be saved and not automatically deleted
based upon the expiry rules. Currently, I can fake it by renaming the
backup (+/- adding a symlink to the original name). But it would be
really nice to have an officially-supported convention that allows
individual backups to be protected. My recommendation would be to add
a suffix (e.g., .save) to the backup number. The particular use case I
have in mind is when you upgrade a system (or otherwise make major
changes) and specifically want to save the last backup of the
pre-upgrade version.
ACK - it would be great to be able to mark a single backup as "undeletable" by any kind of cleanup mechanism.
Sometimes we create multiple backups within a short timeframe (i.e., before, during, and after some system changes).
Some kind of "protect this backup from deletion" would be really nice ....
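
To make the request concrete, here is a minimal sketch of how an expiry pass could skip protected backups. It assumes the ".save" suffix convention suggested above; the function names and the "keep_last" parameter are hypothetical and are not part of BackupPC.

```python
SAVE_SUFFIX = ".save"  # hypothetical marker, following the suggestion in this thread


def is_protected(backup_name: str) -> bool:
    """A backup directory such as '11.save' is treated as protected."""
    return backup_name.endswith(SAVE_SUFFIX)


def backups_to_expire(backup_names, keep_last: int):
    """Return the unprotected backups that an expiry pass may delete.

    Protected backups are never returned and do not count towards keep_last.
    """
    unprotected = sorted(
        (name for name in backup_names if not is_protected(name)), key=int
    )
    cutoff = max(len(unprotected) - keep_last, 0)
    return unprotected[:cutoff]
```

With the backups 10, 11.save, 12, 13 and 14 and keep_last=2, only 10 and 12 would be candidates for deletion; 11.save survives regardless of the expiry rules.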
Fresel Michal - hi competence e.U.
2011-03-23 15:42:50 UTC
If you are changing the appended rsync digest format for cpool
files using rsync, I think it might be helpful to also store the
uncompressed filesize in the digest. There are several use cases
(including verifying rsync checksums where the filesize is required
to determine the blocksize) where I have needed to decompress the
entire file just to find out its size (and since I am in the pool
tree I don't have access to the attrib file to know its size).
Alternatively, if you want the first time hack to work then you could
make the pool file name equal to: <md5sum>_<SHA-256sum> which would
still be smaller than SHA-512sum and I would wager that we are
unlikely ever to start seeing lots of files with simultaneous
collisions of the md5 and the SHA-256 checksums. In a sense, the
SHA-256 checksum would act like a unique chain suffix and since it
would always be there you never would have to actually decompress and
compare the files to see if a chain is necessary. Plus you then would
have two essentially independent checksums built into the file name.
I would propose extending it to
<MD5>_<SHA256>_NULL_<uncompressed_FILESIZE>
by default,

with an option (if the user enables it :) for
<MD5>_<SHA256>_<SHA512>_<uncompressed_FILESIZE>
Maybe somebody wants to recalculate the SHA512 sums afterwards (in idle time?), hence the "NULL" placeholder in the default name above.
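
As a rough illustration of the proposed naming scheme, a sketch in Python could look like this. The function name and the "with_sha512" flag are made up for the example, and a real implementation would hash the file in chunks rather than hold it in memory.

```python
import hashlib


def pool_file_name(data: bytes, with_sha512: bool = False) -> str:
    """Build <MD5>_<SHA256>_<SHA512 or NULL>_<uncompressed size> for the given content."""
    md5 = hashlib.md5(data).hexdigest()
    sha256 = hashlib.sha256(data).hexdigest()
    # "NULL" is the placeholder for a SHA512 sum that may be filled in later.
    sha512 = hashlib.sha512(data).hexdigest() if with_sha512 else "NULL"
    return f"{md5}_{sha256}_{sha512}_{len(data)}"
```

Since hex digests never contain an underscore, the name can be split back into its four fields with a plain split on "_".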

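Admittedly, this would generate very long filenames.

With the usual filename length limit of 255 characters, the layout
32_64_128_<filesize>
uses 32 + 64 + 128 hex digits plus 3 underscores = 227 characters,
leaving 28 characters for the file size (values up to 10^28 - 1).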

so we could also append Filesizes of ... uuh ... wait ...
10^12 - Terabyte
10^15 - Petabyte
10^18 - Exabyte ....
well ... very big files :)
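
The length budget can be checked with a few lines of Python. This sketch assumes hex-encoded digests (32, 64 and 128 characters), three underscore separators, and a 255-character filename limit; under those assumptions it arrives at 28 characters for the decimal size field.

```python
NAME_MAX = 255                                  # assumed filename length limit
MD5_HEX, SHA256_HEX, SHA512_HEX = 32, 64, 128   # hex digest lengths
SEPARATORS = 3                                  # underscores between the four fields


def size_field_width(name_max: int = NAME_MAX) -> int:
    """Characters left for the decimal file size after the three digests."""
    return name_max - (MD5_HEX + SHA256_HEX + SHA512_HEX + SEPARATORS)


def size_fits(size_bytes: int) -> bool:
    """True if the decimal representation of the size fits in the remaining space."""
    return len(str(size_bytes)) <= size_field_width()
```

Even an exabyte-sized file (10^18 bytes, 19 digits) fits comfortably.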

With all these checksums and sizes already calculated, the information could be reused by custom user scripts, for example:
# integrity testing of the pool using md5, sha256 AND sha512 :)
# appending .sha256 or .sha512 files in archive operations
# post-dump integrity tests on the client ...

Greetings
Mike
Jeffrey J. Kosowsky
2011-03-23 18:40:31 UTC
Post by Fresel Michal - hi competence e.U.
If you are changing the appended rsync digest format for cpool
files using rsync, I think it might be helpful to also store the
uncompressed filesize in the digest. There are several use cases
(including verifying rsync checksums where the filesize is required
to determine the blocksize) where I have needed to decompress the
entire file just to find out its size (and since I am in the pool
tree I don't have access to the attrib file to know its size).
Alternatively, if you want the first time hack to work then you could
make the pool file name equal to: <md5sum>_<SHA-256sum> which would
still be smaller than SHA-512sum and I would wager that we are
unlikely ever to start seeing lots of files with simultaneous
collisions of the md5 and the SHA-256 checksums. In a sense, the
SHA-256 checksum would act like a unique chain suffix and since it
would always be there you never would have to actually decompress and
compare the files to see if a chain is necessary. Plus you then would
have two essentially independent checksums built into the file name.
I would propose extending it to
<MD5>_<SHA256>_NULL_<uncompressed_FILESIZE>
by default,
with an option (if the user enables it :) for
<MD5>_<SHA256>_<SHA512>_<uncompressed_FILESIZE>
Maybe somebody wants to recalculate the SHA512 sums afterwards (in idle time?), hence the "NULL" placeholder in the default name above.
I don't see the advantage of having SHA256 and SHA512. Let users
choose one or the other. The only reason I proposed adding another
checksum is if people are worried about MD5 collisions. So the goal
would be to pick a 2nd checksum whether SHA256 or SHA512 or any other
choice that the user believes to be sufficiently unique.

Having the uncompressed filesize may be nice but it is not critical to
unique pool naming which after all is the purpose of the checksums.
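
For comparison, the simpler variant described here (MD5 plus exactly one user-chosen second checksum, no size field) might be sketched as follows. The function name and the set of allowed algorithms are assumptions for illustration only.

```python
import hashlib

ALLOWED_SECOND_DIGESTS = ("sha256", "sha512")  # hypothetical set of user choices


def pool_name_two_digests(data: bytes, second_algo: str = "sha256") -> str:
    """Build <md5sum>_<second checksum>, where the user picks the second algorithm."""
    if second_algo not in ALLOWED_SECOND_DIGESTS:
        raise ValueError(f"unsupported second digest: {second_algo}")
    second = hashlib.new(second_algo, data).hexdigest()
    return f"{hashlib.md5(data).hexdigest()}_{second}"
```

Because the second digest is always present, it plays the role of the chain suffix: two files only share a name if both checksums collide.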
Post by Fresel Michal - hi competence e.U.
as for the name-length limit of 255
32_64_128_<filesize>
meaning there would be space left for 28 more characters (sizes up to 10^28 - 1)
so we could also append Filesizes of ... uuh ... wait ...
10^12 - Terabyte
10^15 - Petabyte
10^18 - Exabyte ....
well ... very big files :)
With all these checksums and sizes already calculated, the information could be reused by custom user scripts, for example:
# integrity testing of the pool using md5, sha256 AND sha512 :)
# appending .sha256 or .sha512 files in archive operations
# post-dump integrity tests on the client ...
Greetings
Mike
_______________________________________________
BackupPC-devel mailing list
List: https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Fresel Michal - hi competence e.U.
2011-03-23 19:11:09 UTC
hi Jeffrey,

I posted "Thinking of 4.0 - change of compression level" afterwards, suggesting the creation of some kind of ".info" file.

The SHA256 and SHA512 checksums would be included in that file,
and so would the uncompressed size.

The "file_naming" change would thus be unnecessary.
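
A sketch of what such an info file might contain is shown below. The JSON layout, the field names, and the ".info" path convention are assumptions made for illustration; nothing here reflects an actual BackupPC format.

```python
import hashlib
import json


def build_info(uncompressed: bytes, algos=("sha256", "sha512")) -> dict:
    """Collect the uncompressed size and the user-selected checksums."""
    return {
        "uncompressed_size": len(uncompressed),
        "digests": {a: hashlib.new(a, uncompressed).hexdigest() for a in algos},
    }


def info_path(pool_path: str) -> str:
    """Hypothetical convention: the info file sits next to the pool file."""
    return pool_path + ".info"


def render_info(info: dict) -> str:
    """Serialize the info record; JSON is just one possible encoding."""
    return json.dumps(info, indent=2, sort_keys=True)
```

The "algos" tuple is where the user's choice of additional checksums would plug in, and the pool file name itself stays untouched.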
Post by Jeffrey J. Kosowsky
I don't see the advantage of having SHA256 and SHA512.
Why not calculate them now (i.e., when the server is idle) so that they are available for future use?
Who knows what rsync will look like next year?
Not in the near future, but for example: sha256 for blocks and sha512 for the full file?
Then we would at least have our full-file checksums available.
Post by Jeffrey J. Kosowsky
Let users choose one or the other.
This can be realized with that info file.
It is still the user's decision which additional checksums are created ....
Post by Jeffrey J. Kosowsky
The only reason I proposed adding another
checksum is if people are worried about MD5 collisions. So the goal
would be to pick a 2nd checksum whether SHA256 or SHA512 or any other
choice that the user believes to be sufficiently unique.
I am not really worried about collisions, but about the integrity of the pooled files on the server and the time needed to recheck them.

Today it is quite common to provide all three of them for web downloads ....
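
An integrity recheck of a pooled file against stored checksums could be sketched like this. The example assumes the compressed file is a plain zlib stream, which may not match BackupPC's actual cpool format; the function name and the shape of "expected" are likewise made up.

```python
import hashlib
import zlib


def verify_compressed(stream, expected: dict, chunk_size: int = 65536):
    """Decompress a zlib stream chunk by chunk and compare digests.

    expected maps algorithm names (e.g. "md5", "sha256") to hex digests.
    Returns (uncompressed_size, sorted list of algorithms that did not match).
    """
    decomp = zlib.decompressobj()
    hashers = {algo: hashlib.new(algo) for algo in expected}
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out = decomp.decompress(chunk)
        size += len(out)
        for hasher in hashers.values():
            hasher.update(out)
    tail = decomp.flush()  # any data still buffered in the decompressor
    size += len(tail)
    for hasher in hashers.values():
        hasher.update(tail)
    mismatches = sorted(a for a, h in hashers.items() if h.hexdigest() != expected[a])
    return size, mismatches
```

All requested checksums are updated in a single pass, so the file only has to be decompressed once per recheck, and the uncompressed size comes out as a by-product.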
Post by Jeffrey J. Kosowsky
Having the uncompressed filesize may be nice but it is not critical to
unique pool naming which after all is the purpose of the checksums.
This could be implemented in some kind of "info" file.

Greetings

Mike
