TNCS-0027 – The Package Conundrum
Created: May 4, 2015
INTRODUCTION
When Mac OS X was released in 2001, it introduced a new type of file system object in addition to regular files and folders: the “package”. A package is really a folder that contains numerous files and sub-folders arranged in a very specific way, and all of the items within it are closely related to one another. The operating system’s user interface presents a package as a single file, allowing it and all of its contents to be manipulated as a unit.
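To make the idea concrete, here is a minimal Python sketch showing that a package is, on disk, an ordinary directory that software simply chooses to treat as one item. It assumes package status is decided by the folder’s extension alone, using a hypothetical, non-exhaustive extension list. macOS actually relies on registered document type declarations, so treat this only as an illustration.

```python
import os

# Hypothetical, non-exhaustive extension list used only for illustration.
# macOS itself decides package status from registered document types.
PACKAGE_EXTENSIONS = {".app", ".bundle", ".framework", ".photoslibrary"}


def looks_like_package(path: str) -> bool:
    """Return True if path is a directory whose extension marks it as a package."""
    if not os.path.isdir(path):
        return False  # packages are always folders underneath
    # Strip a trailing separator so splitext sees the real folder name.
    _, ext = os.path.splitext(path.rstrip(os.sep))
    return ext.lower() in PACKAGE_EXTENSIONS
```

For example, `looks_like_package("/Applications/Safari.app")` would report the application folder as a single package rather than as a folder to descend into.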
Manipulating packages as a unit is very important because, as stated above, all the files and folders within a package are related to each other in a very specific way. If the contents of packages were exposed to the user, individual items could be renamed, replaced or removed, ultimately corrupting the package’s integrity. For example, the most common type of package a user is likely to deal with is the application package: all apps in the Applications and Utilities folders are actually packages. If the contents of one of these packages were somehow “messed with”, the application could become corrupt and unusable.
Because maintaining the integrity of packages as a whole is so important, ChronoSync works hard to keep them together as a unit. By default, packages are presented in ChronoSync’s user interface, and processed during synchronization, as a unit. This means that if any change, major or minor, is detected within the package, ChronoSync will copy the package in its entirety during a sync or backup operation. This behavior is unique to ChronoSync: other backup and sync utilities treat packages as folders, potentially corrupting their contents if something goes wrong while processing them or when archiving or restoring them.
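The “any change means copy everything” rule can be sketched in Python as follows. This is an illustration under stated assumptions, not ChronoSync’s implementation: it assumes a change can be detected by comparing each item’s path, size and modification time between source and destination, and it replaces the destination package wholesale when anything differs. The function names are hypothetical.

```python
import os
import shutil


def snapshot(root: str) -> dict:
    """Map every relative path under root to a cheap change signature."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            # Folders are recorded too, so added or removed folders count as changes.
            state[os.path.relpath(os.path.join(dirpath, name), root)] = "dir"
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def sync_package_as_unit(src: str, dst: str) -> bool:
    """Copy the whole package if anything inside it differs. Returns True if copied."""
    if os.path.isdir(dst) and snapshot(src) == snapshot(dst):
        return False  # nothing changed: skip the package entirely
    if os.path.isdir(dst):
        shutil.rmtree(dst)  # the old copy is replaced wholesale
    # copytree copies file data with copy2, which preserves modification times,
    # so an unchanged package compares equal on the next run.
    shutil.copytree(src, dst, symlinks=True)
    return True
```

Note that even a one-byte change inside the package makes `sync_package_as_unit` recopy every file, which is exactly the cost discussed in the next section.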
THE CONUNDRUM EXPLAINED
While maintaining packages as a unit is the proper thing to do, it does have downsides, because some packages are downright enormous. For example, the Pages application has over 30,000 files and folders in its package, and the Xcode application has over 150,000! This makes detecting changes within a package, and copying those changes as part of a sync, a potentially unwieldy task. Consider this: a user with 150,000 files and folders in their Documents folder would be said to have a lot of files, yet a single package can shatter that number!
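If you want to see how many items a particular package holds, a short Python walk over its contents is enough. This sketch counts files and folders inside the package without counting the package itself. Symbolic links to folders are counted but not followed, so the totals may differ slightly from what other tools report.

```python
import os


def count_items(package: str) -> tuple:
    """Return (files, folders) found inside a package, excluding the package itself."""
    files = folders = 0
    for _dirpath, dirnames, filenames in os.walk(package):
        folders += len(dirnames)
        files += len(filenames)
    return files, folders
```

For example, `count_items("/Applications/Xcode.app")` walks every item ChronoSync would have to examine when checking that package for changes.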
While application packages may be enormous, their contents change infrequently, so manipulating them as whole packages is not too burdensome. Unfortunately, the same cannot be said for the growing number of document formats that are implemented as packages. Programs such as Aperture, iMovie, iPhoto, and now Photos store their libraries within packages. While the file counts within these packages may not be enormous, the amount of data certainly can be.
Herein lies the conundrum: ChronoSync, while doing the right thing by treating packages as a unit, will end up copying an enormous amount of data if a large package experiences even minor changes. For instance, if your iPhoto library is 120 GB and you launch iPhoto simply to rearrange photos in an event, ChronoSync will detect that relatively minor change and copy all 120 GB of data, even though the actual changes within the package may amount to only a few hundred bytes. This is inefficient at best and downright frustrating at worst!
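To put the 120 GB example in perspective, here is a back-of-the-envelope estimate of how long a full package copy takes. The 100 MB/s sustained throughput is an assumed, illustrative figure, not a measurement of any particular drive or network. Decimal units are used (1 GB = 1000 MB).

```python
def full_copy_minutes(package_gb: float, throughput_mb_per_s: float) -> float:
    """Minutes to copy an entire package at a given sustained throughput.

    Uses decimal units (1 GB = 1000 MB). The throughput is supplied by the
    caller; nothing here measures a real device.
    """
    megabytes = package_gb * 1000
    seconds = megabytes / throughput_mb_per_s
    return seconds / 60


# Assumed 100 MB/s sustained throughput, purely for illustration.
print(full_copy_minutes(120, 100))  # → 20.0
```

Under that assumption, rearranging a single event costs about 20 minutes of copying, for a change more than a hundred million times smaller than the data moved.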
SOLUTION
To combat this scenario, ChronoSync has always offered the “Dissect Packages” option. When enabled, this option tells ChronoSync to treat packages as regular folders: it no longer treats a package as a unit and processes only the specific changes that occur within it. This makes large packages much more efficient to handle, at the expense of no longer treating them as a unit, so the user must be careful when synchronizing, restoring and archiving the contents of packages. This is a less than ideal situation, but a “necessary evil” to make large packages manageable.
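In dissect mode a package is handled like any other folder, so only new or changed items are copied. The Python sketch below illustrates that behavior under simplifying assumptions: a file counts as changed when its size or modification time differs, deletions are not propagated, and symbolic links are followed rather than preserved. The function name is hypothetical.

```python
import os
import shutil


def sync_dissected(src: str, dst: str) -> list:
    """Treat a package as a plain folder: copy only files that are new or changed."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        os.makedirs(os.path.join(dst, rel_dir), exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(dst, rel_dir, name)
            ss = os.stat(s)
            if os.path.exists(d):
                ds = os.stat(d)
                if (ss.st_size, ss.st_mtime_ns) == (ds.st_size, ds.st_mtime_ns):
                    continue  # unchanged: leave the destination copy alone
            shutil.copy2(s, d)  # copy2 preserves mtime for the next comparison
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

Because each file is handled independently, an interrupted run can leave the destination package half old and half new. That is the integrity risk the option trades for speed.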
To address the shortcomings of both approaches, ChronoSync v4.6.1 introduces a new technique for dealing with packages: merging. Merging combines standard package handling with dissection. Packages continue to be treated as a unit: any change detected within a package marks the entire package as modified, and ChronoSync copies the entire package as part of a sync or backup operation.
However, when copying the package, ChronoSync compares its contents with the package on the destination that is being replaced. Only the components that have actually changed are copied; the rest are reconstructed on the destination using hard links, a file system mechanism that allows multiple file names to refer to the same data on disk. As a result, packages are copied much faster than before, especially when the changes within them are relatively minor. While merging is still not quite as fast as the Dissect Packages option, it has the advantage of maintaining the package’s integrity as a unit.
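The merge technique can be sketched in Python in a form similar in spirit to rsync’s `--link-dest` option. The sketch builds a complete new copy of the package in a staging folder next to the old destination package. Unchanged files are hard-linked from the old package, and changed or new files are copied. The staging folder then replaces the old package. ChronoSync’s actual algorithm is not documented here. The function names, the `.merging` staging suffix, and the use of `filecmp` (size and modification time first, falling back to a content comparison) are all assumptions, and there is no error recovery.

```python
import filecmp
import os
import shutil


def merge_package(src: str, old_dst: str, new_dst: str) -> dict:
    """Rebuild a package at new_dst: hard-link unchanged files from old_dst, copy the rest."""
    stats = {"linked": 0, "copied": 0}
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        os.makedirs(os.path.join(new_dst, rel_dir), exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            old = os.path.join(old_dst, rel_dir, name)
            new = os.path.join(new_dst, rel_dir, name)
            # filecmp.cmp trusts matching type/size/mtime, else compares contents.
            if os.path.isfile(old) and filecmp.cmp(s, old):
                os.link(old, new)  # unchanged: reuse the existing data, no copy
                stats["linked"] += 1
            else:
                shutil.copy2(s, new)  # new or changed: copy the data
                stats["copied"] += 1
    return stats


def replace_package(src: str, dst: str) -> dict:
    """Swap a freshly merged copy of src into place at dst."""
    staging = dst + ".merging"  # hypothetical name; must be on the same volume
    stats = merge_package(src, dst, staging)
    shutil.rmtree(dst)  # removes only the old names; linked data survives
    os.rename(staging, dst)
    return stats
```

The destination still receives a complete, self-consistent package, as in standard handling, but the bytes of unchanged files are never rewritten.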
As mentioned above, package merging is available in ChronoSync v4.6.1 and later. It is also available only when the destination volume supports hard links. All locally mounted HFS+ volumes qualify, as does any ChronoAgent connection that targets a remote HFS+ volume. Unfortunately, most forms of file sharing (including NAS devices) do not qualify. You can still choose merging as your package handling option, but ChronoSync will fall back to standard package handling when it determines that the destination volume doesn’t support hard links. In these cases, you will still have to resort to package dissection when dealing with very large packages.
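How ChronoSync decides whether a destination supports hard links is not described in this note. One simple approach, shown here as an assumption-laden Python sketch, is to probe the destination by creating a temporary file and attempting to hard-link it, then falling back from merging to standard handling if the attempt fails. The mode strings `"merge"`, `"standard"` and `"dissect"` are hypothetical labels, not ChronoSync setting names.

```python
import os
import tempfile


def supports_hard_links(directory: str) -> bool:
    """Probe a destination directory by trying to create a hard link inside it."""
    fd, probe = tempfile.mkstemp(dir=directory, prefix=".linkprobe-")
    os.close(fd)
    link = probe + ".link"
    try:
        os.link(probe, link)
    except OSError:  # raised when the file system refuses to create the link
        return False
    else:
        os.remove(link)
        return True
    finally:
        os.remove(probe)  # always clean up the probe file


def choose_package_mode(requested: str, destination: str) -> str:
    """Fall back from merging to standard handling when hard links are unavailable."""
    if requested == "merge" and not supports_hard_links(destination):
        return "standard"
    return requested
```

Probing the actual destination, rather than guessing from the volume type, also covers cases such as network shares whose hard-link support depends on the server.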
REVISION HISTORY
May-04-2015 – Created from Internal Support Notes.