TNCS-0025 – SmartScan
Created: March 31, 2015
INTRODUCTION
ChronoSync v4.6 introduces a new feature called SmartScan. SmartScan is an advanced heuristic algorithm that can greatly reduce the amount of time ChronoSync spends analyzing a pair of file systems for changes. File system analysis can be a time-consuming process for many sync and backup operations. Despite all the advances in computing power over the years, file system traversal is still relatively slow, and it gets slower still when the operation is performed over a network connection. SmartScan is designed to take the pain out of such operations.
IMPORTANT NOTE: As of macOS 12 (Monterey), “Admin Access” is required if you wish to use the SmartScan feature in your syncs. This is a limitation of macOS itself. If you are running an earlier version of macOS, “Admin Access” is not necessary in order to use SmartScan.
HOW DOES IT WORK?
SmartScan works by combining the file system events database maintained by macOS with the ChronoSync Difference Engine (CDE), the brains behind every sync and backup that ChronoSync performs. The file system events database, or fsEvents for short, contains a record of the changes made to the file system. It is maintained on a per-volume basis and is quite detailed.
ChronoSync queries this database to see which portions of the file system have changed between runs of a synchronization or backup. It uses the result of this query as a “hint” for the CDE so it can quickly locate and identify the changes that have occurred. The SmartScan algorithm would not be possible without the CDE’s file system database as a reference point.
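To make the idea of a “hint” concrete, here is a minimal sketch in Python. It is not ChronoSync’s code: the function `folders_to_rescan`, the `(path, recursive)` event format and the sample paths are hypothetical. It assumes the changed folders have already been read from fsEvents, with `recursive` loosely modeled on the must-scan-subdirectories flag (`kFSEventStreamEventFlagMustScanSubDirs`) that the FSEvents API can attach to an event. Every folder not returned is reused from the previous snapshot instead of being re-read from disk.

```python
from pathlib import PurePosixPath


def folders_to_rescan(known_folders, events):
    """Hypothetical sketch of using fsEvents as a scanning "hint".

    known_folders: folder paths recorded by the previous sync.
    events: (path, recursive) pairs assumed to have been read from fsEvents
            since the last sync; recursive=True means the whole subtree
            under that path must be re-examined.
    Folders that are not returned can be reused from the previous
    snapshot without touching the disk.
    """
    exact = {PurePosixPath(p) for p, recursive in events if not recursive}
    subtrees = [PurePosixPath(p) for p, recursive in events if recursive]
    rescan = []
    for folder in known_folders:
        path = PurePosixPath(folder)
        if path in exact:
            # fsEvents reported a change directly in this folder.
            rescan.append(folder)
        elif any(path == root or root in path.parents for root in subtrees):
            # The folder lies inside a subtree flagged for a full rescan.
            rescan.append(folder)
    return rescan


previous = ["/Data", "/Data/Photos", "/Data/Photos/2014", "/Data/Docs", "/Data/Music"]
events = [("/Data/Docs", False), ("/Data/Photos", True)]
print(folders_to_rescan(previous, events))
# ['/Data/Photos', '/Data/Photos/2014', '/Data/Docs']
```

In this toy example `/Data` and `/Data/Music` are never re-read; on a real volume, the folders that can be skipped are where the time savings come from.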
The performance improvements SmartScan offers can be drastic, such as a sync that used to take hours completing in mere minutes. Or they can be negligible. It all depends on the structure of the file system being scanned and, more importantly, on how those files are used between syncs. There are also situations where SmartScan should not be used, so understanding all these factors is important. If ever there was a “your mileage may vary” feature, SmartScan is it!
UNDERSTANDING WHEN TO USE IT
When SmartScan is Good
SmartScan is at its best when there are a LOT of files to scan, when your devices are relatively slow, when the connection to those devices (bus or network) is relatively slow, or any combination thereof. ChronoAgent-based syncs almost always benefit from SmartScan, and the slower the network connection, the greater the performance boost! SmartScan is primarily intended for very large syncs that normally take a long time to run because of a very slow analysis pass. Such syncs can take hours (or even days) to run, but with SmartScan the time can be reduced to minutes (or perhaps just hours).
When SmartScan is Not-So-Good
In general, there is no harm in enabling SmartScan. The worst that will happen is that you might not see dramatic speed benefits. If you’re running small syncs (fewer than 100,000 files) and/or have exceptionally fast hardware, you will see minimal performance gains. Also, if your changes are spread evenly across your sync hierarchy, ChronoSync will still have to access a large portion of your file system.
SmartScan works by telling ChronoSync which parts of your file system it can skip, so if it can’t skip any of them, there’s no performance gain. Likewise, if the amount of data copied is typically very large, you may not see significant gains: SmartScan may effectively reduce the scanning time, but if that time is small relative to how long it takes to copy the files, the net benefit may be negligible.
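The trade-off described above is ordinary Amdahl’s-law arithmetic: only the scan phase gets faster, so the copy phase caps the overall gain. The sketch below uses a hypothetical helper, `overall_speedup`, and hypothetical timings that are illustrations, not measurements.

```python
def overall_speedup(scan_time, copy_time, scan_fraction_remaining):
    """Estimate the whole-sync speedup when only the scan phase shrinks.

    scan_fraction_remaining is the share of the original scan time that is
    still needed with SmartScan (e.g. 0.05 means the scan got 20x faster).
    The copy phase is assumed to be unaffected.
    """
    before = scan_time + copy_time
    after = scan_time * scan_fraction_remaining + copy_time
    return before / after


# Hypothetical timings in minutes, not measurements.
print(round(overall_speedup(120, 5, 0.05), 1))   # scan-dominated sync: prints 11.4
print(round(overall_speedup(10, 120, 0.05), 2))  # copy-dominated sync: prints 1.08
```

The same 20x faster scan yields roughly an 11x faster sync when scanning dominates, but only about 8% when copying dominates.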
UNDERSTANDING WHEN NOT TO USE IT
If your target volume is transported between machines, you should probably avoid SmartScan. In theory it will still work, but the chance that the fsEvents database on that volume will contain invalid or corrupt information increases when the drive becomes promiscuous.
If the volume is used on other operating systems, including versions of macOS prior to 10.8, SmartScan should definitely be disabled in the ‘Special File/Folder Handling’ section of the Sync Document Options Panel. Important: one not-so-obvious example of moving a volume between systems is a single computer that reboots into different partitions running different operating systems. This includes using Boot Camp to run Windows.
SmartScan should also be avoided if your sync set experiences a very high number of changes (e.g., 1,000,000+ file system modifications) between synchronizations. This can be caused by a very active system (such as a server) or by very long periods of time between syncs. In such scenarios, the overhead of analyzing the fsEvents database may be so high that it erases any performance gain SmartScan has to offer (scheduling frequent synchronizations can mitigate this). Certain types of file system activity can also cause SmartScan some trouble; see the Pitfalls section below for a discussion.
PITFALLS
SmartScan is a heuristic algorithm, meaning it produces a result that is good enough most of the time but not necessarily perfect. Its weakness is that if the fsEvents database does not accurately reflect the changes that have been made to a file system, SmartScan will not know where to look for changes. The accuracy of the fsEvents database is rarely in question, and ChronoSync has measures in place to identify and deal with such inaccuracies. Of greater concern is misleading information.
The fsEvents database must be traversed chronologically, and the events it contains must be evaluated relative to the state of the file system at the time each change was made, not necessarily its current state. Again, ChronoSync has measures in place to do just this but, unfortunately, there is a certain type of event, known as a phantom folder replacement, that cannot always be resolved.
A phantom folder replacement begins when a change occurs within a branch of the file system but that branch no longer exists when your next sync or backup runs. Normally this would not be a problem since, well, the folder branch no longer exists! But what if it no longer exists because it has been moved? The fsEvents database captures every change that occurred within the folder branch, but the move operation is not captured in enough detail to deduce what happened, so those changes are obscured.
Again, this is usually not a problem, because the change at the destination of the move operation is detected and ChronoSync would pick up any new folder branch that appeared there. But here is the tricky situation that triggers the event: what if the moved branch replaced a near-identical folder branch, that is, one with the same name and a similar structure as the original? And what if the replacement folder had exactly the same modification date as the folder it replaced? This is a phantom folder replacement!
A phantom folder replacement is where SmartScan really has to show some intelligence. The standard SmartScan algorithm employs extra checks that look specifically for such replacements. Unfortunately, it doesn’t always get it right. Luckily, such file system events are exceptionally rare in typical real-world use, but they are still a possibility.
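One way to see why a phantom folder replacement is hard to spot is to compare what a shallow check sees with what a stricter one sees. This sketch is an illustration under stated assumptions, not ChronoSync’s actual checks: `FolderInfo`, both functions and the `file_id` field (standing in for something like an inode number) are hypothetical. Such identifiers may not be stable or available on every file system, which is one reason extra checks of this kind cannot be foolproof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderInfo:
    """Hypothetical per-folder record kept between syncs."""
    name: str
    modified: float  # modification date as a POSIX timestamp
    file_id: int     # file-system identifier, e.g. an inode number (assumed available)


def shallow_match(previous: FolderInfo, current: FolderInfo) -> bool:
    # The check a phantom replacement fools: same name, same modification date.
    return previous.name == current.name and previous.modified == current.modified


def match_with_id(previous: FolderInfo, current: FolderInfo) -> bool:
    # One conceivable extra check: the replacement is a different object on
    # disk, so its identifier usually differs even when name and date match.
    return shallow_match(previous, current) and previous.file_id == current.file_id


original = FolderInfo("Project", 1427760000.0, 1001)
replacement = FolderInfo("Project", 1427760000.0, 2042)
print(shallow_match(original, replacement))  # True: the replacement goes unnoticed
print(match_with_id(original, replacement))  # False: the replacement is caught
```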
To combat this, SmartScan employs a failsafe mechanism to recover from situations where the fsEvents database may have misled it. Basically, its trust in fsEvents is short-lived. For any given folder, if fsEvents consistently reports no changes within that folder, SmartScan begins to doubt what fsEvents is telling it, and eventually it double-checks the folder just to make sure. After doing this, its trust in fsEvents is restored, but only for a little while!
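The short-lived trust described above can be sketched as a per-folder counter of consecutive “quiet” syncs. The class name, the method and the `max_quiet_runs` threshold are hypothetical; this tech note does not say how many quiet runs ChronoSync tolerates before it double-checks a folder.

```python
from collections import defaultdict


class FolderTrust:
    """Hypothetical sketch of a short-lived trust failsafe.

    A folder that fsEvents reports as unchanged for `max_quiet_runs`
    consecutive syncs is scanned anyway; the counter then resets, which
    restores trust in fsEvents for that folder for a while.
    """

    def __init__(self, max_quiet_runs=5):
        self.max_quiet_runs = max_quiet_runs
        self.quiet_runs = defaultdict(int)

    def should_scan(self, folder, reported_changed):
        if reported_changed:
            # fsEvents flagged the folder, so it is scanned and the count resets.
            self.quiet_runs[folder] = 0
            return True
        self.quiet_runs[folder] += 1
        if self.quiet_runs[folder] >= self.max_quiet_runs:
            # Too many quiet runs in a row: double-check, then trust again.
            self.quiet_runs[folder] = 0
            return True
        return False


trust = FolderTrust(max_quiet_runs=3)
print([trust.should_scan("/Data/Archive", False) for _ in range(6)])
# [False, False, True, False, False, True]
```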
Note: the default package file handling mechanism that ChronoSync employs naturally mitigates the most common type of phantom folder replacement you are likely to encounter: a package file replacing a near-identical package file. It is therefore recommended that you do NOT enable the dissect packages setting on SmartScan-based syncs, because doing so bypasses this natural defense mechanism.
SmartScan’s primary purpose is to reduce the time it takes to perform very long syncs and backups. These operations are normally so burdensome that they are run at relatively infrequent intervals. Internet-based syncs are prime examples, often going from several hours down to several minutes after enabling SmartScan. Local area network syncs can also qualify: they may not take several hours to run, but they do impose a burden on the network and thus may get relegated to infrequent and/or off-hours intervals. Even local hard drive syncs can qualify when you consider that file counts in the millions are not uncommon.
SmartScan makes it practical to run these syncs frequently, and the pitfalls associated with a misleading fsEvents database are rarely, if ever, an issue.
ASYMMETRIC MODE
SmartScan is enabled via a checkbox in the File/Folder Handling section of the Options panel. It is enabled on a per-sync-document basis, so you can decide which syncs/backups will use SmartScan and which will leave it disabled. Even when it is enabled, there is no guarantee that SmartScan will actually be used for the synchronization. The underlying devices must support the fsEvents mechanism, and ChronoSync must be able to access the fsEvents database stored on each target volume. In general, locally attached hard drives and hard drives accessed through a ChronoAgent connection will qualify. File servers and removable media will not.
It’s possible for SmartScan to be operational on one target but not the other. For instance, backing up your local hard drive to a file server will result in SmartScan being enabled on the local hard drive but not on the server. This is indicated by “SmartScan active” being displayed in the target information pane of the Setup panel. This situation is referred to as asymmetric mode, meaning SmartScan is working on one side of the sync but not the other. Asymmetric SmartScan, while not as beneficial as symmetric SmartScan, can still offer significant performance improvements.
Asymmetric SmartScan can also be forced by the user. As mentioned above, if SmartScan is enabled and your target volume/device qualifies, SmartScan will be enabled for that target. However, you can invoke Options for that target and turn SmartScan off. This allows you to force asymmetric mode on a per-target, per-sync-document basis.
Why would you want to enable asymmetric SmartScan? There are a few legitimate cases. For example, suppose you are synchronizing your internal hard drive with an external hard drive that is frequently moved between systems. This violates the rule that SmartScan shouldn’t be used with a volume that is transported between machines. However, your internal hard drive isn’t going anywhere, so you can enable SmartScan for your internal HD but disable it for the external drive and remain perfectly safe.
Another situation is when you are concerned about the phantom folder replacement issue described earlier in this tech note. If you are backing up data that may experience such phantom folder replacements, you may choose to disable SmartScan on that target. However, if your destination volume is used purely for backup, it should experience no such changes at all, so you can safely enable SmartScan on that side of your backup. This allows you to enjoy at least part of the benefit SmartScan has to offer.
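The conditions described in this section can be summarized as a small decision function. This is a hypothetical sketch: the `Target` fields and the function names are illustrative stand-ins for the checks described above (device support for fsEvents, access to the volume’s fsEvents database, the document-level checkbox and the per-target override), not actual ChronoSync settings.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical description of one side of a sync."""
    name: str
    supports_fsevents: bool             # e.g. local disk or ChronoAgent connection
    fsevents_readable: bool             # the volume's fsEvents database can be read
    smartscan_turned_off: bool = False  # per-target override in the target's Options


def smartscan_active(document_enabled: bool, target: Target) -> bool:
    # SmartScan runs on a target only if every condition holds.
    return (document_enabled
            and target.supports_fsevents
            and target.fsevents_readable
            and not target.smartscan_turned_off)


def smartscan_mode(document_enabled: bool, left: Target, right: Target) -> str:
    active = [smartscan_active(document_enabled, t) for t in (left, right)]
    if all(active):
        return "symmetric"
    if any(active):
        return "asymmetric"
    return "off"


internal = Target("Internal HD", True, True)
server = Target("File Server", False, False)
travel = Target("External HD", True, True, smartscan_turned_off=True)
print(smartscan_mode(True, internal, server))  # asymmetric: the server does not qualify
print(smartscan_mode(True, internal, travel))  # asymmetric: forced by the user
```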
AGGRESSIVE MODE
If you invoke Options for a target that qualifies for SmartScan, you will notice that you can not only turn SmartScan off for that target but also enable what is known as aggressive mode. Aggressive mode makes SmartScan’s file system scanning even faster than the standard variant, but it does so by disabling the extra checks that look for phantom folder replacement events. You should only enable it for syncs or backups of data sets that you know are not subject to phantom folder replacement events.
Note: The phantom folder replacement failsafe mentioned above is always in effect, even for aggressive mode.
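Putting the per-target choices together, the sketch below lists which protections apply under each setting. The enum and the `safeguards` helper are hypothetical; they only restate what this section describes: aggressive mode drops the extra phantom folder replacement checks, while the trust failsafe stays on whenever SmartScan is used.

```python
from enum import Enum


class SmartScanSetting(Enum):
    """Hypothetical per-target setting."""
    OFF = "off"
    STANDARD = "standard"
    AGGRESSIVE = "aggressive"


def safeguards(setting: SmartScanSetting) -> set:
    """Return the protections that apply for a target's setting."""
    if setting is SmartScanSetting.OFF:
        # No fsEvents hints at all: the target is fully scanned every time.
        return set()
    active = {"trust failsafe"}  # always on whenever SmartScan is used
    if setting is SmartScanSetting.STANDARD:
        active.add("phantom folder replacement checks")
    return active


for setting in SmartScanSetting:
    print(setting.value, sorted(safeguards(setting)))
```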
REVISION HISTORY
Oct-24-2022 – Added note about the required use of “Admin Access” on macOS 12 and later.
Mar-31-2015 – Created from Internal Support Notes.