Jan Kratochvil
Captive: The first free NTFS read/write filesystem for GNU/Linux

NT Cache Manager

Some third-party documents describe the NT Cache Manager W32 subsystem, such as The NT Cache Manager Description or Learn About NT's File-system Cache, but they are not detailed enough for a compatible NT Cache Manager reimplementation.

NT Cache Manager maps filesystem objects such as regular file data, the filesystem bitmap or the journalling zone (log file). The filesystem also uses it to map virtual volume files representing the whole underlying filesystem device.

The original W32 NT Cache Manager is much more complicated because it must coordinate with other W32 subsystems: mapping of executable files (ImageSectionObject), insufficient system resources reported by the NT Memory Manager, and the general effort to provide caching for system performance.

The NT Cache Manager of this project has a much simpler goal: it only needs to provide compatible NT Cache Manager functionality, while the other goals of its W32 counterpart are left to the UNIX OS, which handles them successfully and much more efficiently.

NT Cache Manager Architecture

Cache Manager objects are always bound to an FCB (File Control Block). A FileObject (or its associated HANDLE) serves only as a reference to the FCB, and there can be multiple FileObject/HANDLE items for one FCB. It is a bit misleading that you must pass a FileObject pointer when calling most of the Cache Manager functions.

Before using any other Cache Manager function you must first call CcInitializeCacheMap() and give it the maximum mapped object offset. Each mapped object byte must have at most one mapped memory location; no shared pages are allowed. Any subsequent mapping request is also expected to be mapped into a contiguous memory region. This implies that the memory region for possible future mappings must be reserved at the time of the initial CcInitializeCacheMap() call, sized according to the given maximum mapped object offset. This is the approach currently implemented by this project, although it cannot be used for the 3rd party ext2fsd.sys driver: that driver initializes Cache Manager with the whole media device size, and surprisingly this succeeds with the original Microsoft Windows Cache Manager. I expect the space reservation should be postponed to the first mapping request, and I expect that no multiple mappings will be made when the CcInitializeCacheMap() reservation request exceeds the available memory.

CcSetFileSizes(), which changes the reserved memory area size, may assume that no Map or Pin mappings exist. Only in the case of FO_STREAM_FILE (virtual device file) is it permitted to extend the mapped size even while Map or Pin mappings (possibly dirty) exist.

The PCACHE_MANAGER_CALLBACKS argument can be safely ignored:

AcquireForReadAhead()/ReleaseFromReadAhead()

Readahead functionality is optional, so these entries are never used by the Cache Manager implementation of this project.

AcquireForLazyWrite()/ReleaseFromLazyWrite()

Write-behind functionality is optional for Cache Manager as well. The original Microsoft Windows Cache Manager implementation performs it asynchronously; the Cache Manager implementation of this project ignores it.

Cache Manager does not need to write any data unless explicitly requested by the driver. It is even expected to silently drop any pending dirty data blocks during filesystem shutdown. A forced dirty block write by CcFlushCache() should be performed without a surrounding AcquireForLazyWrite()/ReleaseFromLazyWrite() pair.

CcUninitializeCacheMap() is just a hint to Cache Manager that the driver will no longer reference the given SharedCacheMap. The original Microsoft Windows Cache Manager can postpone the uninitialization to any later moment, as the SharedCacheMap may be locked by an existing ImageSectionObject of some file being executed etc. It is fatal to destroy the SharedCacheMap at the moment you see no other references to it, as the driver will still access it for some time after CcUninitializeCacheMap(). I am not sure whether this is a bug in the driver or whether there are rules for how long the given SharedCacheMap still exists after CcUninitializeCacheMap() completes. Fortunately it is safe to never destroy the SharedCacheMap and leave it leaked; everything gets cleaned up in the sandboxed environment soon anyway.

Each SharedCacheMap has Map and Pin type objects, which look very similar. Only these objects give you access to any memory data. The SharedCacheMap itself only reserved the space to ensure that forthcoming mappings are contiguous; it did not map any data into it.

Mapping a 'new' Map or Pin creates a new object only if no such mapping exists yet. Otherwise you just get a reference to the existing object with its usecount increased.
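The reuse rule is ordinary reference counting. A minimal sketch follows, assuming a small fixed-size table of pages; the table, its size and all `sketch_*` names are hypothetical and only illustrate the rule, they are not the data structures of this project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PINS 16  /* arbitrary table size for this sketch */

/* Hypothetical mapping record: one page of a SharedCacheMap plus a use count. */
struct sketch_pin {
    uint64_t page_offset;  /* page-aligned object offset */
    unsigned usecount;     /* 0 means the slot is free */
};

struct sketch_pin_table {
    struct sketch_pin slots[SKETCH_MAX_PINS];
};

/* Return the existing mapping of page_offset with its use count increased,
 * or create a new one. Returns NULL when the table is full. */
struct sketch_pin *sketch_pin_acquire(struct sketch_pin_table *table,
                                      uint64_t page_offset)
{
    struct sketch_pin *free_slot = NULL;
    for (size_t i = 0; i < SKETCH_MAX_PINS; i++) {
        struct sketch_pin *pin = &table->slots[i];
        if (pin->usecount > 0 && pin->page_offset == page_offset) {
            pin->usecount++;        /* existing object: just one more reference */
            return pin;
        }
        if (pin->usecount == 0 && free_slot == NULL)
            free_slot = pin;        /* remember the first free slot */
    }
    if (free_slot == NULL)
        return NULL;
    free_slot->page_offset = page_offset;
    free_slot->usecount = 1;        /* new object with its first reference */
    return free_slot;
}

/* Drop one reference; the slot becomes free after the last one. */
void sketch_pin_release(struct sketch_pin *pin)
{
    assert(pin->usecount > 0);
    pin->usecount--;
}
```

Two acquisitions of the same page return the same pointer, which matches the requirement that a page is never mapped at two different addresses.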

Map

There is always at most one Map mapping for each SharedCacheMap. The base offset/length of such a mapping has no meaning, as there can be only a single Map.

Apparently the Map size can be arbitrarily long, limited only by the reserved space of its SharedCacheMap.

You cannot modify the memory mapped by Map in any way. Because it is the same memory area (address) as the pages used by Pin objects, you always see the latest version of a page as modified through any Pin of that page.

Pin

A Pin mapping always represents just one physical page (PAGE_SIZE, 4096 for i386). Its base offset/length can be safely extended to be aligned to the requested page.
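Extending a request to page boundaries could look as follows. This is a sketch with hypothetical `sketch_*` names and an assumed page size of 4096; a request crossing a page boundary yields more than one page here, and each such page would be a separate Pin.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size (i386) */

struct sketch_range {
    uint64_t offset;  /* page-aligned start */
    uint64_t length;  /* whole number of pages, in bytes */
};

/* Extend [offset, offset + length) outward to page boundaries. */
struct sketch_range sketch_page_align(uint64_t offset, uint64_t length)
{
    uint64_t start = offset - offset % SKETCH_PAGE_SIZE;
    uint64_t end = offset + length;
    uint64_t aligned_end = (end + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
                           * SKETCH_PAGE_SIZE;
    struct sketch_range range = { start, aligned_end - start };
    assert(range.offset % SKETCH_PAGE_SIZE == 0);
    return range;
}

/* Number of single-page Pin objects needed to cover the aligned range. */
uint64_t sketch_page_count(struct sketch_range range)
{
    return range.length / SKETCH_PAGE_SIZE;
}
```

For example, a request for 100 bytes at offset 5000 is widened to the single page starting at offset 4096.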

A Pin can have an associated pair of oldest and newest LSN (Logical Sequence Number). The LSN can be set by CcSetDirtyPinnedData(), and Cache Manager always tracks the lowest and highest reported LSN for each page. The LSN is assumed to be 0 if not set.

Any existing Pin mapping is reused for further mappings as long as it is not ThreadOwned. The moment you call CcSetBcbOwnerPointer() you detach the associated Pin pages from their SharedCacheMap. They continue to act as valid Pin mappings, but they are no longer reused by a new Pin mapping of the same page. Multiple Pin mappings of the same page can therefore exist (although they share the same memory space). This detaching must be implemented even in the single-threaded W32 implementation of this project because it affects the behaviour of Cache Manager. How Cache Manager should behave when multiple dirty Pin mappings of the same page exist was never observed.
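The effect of detaching on reuse can be modelled by a flag that the lookup respects. This is a sketch only: the `detached` flag, the array of Pin records and all `sketch_*` names are hypothetical, and the real CcSetBcbOwnerPointer() takes a Bcb and an owner pointer which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Pin record. */
struct sketch_pin {
    uint64_t page_offset;  /* page-aligned object offset */
    bool     in_use;       /* the mapping currently exists */
    bool     detached;     /* an owner pointer was set for this mapping */
};

/* Find a Pin that a new mapping of page_offset may reuse.
 * Detached Pins stay valid but are skipped, so NULL means
 * "create a new Pin for this page". */
struct sketch_pin *sketch_find_reusable(struct sketch_pin *pins, size_t count,
                                        uint64_t page_offset)
{
    for (size_t i = 0; i < count; i++) {
        if (pins[i].in_use && !pins[i].detached
            && pins[i].page_offset == page_offset)
            return &pins[i];
    }
    return NULL;
}

/* Models the detaching effect of CcSetBcbOwnerPointer() on one Pin. */
void sketch_set_owner(struct sketch_pin *pin)
{
    assert(pin->in_use);
    pin->detached = true;
}
```

After `sketch_set_owner()` the next mapping request for the same page finds nothing reusable, which is how several Pin mappings of one page come to exist side by side.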

Only the pages not yet present in memory must be read from the disk. You must not read any pages you do not need to, as the driver does not expect it and it would corrupt its data buffers. There is, however, a strict difference between the CcPinRead() and CcPinMappedData() function calls: CcPinRead() is required to re-read its data blocks even if they are currently Map mapped (unless they were also already Pin mapped at least once). CcPinMappedData(), on the other hand, must not re-read the given blocks; moreover, the blocks are required to be already Map mapped by the caller.

Cache Manager of this project destroys Pin or Map mappings after their last dereference (unlike the leaked SharedCacheMap). Despite this, dirty pages may still be held, as the pages (including their LSNs) are cached in association with the SharedCacheMap. The original Microsoft Windows Cache Manager may possibly postpone the destruction of a Pin mapping to a later time, but it does not matter.
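The difference between the two calls reduces to a small decision function. The sketch below encodes only the rules stated above for a single page; the state flags and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The two calls whose read behaviour differs. */
enum sketch_call { SKETCH_PIN_READ, SKETCH_PIN_MAPPED_DATA };

/* Hypothetical per-page state. */
struct sketch_page_state {
    bool map_mapped;   /* the page is currently Map mapped */
    bool ever_pinned;  /* the page was Pin mapped at least once */
};

/* CcPinMappedData() requires the page to be already Map mapped. */
bool sketch_precondition_ok(enum sketch_call call,
                            const struct sketch_page_state *state)
{
    if (call == SKETCH_PIN_MAPPED_DATA)
        return state->map_mapped;
    return true;
}

/* Decide whether the page must be read from the disk for this call. */
bool sketch_must_read(enum sketch_call call,
                      const struct sketch_page_state *state)
{
    assert(sketch_precondition_ok(call, state));
    switch (call) {
    case SKETCH_PIN_READ:
        /* Re-read even when Map mapped, unless already pinned before. */
        return !state->ever_pinned;
    case SKETCH_PIN_MAPPED_DATA:
        /* Never re-read. */
        return false;
    }
    return false;
}
```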

TraceFS NT Cache Manager Tracer

Cache Manager behaviour would be hard to analyze by reverse engineering alone, as it is fairly complicated code cooperating with many other W32 kernel subsystems. Tracing it was chosen as the easier way, with all assumptions about Cache Manager validated by a Cache Manager simulator.

TraceFS Hooking

You must prepare your driver to be hooked (ntfs.sys in this case):

./src/TraceFS/hookfs.pl ntfs.sys ./src/TraceFS/TraceFS-W32/TraceFS.sys >hooked/ntfs.sys

This hooked/ntfs.sys file must replace the original in the %System32%\drivers directory. Beware: Microsoft Windows keeps many backups of these system files, such as %System32%\dllcache. Delete them all!

You also need to install ./src/TraceFS/TraceFS-W32/TraceFS.sys into the %System32%\drivers directory and import the TraceFS/TraceFS-W32/TraceFS.reg registry file so that the debug driver is initialized during system boot.

You can now pray a bit and capture the resulting Cache Manager trace from WinDbg through W32 remote kernel debugging:

Successfully connected WinDbg

The resulting trace file should be processed by the ./src/TraceFS/checktrace.pl Perl Cache Manager implementation to validate its assumptions about Cache Manager behaviour. Any incompatibilities seen will be reported; your target is to get as few error messages as possible.

KNOWN BUGS: The combination of message synchronization primitives and the implemented refusal to create the journalling thread of ntfs.sys causes a fatal system lockup in several advanced operations, such as setting the compression attribute. Despite this, more common operations can be successfully traced during a whole Microsoft Windows session, including its final shutdown, and such traces provide enough material to feed the ./src/TraceFS/checktrace.pl Perl Cache Manager validator.

TraceFS for general API tracing

Although TraceFS has so far been used only for tracing NT Cache Manager, it can easily be used for tracing any other NT kernel API. You need to provide appropriate function wrappers in the main source file ./src/TraceFS/TraceFS-W32/TraceFS.c. The original system functions being wrapped should be called by their original names. Your wrapping functions should have the first letter of their name replaced by the character 'T'; wrapping of CcInitializeCacheMap() must therefore be done by your function TcInitializeCacheMap(). The prototypes of the wrapping and the wrapped function must be the same. You must also export all the wrapped functions by ./src/TraceFS/TraceFS-W32/TraceFS.def. ./src/TraceFS/hookfs.pl has no hardcoded function names; it hooks exactly the exported entries.

A framework for thread synchronization and debug tracing is provided to prevent messages from being mangled when several threads run at once. Testing was done only on a uniprocessor machine; an SMP kernel may need some fixes.
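The naming rule for wrappers is mechanical and can be written down in a few lines. The helper below is a hypothetical illustration of the rule only; it is not part of TraceFS or of hookfs.pl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Derive the wrapper name from the original function name by replacing
 * its first letter with 'T'. Writes the result into out (out_size bytes).
 * Returns 0 on success, -1 for an empty name or a buffer that is too small. */
int sketch_wrapper_name(const char *original, char *out, size_t out_size)
{
    assert(original != NULL && out != NULL);
    size_t len = strlen(original);
    if (len == 0 || len + 1 > out_size)
        return -1;
    memcpy(out, original, len + 1);  /* copy including the terminating NUL */
    out[0] = 'T';
    return 0;
}
```

Called with "CcInitializeCacheMap" it produces "TcInitializeCacheMap", the wrapper name used as the example in the text.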

 

 
