We have two Windows Server 2003 file servers replicating several folders: about 75K folders and 95K files, roughly 6 GB in total. After a large update, I noticed that the files had stopped replicating. Looking at the event log, I found the following event:

Event ID: 13506
Source: NtFrs
The File Replication Service failed a consistency check
 (QKey != QUADZERO)
in "QHashInsertLock:" at line 696.

The File Replication Service will restart automatically at a later time. If this problem persists a subsequent entry in this event log describes the recovery procedure.
 For more information about the automatic restart right click on My Computer and then click on Manage, System Tools, Services, File Replication Service, and Recovery.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

This event was followed by Event ID 13555:

Event ID: 13555
Source: NtFrs
The File Replication Service is in an error state. Files will not replicate to or from one or all of the replica sets on this computer until the following recovery steps are performed:

 Recovery Steps:

 [1] The error state may clear itself if you stop and restart the FRS service. This can be done by performing the following in a command window:

 net stop ntfrs
 net start ntfrs

If this fails to clear up the problem then proceed as follows.

 [2] For Active Directory Domain Controllers that DO NOT host any DFS alternates or other replica sets with replication enabled:

If there is at least one other Domain Controller in this domain then restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and make it non-authoritative.

If there are NO other Domain Controllers in this domain then restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and choose the Advanced option which marks the sysvols as primary.

If there are other Domain Controllers in this domain but ALL of them have this event log message then restore one of them as primary (data files from primary will replicate everywhere) and the others as non-authoritative.

 [3] For Active Directory Domain Controllers that host DFS alternates or other replica sets with replication enabled:

 (3-a) If the Dfs alternates on this DC do not have any other replication partners then copy the data under that Dfs share to a safe location.
 (3-b) If this server is the only Active Directory Domain Controller for this domain then, before going to (3-c), make sure this server does not have any inbound or outbound connections to other servers that were formerly Domain Controllers for this domain but are now off the net (and will never be coming back online) or have been fresh installed without being demoted. To delete connections use the Sites and Services snapin and look for
Sites->NAME_OF_SITE->Servers->NAME_OF_SERVER->NTDS Settings->CONNECTIONS.
 (3-c) Restore the "system state" of this DC from backup (using ntbackup or other backup-restore utility) and make it non-authoritative.
 (3-d) Copy the data from step (3-a) above to the original location after the sysvol share is published.

 [4] For other Windows servers:

 (4-a) If any of the DFS alternates or other replica sets hosted by this server do not have any other replication partners then copy the data under its share or replica tree root to a safe location.
 (4-b) net stop ntfrs
 (4-c) rd /s /q c:\windows\ntfrs\jet
 (4-d) net start ntfrs
 (4-e) Copy the data from step (4-a) above to the original location after the service has initialized (5 minutes is a safe waiting time).

Note: If this error message is in the eventlog of all the members of a particular replica set then perform steps (4-a) and (4-e) above on only one of the members.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
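For a non-DC server, the core of recovery step [4] comes down to three commands, which must run in a fixed order. Below is a minimal dry-run sketch in Python, assuming you only want to *build* the command lines and review them before typing them on the server. The function name `step4_commands` is hypothetical. The Jet path is the one given in the event text. Steps (4-a) and (4-e), copying data to and from a safe location, depend on whether the replica set has other partners, so they are left out.

```python
"""Dry-run sketch of recovery step [4] from event 13555 (non-DC servers).

Assumption: this only builds the cmd.exe command lines; nothing is executed.
"""

# Path taken from step (4-c) of the event text; it has no spaces, so no quoting.
JET_DIR = r"c:\windows\ntfrs\jet"


def step4_commands(jet_dir: str = JET_DIR) -> list[str]:
    """Return the commands for steps (4-b) to (4-d), in the order given."""
    return [
        "net stop ntfrs",       # (4-b) stop FRS so its Jet files are closed
        f"rd /s /q {jet_dir}",  # (4-c) delete the FRS Jet database folder
        "net start ntfrs",      # (4-d) restart FRS, which rebuilds its database
    ]


if __name__ == "__main__":
    for cmd in step4_commands():
        print(cmd)
```

This is the step the post is trying to avoid on a production box. Deleting the Jet folder forces a full rebuild, and that rebuild is what takes hours on a tree of this size.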

In our case, this server is not a domain controller, so step [4] applies. However, we can't afford any downtime, since the servers host production website content. I have done this type of restore in the past: it takes a while to pull the images from SVN (Subversion), and then it takes many hours to replicate. After a little googling, I found this Microsoft technote:

http://support.microsoft.com/?id=290762

I decided to make server1 authoritative and performed an authoritative restore (BurFlags D4). To do that, I followed the instructions in the section titled "Authoritative FRS restore".

Authoritative FRS restore
Use authoritative restores only as a final option, such as in the case of directory collisions.

For example, you may require an authoritative restore if you must recover an FRS replica set where replication has completely stopped and requires a rebuild from scratch.

The following list of requirements must be met when before you perform an authoritative FRS restore:
1.    The FRS service must be disabled on all downstream partners (direct and transitive) for the reinitialized replica sets before you restart the FRS service when the authoritative restore has been configured to occur.
2.    Events 13553 and 13516 have been logged in the FRS event log. These events indicate that the membership to the replica set has been established on the computer that is configured for the authoritative restore.
3.    The computer that is configured for the authoritative restore is configured to be authoritative for all the data that you want to replicate to replica set members. This is not the case if you are performing a join on an empty directory. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
266679 (http://support.microsoft.com/kb/266679/) Pre-staging the File Replication Service replicated files on SYSVOL and Distributed File System shares for optimal synchronization
4.    All other partners in the replica set must be reinitialized with a nonauthoritative restore.
To complete an authoritative restore, stop the FRS service, configure the BurFlags registry key, and then restart the FRS service. To do so:
1.    Click Start, and then click Run.
2.    In the Open box, type cmd and then press ENTER.
3.    In the Command box, type net stop ntfrs.
4.    Click Start, and then click Run.
5.    In the Open box, type regedit and then press ENTER.
6.    Locate the following subkey in the registry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup
7.    In the right pane, double click BurFlags.
8.    In the Edit DWORD Value dialog box, type D4 and then click OK.
9.    Quit Registry Editor, and then switch to the Command box.
10.    In the Command box, type net start ntfrs.
11.    Quit the Command box.
When the FRS service is restarted, the following actions occur:
•    The value for the BurFlags registry key is set back to 0.
•    Files in the reinitialized FRS replicated directories remain unchanged and become authoritative on direct replication, and through transitive replication, indirect replication partners.
•    The FRS database is rebuilt based on current file inventory.
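Steps 1 to 11 above use regedit interactively. The same change can be expressed as three command lines, which is easier to repeat on several servers. The sketch below assumes you use `reg add` instead of regedit, and that "D4" in the KB is a hexadecimal DWORD (regedit's Edit DWORD Value dialog defaults to hex). D4 is therefore written here as its decimal equivalent, 212, and D2 as 210. The function name `burflags_restore_commands` is hypothetical, and the commands are only built, not run.

```python
"""Sketch: build a command-line equivalent of the KB's regedit steps.

Assumptions: reg.exe replaces regedit; BurFlags is a REG_DWORD and the KB's
"D4"/"D2" values are hexadecimal.
"""

# Same key as in step 6 of the KB text, using reg.exe's HKLM abbreviation.
BURFLAGS_KEY = (
    r"HKLM\System\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)


def burflags_restore_commands(burflags_hex: str) -> list[str]:
    """Return stop / set BurFlags / start commands for one server."""
    value = int(burflags_hex, 16)  # "D4" -> 212 (authoritative), "D2" -> 210
    return [
        "net stop ntfrs",
        # /d without a 0x prefix is read as a decimal REG_DWORD value.
        f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f',
        "net start ntfrs",
    ]
```

For example, `burflags_restore_commands("D4")` yields the authoritative variant. On a non-authoritative partner you would stop after the `reg add` line and leave FRS stopped, as requirement 1 describes.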

So first I turned off the "File Replication Service" on server2, which I had chosen to be non-authoritative, and made the registry change on server2, setting:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup\BurFlags=D2

Then I stopped the FRS service on server1, which I had chosen to be authoritative, and set the same registry value to D4:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup\BurFlags=D4
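What matters in the two-server procedure is the order: the D2 partner is taken down first, the D4 server is restarted, and only then does the partner come back. The sketch below lays out that sequence as (server, action) pairs. The post stops before server2 is restarted, so the final "net start ntfrs" on server2 is an assumption based on KB requirement 1. The function `restore_plan` is hypothetical, and the "BurFlags = …" entries describe the registry change rather than being literal commands.

```python
"""Sketch of the ordering used above: server2 -> D2, server1 -> D4.

Assumption: the last step (restart FRS on the D2 partners) follows KB
requirement 1; the post itself ends before that point.
"""

AUTHORITATIVE = 0xD4     # BurFlags for the single authoritative member
NONAUTHORITATIVE = 0xD2  # BurFlags for every other member


def restore_plan(authoritative: str, others: list[str]) -> list[tuple[str, str]]:
    """Return (server, action) pairs in the order they should happen."""
    plan = []
    # 1. Stop FRS on each non-authoritative partner and mark it D2.
    for server in others:
        plan.append((server, "net stop ntfrs"))
        plan.append((server, f"BurFlags = {NONAUTHORITATIVE:X}"))
    # 2. Stop FRS on the authoritative server, mark it D4, then start it.
    plan.append((authoritative, "net stop ntfrs"))
    plan.append((authoritative, f"BurFlags = {AUTHORITATIVE:X}"))
    plan.append((authoritative, "net start ntfrs"))
    # 3. Only now restart the partners so they pull the authoritative copy.
    for server in others:
        plan.append((server, "net start ntfrs"))
    return plan


if __name__ == "__main__":
    for server, action in restore_plan("server1", ["server2"]):
        print(f"{server}: {action}")
```

Keeping the plan as data rather than running it directly makes the ordering easy to review before touching production servers.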

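After I started the FRS service on server1, the Jet database at C:\Windows\ntfrs\jet\ntfrs.jdb shrank from 500 MB to 1 MB and then slowly started growing again. This is consistent with the KB note that the FRS database is rebuilt based on the current file inventory.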
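One rough way to watch the rebuild is to sample the size of ntfrs.jdb over time. The sketch below assumes a Python interpreter is available on the server. The path is the one from this post, and the helper names `sample_sizes` and `is_growing` are hypothetical. A growing database only suggests the rebuild is progressing. It does not confirm that replication to server2 has finished.

```python
"""Sketch: sample the size of ntfrs.jdb to watch the database rebuild.

Assumption: Python runs on the FRS server; sample count and interval are arbitrary.
"""
import os
import time

JDB_PATH = r"C:\Windows\ntfrs\jet\ntfrs.jdb"  # path from this post


def sample_sizes(path: str, samples: int, interval_s: float) -> list[int]:
    """Return the file size in bytes, read `samples` times, `interval_s` apart."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return sizes


def is_growing(sizes: list[int]) -> bool:
    """True if there are 2+ samples, none shrinks, and the last exceeds the first."""
    return (
        len(sizes) >= 2
        and all(b >= a for a, b in zip(sizes, sizes[1:]))
        and sizes[-1] > sizes[0]
    )
```

On the server, `sample_sizes(JDB_PATH, samples=5, interval_s=60)` takes five readings a minute apart, and `is_growing` on the result gives a quick yes/no.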