Benchmarking fitting programs
One of the most important aspects of simulation and fitting is reassurance that the program you are using gives accurate results. Several similar programs for analysing reflectometry data were therefore compared.
The test was split into two parts: simulation and fitting. The first part tests whether the same model gives the same curve in different programs. In the second part, simulated experimental noise is added to the theoretical curve, assuming that the acquisition obeys Poisson counting statistics. All the programs are then given the same starting model and asked to fit the data.
The programs compared (Motofit, Parratt, Drydoc and Reflfit) were chosen as some of the most popular in current use; this is obviously not a complete list.
Simulation of data
In the first part, a theoretical curve was generated in Motofit from a two-slab model. The model consists of:
- top phase: air
- 1st layer: thin layer with SLD close to air
- 2nd layer: thick layer with SLD roughly in between that of air and substrate
- substrate: SLD = 20 × 10⁻⁶ Å⁻².
There were some ground rules:
- No resolution smearing was employed, as different programs have different implementations.
- Absorption effects were ignored.
- The critical edge was scaled to 1.
- A fine slab spacing was selected for those programs with a fine slab treatment of roughness (Reflfit).
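As a rough illustration of the kind of calculation being compared, the sketch below evaluates the specular reflectivity of a two-slab model using the Parratt recursion with Névot and Croce roughness factors, with no resolution smearing and no absorption. It is not taken from any of the benchmarked programs. The function name `reflectivity` and the layer SLDs, thicknesses and roughnesses are assumptions chosen only to resemble the model described above; only the substrate SLD comes from the text.

```python
import cmath
import math


def reflectivity(q, slds, thicknesses, roughnesses):
    """Specular reflectivity at momentum transfer q (in inverse angstroms).

    slds        : SLDs from the top phase down to the substrate
    thicknesses : one thickness per layer between top phase and substrate
    roughnesses : one roughness per interface, listed from the top down
    """
    k0 = q / 2.0
    # Perpendicular wavevector in each medium, relative to the top phase.
    kz = [cmath.sqrt(k0 * k0 - 4.0 * math.pi * (sld - slds[0])) for sld in slds]

    last = len(slds) - 2  # index of the interface next to the substrate
    amplitude = 0j
    for j in range(last, -1, -1):
        # Fresnel coefficient damped by the Nevot and Croce factor.
        fresnel = (kz[j] - kz[j + 1]) / (kz[j] + kz[j + 1])
        fresnel *= cmath.exp(-2.0 * kz[j] * kz[j + 1] * roughnesses[j] ** 2)
        if j == last:
            # Nothing is reflected back from inside the substrate.
            amplitude = fresnel
        else:
            # Phase acquired crossing the layer just below interface j.
            phase = cmath.exp(2j * kz[j + 1] * thicknesses[j])
            amplitude = (fresnel + amplitude * phase) / (
                1.0 + fresnel * amplitude * phase
            )
    return abs(amplitude) ** 2


# Hypothetical two-slab model: air / thin layer / thick layer / substrate.
slds = [0.0, 1.0e-6, 10.0e-6, 20.0e-6]  # only the substrate value is from the text
thicknesses = [30.0, 300.0]             # angstroms, assumed
roughnesses = [3.0, 3.0, 3.0]           # angstroms, assumed
curve = [(q, reflectivity(q, slds, thicknesses, roughnesses))
         for q in (0.01, 0.05, 0.1, 0.2)]
```

Below the critical edge the function returns a reflectivity of 1, which corresponds to the ground rule that the critical edge is scaled to 1.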
All comparisons were performed on a log scale, after transforming the output from the fitting programs. The data is displayed as a residual deviation from Motofit output, as a function of Q. Note that the accuracy of Motofit is also directly tested in this way (inaccuracy would be indicated by two or more other programs showing a similar systematic deviation).
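The comparison on a log scale can be sketched as below. This is only a minimal illustration: the function name `log_residuals` and the sign convention (other program minus Motofit) are assumptions, and both curves are assumed to be sampled at the same Q values.

```python
import math


def log_residuals(reference, other):
    """Difference of log10(reflectivity) between two curves, point by point."""
    return [math.log10(o) - math.log10(r) for r, o in zip(reference, other)]


# Made-up numbers: the second curve is 10% high at the second point only.
residuals = log_residuals([1.0, 1.0e-3], [1.0, 1.1e-3])
```

A program that agrees with the reference gives residuals scattered around zero; a systematic trend with Q points to a difference in the calculation itself.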
There are no systematic residuals from the other programs, which indicates that the reflectivity calculation in Motofit is accurate.
Output from Parratt is very similar to Motofit, and the residuals are low, indicating accurate reflectivity calculation.
Output from Drydoc shows no systematic deviations from Motofit output, indicating accurate reflectivity calculation. There is more noise in the residual plot, but this may be due to the output from Drydoc being single precision.
Output from Reflfit shows a systematic deviation from Motofit, Parratt and Drydoc. One possible cause is the method that Reflfit uses to approximate interfacial roughness. Each interface is finely divided into 'mini' slabs, whose scattering length densities are chosen to create a diffuse interface with a Gaussian profile (31 slabs were chosen in this simulation). This approach differs from the Névot and Croce approximation used by Motofit, Parratt and Drydoc (which also describes a Gaussian interface, but which may be inaccurate if the roughness is too large in comparison to the layer thickness). The systematic deviation is probably not significant, because the differences are no larger in magnitude than those observed for the other programs. The reflectivity calculation in Reflfit therefore appears to be as accurate as that of the other programs.
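The 'mini' slab idea can be illustrated with the sketch below, which slices a single Gaussian-roughened interface into thin slabs following an error-function profile. This is not Reflfit's actual code. The function name `slice_interface`, the choice of slicing over three standard deviations either side of the interface, and the sampling of each slab at its midpoint are all assumptions made for illustration.

```python
import math


def slice_interface(sld_above, sld_below, sigma, n_slabs=31, half_width=3.0):
    """Approximate a Gaussian-roughened interface by thin uniform slabs.

    Returns a list of (thickness, sld) pairs ordered from the upper medium
    to the lower one. The SLD profile is an error function of width sigma.
    """
    step = 2.0 * half_width * sigma / n_slabs
    slabs = []
    for i in range(n_slabs):
        # Depth of the slab midpoint, measured from the nominal interface.
        z = -half_width * sigma + (i + 0.5) * step
        fraction = 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))
        slabs.append((step, sld_above + (sld_below - sld_above) * fraction))
    return slabs


# Air to substrate interface with an assumed roughness of 5 angstroms.
mini_slabs = slice_interface(0.0, 20.0e-6, sigma=5.0)
```

Each mini slab is then treated as an ordinary sharp-edged layer in the reflectivity calculation, so the result depends on how finely the interface is divided. The Névot and Croce approach instead keeps one interface and damps its Fresnel coefficient.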
Fitting of data
Experimental noise was then added to the simulated theoretical curve, assuming that the acquisition obeys Poisson counting statistics. Each program was asked to fit the data from the same initial starting parameters. These starting parameters were chosen randomly before fitting began and were deliberately kept quite close to the final solution.
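One way to add Poisson counting noise to a theoretical curve is sketched below. It is not the procedure actually used for this benchmark. The names `poisson_sample` and `add_counting_noise`, the `incident_counts` parameter (the number of counts corresponding to a reflectivity of 1) and the floor of one count on the error bar are assumptions. Only the standard library is used, so large means are drawn from a rounded normal approximation; `numpy.random.Generator.poisson` would be the more usual choice in practice.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed count with the given mean."""
    if mean < 50.0:
        # Knuth's multiplication method: exact, but slow for large means.
        limit = math.exp(-mean)
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        return count
    # For large means a rounded normal deviate is a close approximation.
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def add_counting_noise(reflectivity, incident_counts, seed=0):
    """Return (noisy_reflectivity, error_bars) for a theoretical curve."""
    rng = random.Random(seed)
    noisy, errors = [], []
    for r in reflectivity:
        counts = poisson_sample(r * incident_counts, rng)
        noisy.append(counts / incident_counts)
        # sqrt(N) counting error; floored at one count to avoid zero errors.
        errors.append(math.sqrt(max(counts, 1)) / incident_counts)
    return noisy, errors


# Made-up curve values spanning six orders of magnitude.
noisy, errors = add_counting_noise([1.0, 1.0e-3, 1.0e-6], 1.0e6, seed=1)
```

Because the relative error scales as one over the square root of the counts, the high-Q points, where the reflectivity is lowest, carry the largest relative uncertainty.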
Notes were made of the parameter and χ² values after fitting. A value of χ² close to 1 indicates a good fit. If a program did not converge to the final solution, this was treated as a weakness in its fitting / optimisation algorithm. The majority of the programs employ derivative-type methods, which can miss the best fit if the initial guesses are not close to the final solution. Methods such as genetic optimisation or simulated annealing should be able to avoid this problem, as they are more likely to find the global minimum. The performance of these algorithms was not investigated, as relatively few programs use these techniques.
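A χ² value that is close to 1 for a good fit implies a normalised statistic. The sketch below shows one common definition, the sum of squared, error-weighted residuals divided by the number of points minus the number of fitted parameters. Whether each program normalises in exactly this way is an assumption, and the name `reduced_chi_squared` is illustrative.

```python
def reduced_chi_squared(data, model, errors, n_fitted):
    """Error-weighted sum of squared residuals per degree of freedom."""
    total = sum(((d - m) / e) ** 2 for d, m, e in zip(data, model, errors))
    return total / (len(data) - n_fitted)


# Made-up numbers: every point is exactly one error bar from the model.
chi2 = reduced_chi_squared(
    data=[1.0, 2.0, 3.0, 4.0],
    model=[1.1, 1.9, 3.1, 3.9],
    errors=[0.1, 0.1, 0.1, 0.1],
    n_fitted=2,
)
```

With this definition, values much larger than 1 mean the model misses the data by more than the error bars allow, while values well below 1 suggest the error bars are overestimated.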
Pass: parameters within one standard deviation of the simulated parameters. χ² = 1.11.
Fail: parameters outside three standard deviations of the simulated parameters. χ² = 167. On changing the fit parameters to those obtained with Motofit, χ² decreased to 1.11. The default optimisation in Reflfit seems poor; however, a skilful user may be able to obtain the global minimum.
Pass: a couple of parameters did not refine to within one standard deviation. χ² = 8.22. The background cannot be fitted in Parratt, so it was fixed at the simulated value. This may lead to difficulties with real datasets.
Fail: the program did not refine to the minimum. It was not possible to examine χ² within the program.
Please note that different starting conditions may give different results. However, the results clearly indicate that programs without genetic optimisation, simulated annealing or similar methods are at a significant disadvantage.