Help With Least Squares Fits
This help file is divided into five parts.
Getting Started provides a quick guide to performing a
least squares fit using Datasqueeze. It will be useful for new users who want
immediate gratification, or experienced users who have not used this feature for a
while and need a quick reminder.
Advice on Least Squares Fits provides a longer description of
least squares fitting, including numerous warnings about possible pitfalls.
You should definitely read this section before showing the results of any
least squares fits to your boss or advisor!
Technical Details provides a bit more information about
what goes on "under the hood." It will be primarily of interest to those who
have previously used (or even written) least-squares fitting algorithms and
want to understand in more detail how the program works.
Finally, Functions provides a synopsis of each of the
fitting functions currently available.
Getting Started
In this section it is assumed that you are already familiar with the
important windows and controls in Datasqueeze. If you have questions about
the basic controls, look in General Information under the Help menu.
The general procedure for performing a least-squares fit in
Datasqueeze is as follows:
- Open a data file in the usual way, either with (cmd)O or by
going to the File panel.
- Make a line plot of your data using the Plot panel. You can
plot versus any parameter you want--for example, plot intensity versus Q or
versus Chi. However, for some models (e.g. a Rayleigh lineshape), only plotting versus
Q makes much sense.
- The fit panel allows you to construct a model function
as a sum of provided subfunctions. You specify the number of subfunctions,
and what you will use for each. For example, if you wanted a Gaussian
peak with a linear background, you would select 2 for the Number of Functions, Polynomial
for the first subfunction and Gaussian for the second subfunction.
- Some functions (e.g. the Rayleigh function for small-angle scattering from spheres) expect
Q as the independent variable. You can use these models even if you have plotted data versus
2theta or Q^2 by checking the
"Use Q as Fit Variable" box. This setting applies to all submodels used, even
those such as Gaussian which do not necessarily require Q as the independent variable.
Do not check this box if you want the variable you used for the plot
to also be the one used for the fit itself.
- Once you are
happy with the way you have set things up, click on the Construct button to
bring up the Fit Parameter panel. "Peak" type functions (Gaussian, Lorentzian, etc.) are
special in that the Fit Parameter panel shows an "area" which is not an independently variable parameter,
but is calculated from the amplitude and width.
- If you have not previously used a model, Datasqueeze sets the starting parameters
to reasonable values, consistent with the way the data have been plotted. For example, the
default parameters for a Lorentzian peak will place the peak somewhere in the region of your
plot, with an amplitude comparable to the scale of your data and a width that is a small fraction
of the total width of your plot. However, the starting peak position, amplitude, and width will
still probably be nowhere near those of the actual features in your data. Therefore, you need
to set each parameter to a reasonable starting value. This is important--the fitting algorithm (as with any least-squares algorithm)
will only work if your starting parameters are reasonably close to the "true" values.
Setting initial values can be done in two ways:
- Most fit models come equipped with cursors, which allow you to graphically
change some or all of the fit parameters. You can drag the cursors around until
the calculated shape resembles that of your data.
You may need to click Reset to display the cursors the first time.
- For parameters that do not have corresponding cursors, or for finer control, type
into the appropriate box in the Fit Parameter panel to set each parameter to a reasonable starting value.
After doing this, click the Apply button.
Your function will be plotted on top of the data in the Line Plot Image window.
See whether your model with the starting parameters you have chosen
agrees approximately with the data.
- It is better not to let all the parameters vary simultaneously, unless you
have a particularly simple model. Rather, you should pick two or three parameters,
vary them, and then gradually add more. So, at this point you should check off
a few parameters that you want to vary.
- Click on the Fit button to vary the checked parameters. If you have chosen
wisely, the agreement between model and data will have improved, and the
Message area in the Fit panel will not show any errors. If so, go back to the
previous step, check a few more parameters, and continue until all the parameters that
you wish to vary have been optimized. (Note that you do not have to
vary every parameter: if you have other knowledge about some parameters you
may prefer to hold them fixed.) If you get unexpected or unwanted results,
you may want to deselect a few parameters, click on the Revert button,
and try again.
- If you keep getting error messages, you may wish to click the Correlations
button and see whether two or more parameters are strongly correlated.
A correlation coefficient with magnitude greater than 0.9 is not good,
and a correlation coefficient with magnitude greater than 0.98
almost certainly indicates a real problem. In this case you may need to
either use different starting parameters or hold more parameters fixed.
- The one-parameter error bars give an indication of the uncertainty in
each parameter. A better estimate is provided by checking the
"Calculate MPEB?" box and redoing the fit. You may however find that
some parameters are strongly enough correlated that multi-parameter error
bars cannot be calculated, in which case you need to deselect those parameters
and redo the fit without varying them.
- If you are happy with the results, you may want to
print the plot showing the agreement between model and data (starting from the File
menu),
or save the fitting parameters in an ascii file
(again starting from the File
menu).
Advice on Least Squares Fits
How Least-Squares Fitting Works
Suppose you have measured some quantity y as a function of some
independent parameter x. You have done this at N different points,
so you have N pairs of values (x_i, y_i), each
with an uncertainty in y_i given by e_i.
You believe that you can describe this with a model f(x) which contains
M independent parameters b_j. You want to find the values of
b_j which provide the best agreement between the model and the data.
The "goodness of fit" parameter that you use to describe the agreement is
phi:
phi = sum_{i=1}^{N} ((f(x_i) - y_i) / e_i)^2
The goal is therefore to find the set of parameters bj which
minimize phi. There are different algorithms for accomplishing this,
but they all rely in some way on taking numerical or analytical derivatives of
phi with respect to each of the bj, and then iteratively
adjusting the values of each of the bj until the minimum in
phi is found.
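As a rough illustration of the definition of phi, here is a minimal Python sketch. It is not Datasqueeze code. The model (`gaussian_plus_line`), its parameter order, and the synthetic data are all assumptions made up for the example; the error bars are taken as the square root of the data value.

```python
import math

def gaussian_plus_line(x, params):
    """Hypothetical model: linear background plus a Gaussian-shaped peak."""
    const, slope, ampl, pos, width = params
    return const + slope * x + ampl * math.exp(-((x - pos) / width) ** 2)

def phi(model, params, xs, ys, es):
    """Sum over all points of ((f(x_i) - y_i) / e_i)^2."""
    return sum(((model(x, params) - y) / e) ** 2 for x, y, e in zip(xs, ys, es))

# Synthetic, noise-free data generated from known parameters.
true_params = (10.0, 2.0, 100.0, 0.5, 0.1)
xs = [i / 10 for i in range(11)]
ys = [gaussian_plus_line(x, true_params) for x in xs]
es = [math.sqrt(y) for y in ys]          # square-root error bars (assumption)

# phi vanishes at the generating parameters and grows when the peak is misplaced.
phi_true = phi(gaussian_plus_line, true_params, xs, ys, es)
phi_off = phi(gaussian_plus_line, (10.0, 2.0, 100.0, 0.6, 0.1), xs, ys, es)
```

A fitting routine repeatedly evaluates a quantity like `phi` while adjusting the parameters, and keeps the set that gives the smallest value.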
The problem can be compared to the case of a lost hiker. The hiker tries to
find his way back to civilization by always heading downhill, i.e., heading towards
his best guess for the "minimum in altitude." This is a problem in minimizing one
parameter (the altitude) in two dimensions (the two-dimensional surface of the Earth).
The difference in least-squares fits is that the parameter space in general contains
more than two parameters.
The least-squares algorithm used by Datasqueeze, like most others, has the following properties:
- It finds a local minimum in phi, which is not necessarily the global minimum.
Thus, if inappropriate starting parameters are chosen, the final fitted values of the
parameters may not provide a good fit to the data.
- Since derivatives and other functions are calculated numerically, there is
always some numerical uncertainty in the final values. However, this uncertainty
is normally much less than the statistical uncertainties in these parameters.
- Since the program cycles through a finite number of iterations, it is possible
that it may never find the local minimum, particularly if the function is
quite insensitive to some of the parameters.
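The first property can be seen in a toy example. The sketch below uses an invented one-parameter surface `phi(b)` with two minima and a plain "always head downhill" rule, which is much cruder than the algorithm Datasqueeze uses. The function, step size, and iteration count are assumptions chosen only to show that the end point depends on the starting value.

```python
def phi(b):
    """Hypothetical one-parameter goodness-of-fit surface with two minima."""
    return (b * b - 1.0) ** 2 + 0.3 * b

def dphi(b, h=1e-6):
    """Central-difference numerical derivative of phi."""
    return (phi(b + h) - phi(b - h)) / (2.0 * h)

def descend(b, step=0.01, iterations=5000):
    """Plain downhill stepping: always move against the local slope."""
    for _ in range(iterations):
        b -= step * dphi(b)
    return b

b_left = descend(-2.0)    # settles in the deeper minimum, near b = -1
b_right = descend(2.0)    # settles in the shallower minimum, near b = +1
```

Both runs stop where the slope is zero, but only the first reaches the lower value of `phi`.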
What Chi-Squared Means
The goodness-of-fit parameter obviously depends on the number of data points.
A better parameter to describe the agreement between model and data is
chi^2 (often called the reduced chi-squared). Definitions of chi^2 vary, but the definition
used here is:
chi^2 = phi / (N - M)
where, again, N is the number of data points and M is the number of
parameters that were varied. (This may be different from the total number of
available parameters if some of them were not allowed to vary.)
Remembering that each data point has a statistical uncertainty e_i, we
can see that if the model describes the data "perfectly" a typical calculated
value f(x_i) will differ from the measured value y_i by about e_i,
and chi^2 will be close to 1.
If chi^2 is much greater than 1, it can mean one of several things:
- Although your model may visually resemble the data in a plot, there are still
statistically significant differences between the two.
- You have underestimated your error bars.
Datasqueeze assumes that the error bars are all given by Poisson statistics, that is, that
the uncertainty in a data point is approximately the square root of the number of counts.
This is a fairly good assumption for photon-counting detectors, but may be a terrible
assumption for other types, such as image plates.
If chi^2 is much less than 1, it also can mean one of several things:
- You have too many fitting parameters, so that your model is actually "tracking the
noise."
- You have overestimated your error bars (see above).
In general, you hope that chi^2 decreases as you allow more
and more parameters to vary. If it stays the same or increases, it means that any
improvement in phi is statistically insignificant. And, obviously, if the
number of data points is less than or equal to the number of fitting parameters then
the fit has no meaning.
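The bookkeeping can be sketched in a few lines of Python. The function name and the numbers are invented for illustration. They show a case where freeing one more parameter lowers phi slightly but raises chi^2, so the improvement is not significant.

```python
def reduced_chi_squared(phi, n_points, n_varied):
    """Return chi^2 = phi / (N - M); meaningless unless N exceeds M."""
    if n_points <= n_varied:
        raise ValueError("fit is meaningless unless N exceeds M")
    return phi / (n_points - n_varied)

# Illustrative numbers only: 100 data points, then one extra parameter is freed.
chi2_three = reduced_chi_squared(120.0, 100, 3)   # 120 / 97
chi2_four = reduced_chi_squared(119.0, 100, 4)    # 119 / 96
improved = chi2_four < chi2_three                 # False: phi fell, chi^2 rose
```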
What the Error Bars Mean
Since there is uncertainty in the data, there are obviously uncertainties in the
fitted parameters. There will always be a spread in parameters which will give
a result that is statistically indistinguishable from the "best" result.
Datasqueeze reports two kinds of error bar: single-parameter and multi-parameter.
The single-parameter error bar for a parameter bj can
be calculated analytically if we know the
first and second derivatives of phi with respect to bj.
(The first derivative should, of course, be zero at the minimum).
It is assumed here that, near the minimum, phi increases quadratically
as each fitting parameter moves away from its optimum value. The single-parameter error
bar is the calculated spread in the parameter which will result in a one-standard-deviation
disagreement with the data, taken as a whole.
That is, we estimate that if we remeasured the data many different times, we would
get a fitted value of that parameter +/- the error bar 68% of the time.
This assumes that all of the parameters are
independent of each other.
It turns out that increasing some b_j by its 1-sigma uncertainty
should have the effect of increasing phi from its best value phi_best to:
phi_max = phi_best * (1 + 1/(N - M))
The multi-parameter error bar for a parameter bj takes into
account the fact that parameters are not all independent, so that, if one parameter
bj is varied by a certain amount, the effect on the function can
to some extent be "corrected for" by simultaneously changing a different parameter
bk. Thus, the multi-parameter error bars for a parameter are
usually (but not always) somewhat greater than the single-parameter error bars, and
sometimes much greater. Datasqueeze calculates the low and high ranges for a parameter
by numerically testing how much the parameter can be changed before phi
increases from phi_best to phi_max. This probably
provides a better estimate of the true uncertainty in that parameter.
What Can Go Wrong
Beginners are sometimes misled by the fact that every scientific calculator includes a
linear-regression feature into thinking that they can always trust the results of a
least-squares fit. However,
given the above considerations, it is not surprising that least-squares fits do not
always yield expected (or correct) results. Here are some pitfalls to watch out for:
- False Minima: In the high-dimensional space of the parameters bj,
there may be many minima in phi. Datasqueeze just finds the one that is the
closest to your starting parameters. This is analogous to the hiker who is lost on a
volcano and finds his way to the bottom of the crater in the middle. He is at a local minimum
in altitude,
but nowhere near civilization. You should always make sure that your fit looks good--i.e.,
visually agrees with the data. If the outcome is really important to you, you should
probably try a range of starting parameters and verify that you always end up at or near
the same place.
- Strongly-Coupled Parameters:
If two parameters are strongly coupled (i.e., if they do almost exactly the same thing to the
function), the fit may not converge even after many iterations. Hopefully, when this
happens, you will get an appropriate warning in the Message Area. Consider, for example,
the following extreme case:
f(x) = A x^2.00000 + B x^2.00001
Clearly, the two terms in this function do almost the same thing, and we would have to have a
huge range in x before we could claim to have independently determined the
values of A and B. You can check if two parameters are strongly coupled
by clicking the Correlations button and looking at the values of
the correlation coefficients.
- Too Many Parameters: This is related to the previous item. Suppose, for example,
that we have a peak that is well described by a single Gaussian function, with an
amplitude, position, and width, but we believe that it really consists of two unresolved
peaks. If we start with two Gaussian peaks and a total of six independent variables,
the fit is almost certain to fail--if we are lucky, we will get a complaint about
strongly correlated parameters, but it is also possible that the program will
find a minimum based on noise in the data, with meaningless parameters. Such a fit
might make sense if the parameters were restricted, for example by fixing the
values of both widths such that there were only four fitting parameters, but extreme
care must be taken.
- Parameters That Have No Effect On The Fit: Suppose that we have a diffraction
peak that is "really" described by a function:
f(x) = A exp(-((x - x0) / d)^2)
with
A = 1000,
x0 = 0.1,
d = 0.01
and we try to describe it with the correct function but starting parameters
A = 1000,
x0 = 0.5,
d = 0.001
This peak is far too sharp, and at the wrong position.
The function is essentially zero every place that the data are nonzero.
The
fitting algorithm will never find its way to the right minimum. This is why it is
crucial to have good starting values for all parameters.
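The last pitfall is easy to reproduce numerically. The following Python sketch assumes the peak shape A exp(-((x - x0) / d)^2), unit error bars, and an invented grid of x values around the real peak. It is an illustration, not Datasqueeze code.

```python
import math

def peak(x, ampl, x0, d):
    """Assumed peak shape: A exp(-((x - x0) / d)^2)."""
    return ampl * math.exp(-((x - x0) / d) ** 2)

def phi(model_values, data_values):
    """Unweighted sum of squared residuals (all e_i taken as 1 for simplicity)."""
    return sum((m - y) ** 2 for m, y in zip(model_values, data_values))

xs = [0.05 + 0.0025 * i for i in range(41)]            # 0.05 to 0.15, around the real peak
data = [peak(x, 1000.0, 0.1, 0.01) for x in xs]        # the "true" peak
start = [peak(x, 1000.0, 0.5, 0.001) for x in xs]      # the poor starting guess
nudged = [peak(x, 1010.0, 0.49, 0.0011) for x in xs]   # every parameter changed a little

largest_model_value = max(start)                        # underflows to 0.0 everywhere
change_in_phi = phi(nudged, data) - phi(start, data)    # 0.0: phi does not respond
```

Because small changes in any parameter leave `phi` unchanged, the derivatives that the fitting algorithm relies on are all zero and it has no direction in which to move.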
Tricks for Avoiding Problems
- Make sure that your starting parameters are well chosen. Use different starting
parameters and click on the Apply button several times until the curve agrees at least
approximately with the data before trying to minimize anything.
- Check the Message Area after each fit to make sure that nothing went wrong,
rather than just proceeding blindly and trusting the parameters that pop up.
- In general, vary as few parameters as possible. If chi^2 does not
improve when you allow a parameter to vary, then that parameter is not having any statistically
significant effect on the fit, and you should hold it fixed at some sensible value.
If two parameters are very strongly correlated, at least one of them should be fixed.
- Start off by varying just one or two parameters, then allow more and more to vary.
That way, you are less likely to have the program drift off into a parameter space of
unphysical values.
Technical Details
Statistical Errors
Datasqueeze assumes that the number in each pixel represents the actual number of photons
counted, so that the uncertainty is taken to be the square root of the value of that pixel.
This is probably close to correct for wire and CCD detectors, less so for some other
technologies such as image plates. Nevertheless, the overall effect is to weight
intense data more than weak data, and final fitted results turn out to be
remarkably insensitive to the exact algorithm chosen.
More precisely, if the "Sum" option is chosen in the Plot panel, then the
data value y_i is the sum of all pixels that lie within
that bin, and the error is taken to be e_i = sqrt(y_i)
unless y_i = 0, in which case e_i = 1.
If the "Average" option is chosen in the Plot panel, then for a data point with
n_i pixels in the chosen range the data value y_i is
the sum of all pixel intensities divided by n_i,
and the error is taken to be e_i = sqrt(y_i) / n_i
unless y_i = 0, in which case e_i = 1.
In either case, the "weight" for least squares fits is one over the error,
w_i = 1 / e_i.
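Note that there is one place where this algorithm fails badly. If you have
subtracted one data set from another (which is an allowed way to read in the data)
then there may be many pixels for which the original data sets had lots of intensity
but the difference pattern is small. For such pixels the square root of the difference
badly underestimates the true statistical uncertainty, which is set by the counts in
the original data sets.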
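The "Sum" and "Average" rules above can be written out as a short Python sketch. The function name and the pixel values are invented, and the code assumes non-negative pixel values (it does not handle a subtracted data set).

```python
import math

def bin_value_error_weight(pixels, mode="Sum"):
    """Return (y, e, w) for one bin, following the Sum / Average rules above."""
    total = sum(pixels)
    if mode == "Sum":
        y = float(total)
        e = math.sqrt(y) if y != 0 else 1.0
    elif mode == "Average":
        n = len(pixels)
        y = total / n
        e = math.sqrt(y) / n if y != 0 else 1.0
    else:
        raise ValueError("mode must be 'Sum' or 'Average'")
    return y, e, 1.0 / e          # weight is one over the error

summed = bin_value_error_weight([100, 100, 200, 0], "Sum")        # y = 400, e = 20
averaged = bin_value_error_weight([100, 100, 200, 0], "Average")  # y = 100, e = 2.5
empty = bin_value_error_weight([0, 0, 0], "Sum")                  # y = 0, e = 1
```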
Least Squares Minimization Algorithm
Datasqueeze uses the Marquardt nonlinear least-squares minimization algorithm
(D. W. Marquardt, J. Soc. Ind. Appl. Math. 11(2), 431-441 (1963)).
The code was originally written in C, and was tested extensively at the
Massachusetts Institute of Technology and the University of Pennsylvania. Due
to the similarity between C and Java, a minimum number of changes were required
to incorporate it into Datasqueeze.
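For readers who have not met the method, the following is a generic textbook-style sketch of a Marquardt iteration in Python. It is not the Datasqueeze source. The two-parameter exponential model, the numerical derivative step, the starting damping value, and the factor of 10 used to raise or lower the damping are all assumptions made for the illustration.

```python
import math

def model(x, b):
    """Hypothetical two-parameter model: b[0] * exp(-b[1] * x)."""
    return b[0] * math.exp(-b[1] * x)

def phi(b, xs, ys, ws):
    return sum(((model(x, b) - y) * w) ** 2 for x, y, w in zip(xs, ys, ws))

def derivatives(x, b, h=1e-6):
    """Numerical partial derivatives of the model with respect to each parameter."""
    row = []
    for j in range(len(b)):
        up, dn = list(b), list(b)
        up[j] += h
        dn[j] -= h
        row.append((model(x, up) - model(x, dn)) / (2.0 * h))
    return row

def marquardt(b, xs, ys, ws, lam=1e-3, iterations=50):
    b = list(b)
    best = phi(b, xs, ys, ws)
    for _ in range(iterations):
        a = [[0.0, 0.0], [0.0, 0.0]]      # curvature matrix a_jk
        g = [0.0, 0.0]                    # downhill direction
        for x, y, w in zip(xs, ys, ws):
            d = derivatives(x, b)
            r = (y - model(x, b)) * w * w
            for j in range(2):
                g[j] += d[j] * r
                for k in range(2):
                    a[j][k] += d[j] * d[k] * w * w
        # Inflate the diagonal by (1 + lam), then solve the 2x2 system for the step.
        a00, a11 = a[0][0] * (1.0 + lam), a[1][1] * (1.0 + lam)
        det = a00 * a11 - a[0][1] * a[1][0]
        step = [(g[0] * a11 - a[0][1] * g[1]) / det,
                (a00 * g[1] - a[1][0] * g[0]) / det]
        trial = [b[0] + step[0], b[1] + step[1]]
        trial_phi = phi(trial, xs, ys, ws)
        if trial_phi < best:              # success: accept and trust the quadratic form more
            b, best, lam = trial, trial_phi, lam / 10.0
        else:                             # failure: reject and lean toward steepest descent
            lam *= 10.0
    return b, best

xs = [0.5 * i for i in range(10)]
ys = [model(x, [100.0, 0.7]) for x in xs]     # noise-free synthetic data
ws = [1.0] * len(xs)
b_fit, phi_fit = marquardt([80.0, 0.5], xs, ys, ws)
```

The damping factor `lam` is what distinguishes the method: a small value gives fast convergence near the minimum, and a large value gives short, cautious downhill steps when the quadratic approximation is poor.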
One-Parameter Uncertainties
Suppose we have N data points y_i, each with uncertainty
e_i and weight w_i = 1 / e_i.
At each point we have calculated a function f_i = f(x_i) which depends on
M independent parameters b_j. Then we define, as above,
phi = sum_{i=1}^{N} ((f(x_i) - y_i) * w_i)^2
and
chi^2 = phi / (N - M)
In the process of minimizing phi with respect to the parameters b_j,
we will have calculated the derivative of f_i (at each point)
with respect to b_j.
Then we define
a_jk = sum_{i=1}^{N} (d f_i / d b_j) (d f_i / d b_k) w_i^2
i.e.
a_jj = sum_{i=1}^{N} (d f_i / d b_j)^2 w_i^2
Here the derivatives (d f_i / d b_j) are of course partial
derivatives; Datasqueeze calculates them numerically.
It can then be shown (see, e.g.,
P. R. Bevington and D. K. Robinson, Data Reduction and
Error Analysis for the Physical Sciences, Third Edition, McGraw Hill (2003))
that the uncertainty in parameter b_j is given by
sigma_j = sqrt(chi^2 / a_jj)
This is how the one-parameter uncertainties are calculated. Note that there is an
implicit assumption that chi^2 is quadratic in each of the
b_j; i.e., we are keeping only the second-order term in a Taylor expansion about the minimum.
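The formula can be checked numerically in the simplest possible case. The Python sketch below uses a one-parameter model f(x) = b (a flat line) with invented data and error bars. For this model phi is exactly quadratic in b, so moving b by one sigma should raise phi from its best value by exactly the factor (1 + 1/(N - M)).

```python
import math

ys = [9.0, 11.0, 10.5, 9.5, 10.0, 12.0]       # invented data values
es = [1.0, 1.0, 0.5, 0.5, 1.0, 2.0]           # invented error bars
ws = [1.0 / e for e in es]

def phi(b):
    """phi for the one-parameter model f(x) = b."""
    return sum(((b - y) * w) ** 2 for y, w in zip(ys, ws))

# For f = b the derivative d f_i / d b is 1 at every point, so a_jj = sum of w_i^2.
a_jj = sum(w * w for w in ws)
b_best = sum(w * w * y for y, w in zip(ys, ws)) / a_jj   # weighted mean minimizes phi

n_points, n_varied = len(ys), 1
phi_best = phi(b_best)
chi2 = phi_best / (n_points - n_varied)
sigma = math.sqrt(chi2 / a_jj)

phi_at_sigma = phi(b_best + sigma)
phi_max = phi_best * (1.0 + 1.0 / (n_points - n_varied))
# phi_at_sigma and phi_max agree to rounding error
```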
Multi-Parameter Uncertainties
The one-parameter uncertainties are calculated by taking a partial derivative of
the function with respect to each of the b_j. This implicitly
assumes that the parameters are uncorrelated. As discussed in
Bevington, if phi_best is the
value of phi obtained when all parameters have been optimized, then
changing a given parameter to
b_j -> b_j +/- sigma_j
should cause phi to increase to
phi_max = phi_best * (1 + 1/(N - M))
assuming that no other parameters are varied.
The multi-parameter error bar for a parameter bj takes into
account the fact that parameters are correlated, so that, if one parameter
bj is varied by a certain amount, the effect on the function can
to some extent be "corrected for" by simultaneously changing a different parameter
bk. Thus, the multi-parameter error bars for a parameter are
usually (but not always) somewhat greater than the single-parameter error bars, and
sometimes much greater. Datasqueeze calculates the low and high ranges for a parameter
by setting it to a sequence of different values, allowing all other parameters to vary,
until phi
increases from phi_best to phi_max.
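The difference between the two kinds of error bar can be illustrated with a straight-line model f(x) = b0 + b1 x, where the re-optimization of the other parameter can be done in closed form. This Python sketch uses invented data and unit weights, and finds the crossing point by bisection rather than by stepping through a sequence of values, so it is only an approximation of the procedure described above.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]          # invented data
ws = [1.0] * len(xs)                           # unit weights for simplicity

S = sum(w * w for w in ws)
Sx = sum(w * w * x for x, w in zip(xs, ws))
Sy = sum(w * w * y for y, w in zip(ys, ws))
Sxx = sum(w * w * x * x for x, w in zip(xs, ws))
Sxy = sum(w * w * x * y for x, y, w in zip(xs, ys, ws))

def phi(b0, b1):
    return sum(((b0 + b1 * x - y) * w) ** 2 for x, y, w in zip(xs, ys, ws))

def best_intercept(b1):
    """Re-optimize the intercept b0 with the slope b1 held fixed."""
    return (Sy - b1 * Sx) / S

b1_best = (S * Sxy - Sx * Sy) / (S * Sxx - Sx * Sx)
b0_best = best_intercept(b1_best)
phi_best = phi(b0_best, b1_best)
n_free = len(xs) - 2                           # N - M with both parameters varied
phi_max = phi_best * (1.0 + 1.0 / n_free)

def scan(phi_of_delta):
    """Find delta > 0 at which phi_of_delta(delta) reaches phi_max, by bisection."""
    lo, hi = 0.0, 1e-6
    while phi_of_delta(hi) < phi_max:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi_of_delta(mid) < phi_max:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Single-parameter: move the slope with the intercept frozen at its best value.
single = scan(lambda d: phi(b0_best, b1_best + d))
# Multi-parameter: move the slope and let the intercept compensate.
multi = scan(lambda d: phi(best_intercept(b1_best + d), b1_best + d))
```

Here `multi` comes out larger than `single`, because the intercept can partly compensate for a change in the slope.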
Parameter Correlation Coefficients
The parameter correlation coefficients c_jk
indicate how strongly two parameters are
coupled. If c_jk = 1 then parameters j and k do exactly
the same thing to the model; if c_jk = 0 then they are completely
independent. The correlation coefficients are defined as follows: first we define
(as above)
a_jk = sum_{i=1}^{N} (d f_i / d b_j) (d f_i / d b_k) w_i^2
Then the c_jk are essentially normalized values of the a_jk.
They are calculated approximately as follows:
c_jk = a_jk / sqrt(a_jj a_kk)
except that care has to be taken if a_jj ≤ 0. (In practice this
means that there is a problem anyhow, because if one of the a_jj is
zero then that parameter has no effect on the function.)
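A minimal Python sketch of this normalization follows. The function name is invented, the derivatives are supplied analytically rather than numerically, and returning zero when a diagonal element vanishes is an assumption, not necessarily what Datasqueeze does. The first example is the nearly degenerate pair A x^2.00000 + B x^2.00001; the second is a constant plus a slope about the center of the x range.

```python
import math

def correlation_matrix(derivs, ws):
    """derivs[i][j] = d f_i / d b_j at point i; returns the matrix c_jk."""
    m = len(derivs[0])
    a = [[sum(d[j] * d[k] * w * w for d, w in zip(derivs, ws)) for k in range(m)]
         for j in range(m)]
    c = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            denom = math.sqrt(a[j][j] * a[k][k])
            # Assumption: report 0 when a parameter has no effect on the function.
            c[j][k] = a[j][k] / denom if denom > 0.0 else 0.0
    return c

xs = [float(i) for i in range(1, 11)]
ws = [1.0] * len(xs)

# f = A x^2.00000 + B x^2.00001: derivatives with respect to A and B.
coupled = correlation_matrix([[x ** 2.0, x ** 2.00001] for x in xs], ws)

# f = Const + Lin (x - 5.5): derivatives with respect to Const and Lin.
centered = correlation_matrix([[1.0, x - 5.5] for x in xs], ws)
```

In the first case `coupled[0][1]` is indistinguishable from 1, and in the second `centered[0][1]` is zero, which is one reason a polynomial background is usually written about a center point.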
Functions
The following is a synopsis of each of the fitting functions currently provided:
- Polynomial.
A cubic polynomial. May be useful to describe a slowly varying background.
f = (Const) + (Lin) (x - XC) + (Quad) (x - XC)^2 + (Cub) (x - XC)^3
If abs(result) would be > 10^30, returns 10^30.
You should not vary all parameters simultaneously--normally hold Xcen fixed.
This model has four control cursors: one for XC and (Const), and one each for the linear, quadratic,
and cubic terms. In Batch mode the model name is "Polynomial".
- Lorentzian.
A Lorentzian Peak Function. Often used to describe diffraction maxima from fluids.
f = (Ampl) kappa^2 / ((x - pos)^2 + kappa^2).
kappa is the half-width at half-maximum.
Area under peak is Ampl * pi * kappa.
If kappa < 10^-15, returns zero.
If abs(pos) > 10^15, returns zero. For structural analysis the independent variable is
normally q, not 2-theta.
This model has two control cursors, one for the peak position and amplitude and one for kappa.
In Batch mode the model name is "Lorentzian".
- Gaussian. A Gaussian Peak Function. Often used to describe Bragg peak shapes.
arg = (x - pos) * sqrt(ln(2)) / delta
f = (Ampl) exp(-arg^2)
delta is the half-width at half-maximum.
Area under peak is Ampl * sqrt(pi / ln(2)) * delta
If abs(delta) < 10^-10, returns zero.
If abs(pos) > 10^15, returns zero.
If abs(arg) > 7, returns zero.
For structural analysis the independent variable is normally q, not 2-theta.
This model has two control cursors, one for the peak position and amplitude and one for delta.
In Batch mode the model name is "Gaussian".
- Voigt.
The model used in Datasqueeze is technically a "pseudo-Voigt" lineshape (the
weighted average of a Lorentzian and a Gaussian) rather than a true Voigt lineshape
(the convolution of a Lorentzian and a Gaussian, which takes substantially longer to
calculate). This function is often used as an empirical lineshape for Bragg peaks.
arg = (x - pos) / delta.
f = (Ampl) * (alpha / (1 + arg^2) + (1 - alpha) * exp(-arg^2 * ln(2)))
delta is the half-width at half-maximum.
Area under peak is Ampl * delta * (alpha * pi + (1 - alpha) * sqrt(pi / ln(2)))
See Gaussian, Lorentzian Functions for overflow limits.
Note that odd things may happen if alpha << 0 or alpha >> 1.
For structural analysis the independent variable is normally q, not 2-theta.
This model has three control cursors: one for the peak position and amplitude, one for delta, and
one for alpha.
In Batch mode the model name is "Voigt".
- Lorentzian^2. A Squared Lorentzian. Sometimes useful to parametrize
oddly shaped peaks or beam zero scattering.
f = (Ampl) (kappa^2 / ((x - pos)^2 + kappa^2))^2
kappa is the half-width at quarter-maximum.
Area under peak is Ampl * pi * kappa / 2
If kappa < 10^-15, returns zero.
If abs(pos) > 10^15, returns zero.
For structural analysis the independent variable is normally q, not 2-theta.
This model has two control cursors, one for the peak position and amplitude, and one for kappa.
In Batch mode the model name is "Lorentzian^2".
- Power Law. A Power Law Function. May describe small-angle scattering or fluctuation-limited peaks.
f = (Ampl) | x - pos |^alpha
If argument diverges, returns (Ampl) * 10^20.
For structural analysis the independent variable is normally q, not 2-theta.
Note that setting the parameters (either with the parameter boxes or with cursors), and
visually comparing the agreement between model and data, are much better done using a
log-log plot than one with a linear scale.
This model has two control cursors, one for the amplitude, and one for the power law exponent.
In Batch mode the model name is "Power Law".
- Radius of Gyration
A Gaussian function describing small-angle scattering from a compact object with radius of gyration Rg
f = (Ampl) exp(- (x Rg)^2 / 3 )
To be meaningful, the independent variable should be x = q, not 2-theta. Note that setting the parameters
(either with the parameter boxes or with cursors), and visually comparing the
agreement between model and data, are much better done using a "Guinier plot" of log(intensity) versus q^2
rather than one with a linear scale. In general, the plot scale has to be selected more carefully for this
model than for some others. If the Guinier plot is not linear, you are probably outside the range of
validity of the model.
This model has one control cursor, which controls the amplitude and the radius of gyration.
In Batch mode the model name is "Radius-Gyration".
- Sine Wave. A sine wave. Might be useful to describe azimuthal variation of a Bragg ring.
f = (Ampl) sin((phase) + x * (freq))
Argument of sine is in degrees, not radians
This model has two control cursors, one for the amplitude and phase, and one for the frequency.
In Batch mode the model name is "Sine Wave".
- Rayleigh. Rayleigh Function.
Describes small-angle scattering from random dilute suspension of spheres, which possibly
have polydisperse radii. Effective in version 2.2.4, the Gaussian distribution of radii
was replaced by a log-normal distribution, which has a number of advantages. (It reduces to a
Gaussian distribution in the limit of small dispersion, but never results in negative radii).
The quoted value of "sigma" is still the variance in the radius R.
The bare function is given by:
bare function = | 3 (sin(q R) - (q R) cos(q R)) / (q R)^3 |^2
For SAXS analysis the independent variable should be q, not 2-theta.
This model has one control cursor, which determines the amplitude and mean radius.
In Batch mode the model name is "Rayleigh".
- Core-Shell.
The Core-Shell model is often used to describe nanoparticles with a spherical core
and a spherical shell of a different electron density. If Rcore is
the radius of the core, Rshell the radius of the shell, rhocore the electron density in the
core, rhoshell the electron density in the shell, and rho0 the density in the surrounding
medium, then
it is easily calculated that the scattered intensity should be proportional to
f = | (rhoshell - rho0) Rshell^3 phi(q Rshell) +
(rhocore - rhoshell) Rcore^3 phi(q Rcore) |^2
phi(u) = 3 (sin(u) - u cos(u)) / u^3
(Note that prior to version 3.0.4 the factors of R^3 were not included in the model.)
The fitted densities are actually not the true electron densities, but rather
the density differences between the scattering particle and the medium. That is,
Rhocore = (rhocore - rho0) and Rhoshell = (rhoshell - rho0).
Note that these densities are strongly coupled to the overall amplitude prefactor,
so it is not possible to simultaneously fit "Ampl", "Rhocore", and "Rhoshell".
The electron density of water is rho0 = 0.334 e-/Å^3.
As with the Rayleigh model, dispersion in the sphere radius is incorporated by numerically integrating over a log-normal
distribution of radii.
(If the dispersion is zero, just the bare function is returned). sigma is taken to be the dispersion in
Rcore, with the ratio Rshell / Rcore held fixed during the integration.
Confusing and unphysical results may be obtained
if Rshell < Rcore (but there is no problem in having one or both of the electron densities negative).
For SAXS analysis the independent variable should be q, not 2-theta.
This model has one control cursor, which determines the amplitude and mean radius.
In Batch mode the model name is "Core-Shell".
- Ellipsoid.
Describes small-angle scattering from a random dilute suspension of ellipsoids of
revolution, with axes 2R, 2R, and 2vR, where v is the aspect ratio. Calculated
by integrating over spherical coordinates. Heterogeneity is incorporated by numerically integrating over a
log-normal distribution of sphere radii.
(If the dispersion is zero, just the bare function is returned). The "bare function" is
f = integral_{0}^{pi/2} phi^2( q R sqrt(cos^2(theta) + v^2 sin^2(theta)) ) cos(theta) d theta
where
phi(u) = 3 (sin(u) - u cos(u)) / u^3
For SAXS analysis the independent variable should be q, not 2-theta.
Note that the aspect ratio and dispersion parameters are strongly coupled;
for best results you should start with good guesses and vary as few parameters as possible,
letting one additional parameter vary at a time.
See A. Guinier and G. Fournet, Small-Angle Scattering of X-rays, p. 19, Wiley and Sons (1955).
Since a multi-dimensional integral must be calculated at each point, it takes longer to
evaluate this function than some of the others, and for this reason it was found impractical to
incorporate control cursors.
In Batch mode the model name is "Ellipsoid".
- Thin Rod.
This function describes small-angle scattering from a random dilute suspension of rods of
infinitesimal transverse dimension and length L. This function is so smooth that nothing is gained
by Gaussian smearing. The function is:
f = (2 Si(q L) / (q L)) - (sin^2(q L / 2) / (q L / 2)^2)
Si(x) = integral_{0}^{x} (sin(t) / t) dt
For SAXS analysis the independent variable should be q, not 2-theta.
See A. Guinier and G. Fournet, Small-Angle Scattering of X-rays, p. 20, Wiley and Sons (1955).
This model has one control cursor, which determines the amplitude and rod length.
In Batch mode the model name is "ThinRod".
- Thin Disk.
This function describes small-angle scattering from a random dilute suspension of flat disks of
infinitesimal thickness and radius R. This function is so smooth that nothing is gained
by Gaussian smearing. The function is:
f = (2 / (q^2 R^2)) (1 - J1(2 q R) / (q R))
For SAXS analysis the independent variable should be q, not 2-theta.
This model has one control cursor, which determines the amplitude and disk radius.
In Batch mode the model name is "ThinDisk".
- Cylinder.
This function describes small-angle scattering from a random dilute suspension of uniform-density
cylinders (rods or disks) of radius R and height h.
It is calculated via a numerical integration over spherical coordinates.
Heterogeneity is incorporated by numerically integrating over a log-normal distribution of radii, with
the ratio h/R kept constant. (If the dispersion is zero, just the bare function is returned).
The "bare function" is
f = h^2 R^4 integral_{0}^{pi/2} ( sin^2((q h / 2) cos theta) / ((q h / 2) cos theta)^2 )
( 4 J1^2(q R sin theta) / (q R sin theta)^2 ) sin theta d theta
where J1(u) is the Bessel function of the first kind of order 1.
(The prefactor of h^2 R^4 was added in version 3.0.4, and does
not change anything except the fitted amplitude.)
Note that if the aspect ratio v=h/R is either very large (resulting in a long thin rod)
or very small (resulting in a thin disk), then the function is very smooth and little is
changed by incorporating nonzero dispersion, and in these cases very similar results
are expected from the Thin Disk or Thin Rod models, which can be calculated much more quickly.
Oscillations are typically only observed if the aspect ratio is in the range 0.01<v<100. Note also
that parameters tend to be strongly coupled; for best results you should start with good guesses and
vary as few parameters as possible, letting one additional parameter vary at a time.
For SAXS analysis the independent variable should be q, not 2-theta.
See A. Guinier and G. Fournet, Small-Angle Scattering of X-rays, p. 19, Wiley and Sons (1955).
Since a multi-dimensional integral must be calculated at each point, it takes longer to
evaluate this function than some of the others, and for this reason it was found impractical to
incorporate control cursors.
In Batch mode the model name is "Cylinder".
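As a rough illustration of the bare function above, here is a minimal Python sketch. It is not Datasqueeze's own code: it omits the log-normal average over radii, evaluates the orientation integral with a simple midpoint rule, and computes J_1 from its standard integral representation so that only the standard library is needed. The function names and the number of integration points are assumptions made for this example.

```python
import math

def bessel_j1(x):
    """J_1(x) from (1/pi) * integral_0^pi cos(t - x sin t) dt, trapezoid rule."""
    n = max(64, int(2 * abs(x)) + 32)  # use more points for larger arguments
    step = math.pi / n
    total = 0.0
    for i in range(n + 1):
        t = i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.cos(t - x * math.sin(t))
    return total * step / math.pi

def sinc(x):
    """sin(x)/x, with the x -> 0 limit of 1."""
    return 1.0 if abs(x) < 1e-8 else math.sin(x) / x

def two_j1_over_x(x):
    """2 J_1(x)/x, with the x -> 0 limit of 1."""
    return 1.0 if abs(x) < 1e-6 else 2.0 * bessel_j1(x) / x

def cylinder_bare(q, radius, height, n_theta=400):
    """Orientation-averaged bare cylinder function, including the h^2 R^4 prefactor."""
    dtheta = (math.pi / 2.0) / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta  # midpoint of each angular slice
        axial = sinc(0.5 * q * height * math.cos(theta))
        radial = two_j1_over_x(q * radius * math.sin(theta))
        total += (axial * radial) ** 2 * math.sin(theta)
    return height ** 2 * radius ** 4 * total * dtheta
```

At q = 0 both bracketed factors go to 1, so the sketch should return approximately h^2 R^4, which is a convenient sanity check before comparing with fitted amplitudes.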
- Coated Cylinder.
This model extends the cylinder model to describe scattering from coated or functionalized cylinders,
as might be found in assemblies of vesicles or nanoparticles.
It is thus conceptually similar to the core-shell model often used to
describe scattering from coated spheres. The physical model consists of the following:
- A central core disk of radius R_core, height H_core, and electron density
rho_core = Rho_core + rho_0, where rho_0 is the electron density of the medium
(often water, rho_water = 0.334 e-/Å^3).
The scattering amplitude from the core for vector components q_z = q cos theta, q_r = q sin theta is:
s_1 = Rho_core H_core R_core^2
( sin((q H_core / 2) cos theta) / ((q H_core / 2) cos theta) )
( 2 J_1(q R_core sin theta) / (q R_core sin theta) )
- A ring of inner radius R_core, outer radius R_side = R_core + T_side, height H_side,
and electron density rho_side = Rho_side + rho_0. The scattering amplitude from the side ring is
s_2 = Rho_side H_side ( sin((q H_side / 2) cos theta) / ((q H_side / 2) cos theta) )
( R_side^2 ( 2 J_1(q R_side sin theta) / (q R_side sin theta) )
- R_core^2 ( 2 J_1(q R_core sin theta) / (q R_core sin theta) ) )
If H_side is set to be negative, it is forced to be equal to H_core.
- Two "caps" of radius R_capA and thickness T_capA, centered on the core disk, that extend from
z = ±H_core/2 to ±(H_core/2 + T_capA), with density
rho_capA = Rho_capA + rho_0. The scattering amplitude from these caps is:
s_3 = Rho_capA T_capA R_capA^2
( sin((q T_capA / 2) cos theta) / ((q T_capA / 2) cos theta) )
( 2 J_1(q R_capA sin theta) / (q R_capA sin theta) )
2 cos( q (H_core + T_capA) / 2 )
If R_capA is set to be negative, it is forced to be equal to R_core.
- Two more caps of radius R_capB and thickness T_capB, centered on the core disk, that extend from
z = ±(H_core/2 + T_capA) to ±(H_core/2 + T_capA + T_capB), with density
rho_capB = Rho_capB + rho_0. The scattering amplitude from these caps is:
s_4 = Rho_capB T_capB R_capB^2
( sin((q T_capB / 2) cos theta) / ((q T_capB / 2) cos theta) )
( 2 J_1(q R_capB sin theta) / (q R_capB sin theta) )
2 cos( q (T_capA + (H_core + T_capB) / 2) )
If R_capB is set to be negative, it is forced to be equal to R_core.
The scattered intensity is then calculated by doing a spherical average over the square of the
summed amplitudes:
f = integral_0^{pi/2} ( s_1 + s_2 + s_3 + s_4 )^2 sin theta d theta
Note also that parameters tend to be strongly coupled; for best results you should start with
good guesses and vary as few parameters as possible, letting one additional parameter vary at a time. It is not possible to vary the amplitude prefactor and all of the densities at the same time.
For SAXS analysis the independent variable should be q, not 2-theta.
Since a multi-dimensional integral must be calculated at each point, it takes longer to
evaluate this function than some of the others, and for this reason it was found impractical to
incorporate control cursors. In Batch mode the model name is "Coated Cylinder".
- Gaussian Coil.
This function describes small-angle scattering from a flexible polymer
chain which is not self-avoiding and obeys Gaussian statistics. The function is:
f = 2 (exp(-u) + u - 1) / u^2
u = q^2 R_g^2
where R_g is the radius of gyration.
For SAXS analysis the independent variable should be q, not 2-theta.
For the original calculation see P. Debye,
J. Phys. Colloid Chem. 51, 18-23 (1947).
This model has one control cursor, which determines the amplitude and radius of gyration.
In Batch mode the model name is "Gaussian Coil".
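The Debye function above loses precision when u is very small, because exp(-u) + u - 1 nearly cancels. The following Python sketch (an illustration, not the program's actual implementation) switches to the series 1 - u/3 + u^2/12 below a small threshold; the function name, the `ampl` argument, and the threshold value are assumptions made for the example.

```python
import math

def debye_gaussian_coil(q, rg, ampl=1.0):
    """Gaussian-coil (Debye) function: ampl * 2 (exp(-u) + u - 1) / u^2, u = (q Rg)^2."""
    u = (q * rg) ** 2
    if u < 1e-4:
        # Series expansion avoids catastrophic cancellation near u = 0.
        return ampl * (1.0 - u / 3.0 + u * u / 12.0)
    return ampl * 2.0 * (math.exp(-u) + u - 1.0) / (u * u)
```

For q R_g much larger than 1 the function falls off as 2/u, so a fit that only covers that range constrains the product of the amplitude and 1/R_g^2 rather than each parameter separately.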
- Fractal Aggregate.
This function describes a model for small-angle scattering from fractal aggregates of spheres. The bare function is
f = S(q) |F(q)|^2
S(q) = 1 + D Gamma(D-1) sin((D-1) arctan(q xi)) /
[ (q R)^D ( 1 + 1/(q^2 xi^2) )^((D-1)/2) ]
F(q) = 3 (sin(q R) - (q R) cos(q R)) / (q R)^3
Here D is the fractal dimension of the system, R is the radius of the individual spheres, and xi
represents the characteristic distance above which the mass distribution is no longer described by a fractal law.
Gamma is the Gamma function. Note that the model assumes
2 ≤ D ≤ 3 and xi > R; unexpected and unphysical results may be obtained if these conditions are not met.
For SAXS analysis the independent variable should be q, not 2-theta.
See J. Teixeira, J. Appl. Cryst. 21, 781-785 (1988) and also J. S. Pedersen in
Neutrons, X-rays, and Light:
Scattering Methods Applied to Soft Condensed Matter, P. Lindner and Th. Zemb eds, Elsevier (2002), pp. 391-420.
This model has two control cursors, one determining the amplitude and R and the other determining xi. In
Batch mode the model name is "Fractal Aggregate".
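The bare fractal-aggregate function can be sketched in a few lines of Python. This is an illustrative transcription of the formulas above, not the code Datasqueeze runs; the function names and the `ampl` argument are assumptions, and q must be greater than zero.

```python
import math

def sphere_form_factor(u):
    """F = 3 (sin u - u cos u) / u^3, with a series for small u."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def fractal_structure_factor(q, radius, xi, dim):
    """S(q) for spheres of the given radius, cutoff length xi, fractal dimension dim."""
    qxi = q * xi
    numerator = dim * math.gamma(dim - 1.0) * math.sin((dim - 1.0) * math.atan(qxi))
    denominator = (q * radius) ** dim * (1.0 + 1.0 / (qxi * qxi)) ** ((dim - 1.0) / 2.0)
    return 1.0 + numerator / denominator

def fractal_aggregate(q, radius, xi, dim, ampl=1.0):
    """ampl * S(q) * |F(q)|^2."""
    s = fractal_structure_factor(q, radius, xi, dim)
    return ampl * s * sphere_form_factor(q * radius) ** 2
```

Two limits are useful checks: for q xi much less than 1, S(q) approaches 1 + Gamma(D+1) (xi/R)^D, and for q R much greater than 1 it approaches 1, leaving only the sphere form factor.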
- Bessel. Bessel function of the first kind of order n.
f(q) = (Ampl) * J_n(q R).
n is rounded to the nearest integer, and should not be fit.
This model has one control cursor, which determines the amplitude and radius R.
In Batch mode the model name is "Bessel".
- Bessel^2. Bessel function of the first kind of order n, squared.
f(q) = (Ampl) * (J_n(q R))^2.
n is rounded to the nearest integer, and should not be fit.
This model has one control cursor, which determines the amplitude and radius R.
In Batch mode the model name is "Bessel^2".
- Yarusso-Cooper. Yarusso-Cooper Model. Function sometimes used to describe
scattering from ionomers or micelles. Rayleigh form factor with hard-sphere correlations.
f = (Ampl) Phi^2(q R_1) / (1 + 8 (v_ca / v_p) Phi(2 q R_ca))
Phi(u) = 3 (sin(u) - u cos(u)) / u^3
v_ca = 4 pi R_ca^3 / 3
Here R_1 is the radius of the scattering object (assumed to be a sphere of uniform electron density),
R_ca is the distance of closest approach of two scatterers, and v_p is the mean volume per particle of a scatterer.
Note that unphysical results will be obtained if the sphere volume v_ca corresponding to R_ca is ≥ v_p. In fact,
any packing fraction greater than about 0.75 is unphysical, and the YC model should work best in a regime
even more dilute than that.
For SAXS analysis the independent variable should be q, not 2-theta.
See D. J. Yarusso and S. L. Cooper, Macromolecules 16, 1871 (1983).
This model has two control cursors, one for the amplitude and radius R1, and one for the volume vp.
In Batch mode the model name is "Yarusso-Cooper".
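A minimal Python sketch of the Yarusso-Cooper expression is given here for illustration only. The function and argument names are assumptions, and the check that rejects v_ca ≥ v_p is added for the example rather than taken from the program.

```python
import math

def phi(u):
    """Phi(u) = 3 (sin u - u cos u) / u^3, with a series for small u."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def yarusso_cooper(q, r1, rca, vp, ampl=1.0):
    """ampl * Phi^2(q R1) / (1 + 8 (v_ca / v_p) Phi(2 q R_ca))."""
    vca = 4.0 * math.pi * rca ** 3 / 3.0
    if vca >= vp:
        # The text above warns that this regime is unphysical.
        raise ValueError("closest-approach volume must be smaller than vp")
    return ampl * phi(q * r1) ** 2 / (1.0 + 8.0 * (vca / vp) * phi(2.0 * q * rca))
```

At q = 0 the result reduces to ampl / (1 + 8 v_ca / v_p), which shows directly how the amplitude and v_p trade off against each other at low q.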
- Kinning-Thomas. Kinning-Thomas Model.
Commonly used function to describe scattering from ionomers or micelles.
Rayleigh form factor with Percus-Yevick correlations.
R = radius of the high-density central sphere.
R_CA = radius of closest approach.
n = volume density of spheres.
f = (Ampl) Phi^2(q R) / (1 + 24 eta G(A) / A)
Phi(u) = 3 (sin(u) - u cos(u)) / u^3
eta = 4 pi R_CA^3 n / 3
Note that strange and unphysical results will be obtained if eta ≥ 1. As with the Yarusso-Cooper
model, any packing fraction greater than about 0.75 is unphysical, and the KT model should work best
in a regime even more dilute than that.
A = 2 q R_CA
G(A) = (alpha / A^2)(sin A - A cos A) + (beta / A^3)(2 A sin A + (2 - A^2) cos A - 2)
+ (gamma / A^5)(-A^4 cos A + 4[(3 A^2 - 6) cos A + (A^3 - 6 A) sin A + 6])
alpha = (1 + 2 eta)^2 / (1 - eta)^4
beta = -6 eta (1 + eta/2)^2 / (1 - eta)^4
gamma = (eta / 2)(1 + 2 eta)^2 / (1 - eta)^4
For SAXS analysis the independent variable should be q, not 2-theta.
See D. J. Kinning and E. L. Thomas, Macromolecules 17, 1712 (1984).
This model has two control cursors, one for the amplitude and radius R, and one for the packing fraction eta.
In Batch mode the model name is "Kinning-Thomas".
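The closed-form G(A) is easy to mistype, so a short Python transcription may help when checking results by hand. This is a sketch under the definitions above, not the program's source; the function names are assumptions, and A must be greater than zero (the closed form suffers from cancellation as A approaches 0).

```python
import math

def sphere_amplitude(u):
    """Phi(u) = 3 (sin u - u cos u) / u^3, with a series for small u."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def kt_structure_factor(a, eta):
    """1 / (1 + 24 eta G(A) / A) for A = 2 q R_CA > 0."""
    scale = (1.0 - eta) ** 4
    alpha = (1.0 + 2.0 * eta) ** 2 / scale
    beta = -6.0 * eta * (1.0 + eta / 2.0) ** 2 / scale
    gamma = 0.5 * eta * (1.0 + 2.0 * eta) ** 2 / scale
    sin_a, cos_a = math.sin(a), math.cos(a)
    g = (alpha / a ** 2 * (sin_a - a * cos_a)
         + beta / a ** 3 * (2.0 * a * sin_a + (2.0 - a * a) * cos_a - 2.0)
         + gamma / a ** 5 * (-a ** 4 * cos_a
                             + 4.0 * ((3.0 * a * a - 6.0) * cos_a
                                      + (a ** 3 - 6.0 * a) * sin_a + 6.0)))
    return 1.0 / (1.0 + 24.0 * eta * g / a)

def kinning_thomas(q, radius, rca, eta, ampl=1.0):
    """ampl * Phi^2(q R) / (1 + 24 eta G(A) / A)."""
    return ampl * sphere_amplitude(q * radius) ** 2 * kt_structure_factor(2.0 * q * rca, eta)
```

As A goes to zero the structure factor tends to (1 - eta)^4 / (1 + 2 eta)^2, and for large A it tends to 1; both limits make quick checks of a transcription like this one.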
- Percus-Yevick. Percus-Yevick Hard Sphere Structure Factor.
Describes scattering from ideal hard spheres (no form factor).
f = (Ampl) / (1 - rho c(q d))
c = -4 pi d^3 integral_0^1 ds s^2 (sin(q d s)/(q d s)) (alpha + beta s + gamma s^3)
eta = volume fraction of scatterers (0 ≤ eta ≤ 1)
(eta ≥ 0.75 is unphysically high packing)
d = 2 r = diameter
rho = number density = 3 eta / (4 pi r^3)
alpha = (1 + 2 eta)^2 / (1 - eta)^4
beta = -6 eta (1 + eta/2)^2 / (1 - eta)^4
gamma = (eta / 2)(1 + 2 eta)^2 / (1 - eta)^4
See J. K. Percus and G. J. Yevick, Phys. Rev. 110, 1 (1958);
N. W. Ashcroft and J. Lekner, Phys. Rev. 145, 83 (1966).
This model has two control cursors, one for the amplitude and radius r, and one for the packing fraction eta.
In Batch mode the model name is "Percus-Yevick".
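For readers who want to reproduce the structure factor independently, the following Python sketch evaluates the integral numerically with Simpson's rule. It assumes the Ashcroft-Lekner form of the direct correlation function, alpha + beta s + gamma s^3, and uses rho d^3 = 6 eta / pi so that rho c = -24 eta times the integral. The function names and the number of Simpson intervals are assumptions for the example, and this is not necessarily how Datasqueeze evaluates the model.

```python
import math

def py_coefficients(eta):
    """alpha, beta, gamma as defined above."""
    scale = (1.0 - eta) ** 4
    alpha = (1.0 + 2.0 * eta) ** 2 / scale
    beta = -6.0 * eta * (1.0 + eta / 2.0) ** 2 / scale
    gamma = 0.5 * eta * alpha
    return alpha, beta, gamma

def sinc(x):
    """sin(x)/x, with the x -> 0 limit of 1."""
    return 1.0 if abs(x) < 1e-8 else math.sin(x) / x

def rho_c(q, d, eta, n=2000):
    """rho * c(q d) = -24 eta * integral_0^1 s^2 sinc(q d s) (alpha + beta s + gamma s^3) ds."""
    alpha, beta, gamma = py_coefficients(eta)

    def integrand(s):
        return s * s * sinc(q * d * s) * (alpha + beta * s + gamma * s ** 3)

    h = 1.0 / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return -24.0 * eta * total * h / 3.0

def percus_yevick(q, r, eta, ampl=1.0):
    """ampl / (1 - rho c(q d)) for hard spheres of radius r (d = 2 r)."""
    return ampl / (1.0 - rho_c(q, 2.0 * r, eta))
```

At q = 0 this reduces to ampl (1 - eta)^4 / (1 + 2 eta)^2, the same low-q limit as the structure-factor part of the Kinning-Thomas model.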
Last modified February 6, 2015
email:
support@datasqueezesoftware.com