Smoothing Methods for Numerical Differentiation to Identify Electrochemical Reactions from Open-Circuit-Potential Data

reactionstakeplacewithinintercalationelectrodes.Weemploy open-circuit data from the Chevrolet Bolt EV negative and positive electrodes, lithiated graphite and NMC (Ni 0.6 Mn 0.2 Co 0.2 O 2 ), respectively. Electrochemical reactions correspond to peaks in plots of d x /d U vs. U , where U is the electrode potential and x is the fractional state of charge (0 ≤ x ≤ 1). We employ three nonparametric methods in this work to smooth and subsequently differentiate the data: a windowed cubic polynomial of ﬁxed and variable window length and a cubic smoothing spline. The standard deviation in the measurements can be used to set appropriately the window length and the analogous smoothing parameter for the spline. The cubic polynomial with a ﬁxed window length and the smoothing spline each have a single smoothing parameter; both methods are shown to work well and yield comparable results. The best results were achieved with the cubic polynomial incorporating a varying, adaptive window length, and the increased ﬁdelity enables a completely automated procedure to identify reactions. The analysis clariﬁes the multi-reaction nature of lithiated graphite, which must be addressed in cell modeling to simulate fast-charge behavior. © The Author(s) 2018. Published by ECS. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 License (CC BY, http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse of the work in any medium, provided the original work is properly cited. [DOI: 10.1149/2.0951816jes]

The importance of differential voltage spectroscopy (DVS) is well explained in three recent publications. Pastor-Fernández et al. 1 overview the utility of incremental capacity-differential voltage (termed DVS in this work a ) to characterize degradation phenomena in battery systems. The authors characterize the state-of-health of the batteries by examining the shift in peaks obtained from differentiating the open-circuit voltage (OCV) curves U(x) at low, constant-current operation (C/25 in their investigation, where the 1C rate corresponds to the discharge current that will discharge the cell in 1 hour). Related to work described in Ref. 1, the utility of DVS for informing life models of lithium ion batteries, including peak shifting associated with coupled chemical degradation and fatigue mechanics, is examined in detail for a graphite/iron-phosphate cell in Ref. 2. When such detailed model-experiment comparisons are made, it is helpful to smooth the data consistent with the amount of noise, and thereby enable the robust construction of plots requiring the differential dx/dU (or the inverse, dU/dx). The smoothing process facilitates the fitting of model parameters to the differentiated data. Li et al. 3 review the DVS literature and apply differentials of U(x) curves for deducing state-of-health and state-of-charge of lithium ion cells; they employ a Gaussian filter to smooth the U(x) curves. At this time, it remains an open question as to the merits of a Gaussian filter relative to those of the methods described in this work. Last, Zheng et al. 4 review the DVS literature and, like Refs. 1 and 3, emphasize the utility of smoothed U(x) curves for deducing the state of lithium ion cells and algorithms derived by modeling the differential data for battery management systems.
While smoothed U(x) curves are useful for the reduced-order modeling for estimating state-of-health and state-of-charge, they are also important for the modeling of porous electrodes with transport relations based on irreversible thermodynamics. 5,6 We can write the flux of lithium-filled sites N within an insertion or alloy electrode as 7,8 [ 1 ] where N H is the flux of vacant sites, c T is the total concentration of sites available for lithium, and D 0 is the diffusion coefficient of lithium in the limit as x tends to zero (a condition under which Fickian diffusion prevails). Faraday's constant, the universal gas constant, and absolute temperature are given by F, R, and T, respectively. From Eq. 1, we can see that an accurate model is needed for dU/dx if the flux expression is to be accurate. In addition to describing lithium diffusion within insertion or alloy electrodes, it is important to know how many electrochemical reactions are needed to represent electrode behavior, a fact that is often overlooked in the modeling of intercalation electrodes, as it is common to assume that a single electrochemical reaction, described by a Butler-Volmer equation, prevails: this approach conflicts with the observation of multiple peaks observed in DVS. A model incorporating multiple lithium species and associated reactions has recently been developed 10 to rectify this situation. The different species of lithium correspond to different galleries in the host materials and were shown to be consistent with X-ray analyses. The single Butler-Volmer equation must then be replaced with an electrochemical reaction expression for each lithium species determined. Here again, an accurate determination of dx/dU from DVS data plays a critical role in the process. For linear-sweep voltammetry, 9 current is proportional to dx/dU at low currents; hence, we can associate peaks in DVS plots (dx/dU plotted against U) with current peaks in voltammograms (plots of current vs. potential, with the potential being scanned linearly with time). One must ascertain how many significant peaks there are in the DVS plots, corresponding to electrochemical reactions, to model accurately voltammetry experiments 9 using information obtained only from DVS experiments. More discussion on the similarity of DVS at very low currents to cyclic voltammetry at low potential scan rates can be found in Ref. 10. In this context, Figure 1 provides a schematic overview of one of the objectives of this work. To identify electrochemical reactions, we determine when d 2 U/dx 2 = 0, as will be described, requiring a second numerical differentiation of the data, and placing a premium on smoothing the data but retaining sufficient fidelity in the data features.
This work is concerned with the process of determining a differentiable function U(x) from DVS data. To do this, one must filter out noise in the data, because unfiltered noise impacts not only the function U(x), but its derivatives even more so.
Methods for analyzing experimental data can be classified as parametric or nonparametric according to whether or not they are based on a model that depends on physicochemcial parameters of the system being studied. When a well-established mathematical model is available that captures the significant physical phenomena occurring in the experiment, it is appropriate to regress unknown parameters in the model. Good agreement between the regressed model and the experimental data lends credence to the belief that the model is appropriate, and studies of the sensitivity of the experiment to the parameters can help quantify the confidence in the parameters found by regression. However, if one applies an inappropriate model, especially if that model is "over-parameterized," a good agreement in the regression might still be possible, leading to a false belief in the model that may not stand up when faced with new evidence.
Nonparametric methods stand in contrast to the parametric approach. A nonparametric method attempts to extract functional relationships from the data using minimal prior information, typically limited to basic assumptions on the smoothness of the functions and the accuracy of the experimental apparatus. Supposing that two experimental variables have a functional relation, say y = μ(x), Eubank 11 offers the following perspective: ". . . the use of parametric methods can be quite dangerous in situations where there is little known about the regression function. In such cases, the bulk of the information about μ lies in the data rather than the person conducting the study or experiment. Accordingly, it seems more reasonable to use inferential techniques which rely heavily on the data. It is for this reason that nonparametric regression techniques are ideally suited to problems of inference when the available knowledge about μ is limited in nature." In the work presented here, we analyze experiments on battery electrodes and use nonparametric methods to extract a smooth opencircuit voltage curve and its derivative. Features of the smoothed function identify where electrochemical reactions occur as the electrode is charged or discharged. In previous work, 10,12 a parametric model was used to fit the open-circuit voltage curve to data, requiring prior knowledge of how many reactions are occurring. The methods discussed here require no such prior knowledge. As discussed above, the nonparametric methods provide a basis for using the parametric model and assessing its accuracy.
Intercalation and alloy electrodes that contain many phases are difficult to model, and many parameters appear in, for example, porous electrode models. 5,6,13 Recently, the difficulty in isolating meaningful values led us to perform a sensitivity analysis to examine the impact of the 20 physicochemical parameters used in the modeling of a lithium. 12 More information that allows for clarification of individual electrode reactions would be of high value in terms of formulating models for porous electrodes comprising intercalation and alloy active materials. It has been previously shown (see Ref. 10 and references therein) that reactions in electrochemical systems can be identified by differentiation of the potential U of an electrode (relative to that of a reference electrode) with respect to composition (e.g., fractional state of charge x, with 0 ≤ x ≤ 1); i.e., plots of dx/dU vs. U, as embodied in the terminology "differential voltage spectroscopy" (DVS). For the extraction of dx/dU from the common experiments that involve a low but constant current, numerical differentiation of the potential trace is required. Thus, we seek to understand features in plots of dx/dU vs. U, and that is the subject of this work.
Differentiation of data is a challenging endeavor that has a long history. The most common approach can be broken down into two parts: first, a function is fit to the data, and, second, the fit function is differentiated. Polynomial regression estimators are routinely employed as fitting functions. Eubank (section 3.5 of Ref. 11) reviews the history of polynomial regression and points out the motivation for the method based on theoretical underpinnings: Taylor's Theorem and the Weierstrass Approximation Theorem can be used to bound the error of a finite order polynomial regression estimator. This does not mean the error will be small, however, particularly for a lower order polynomial estimator. Regression with a lower order polynomial (e.g., a cubic polynomial) can be expected to work well for lowfrequency components of data, and, if the data is primarily composed of low-frequency information, a smooth and useful fitting function is rendered. Attempts to modify lower order polynomial estimators to address the remainder terms (e.g., high-frequency components) have led to the development of smoothing splines 11,14,19 and least-squares spline estimators. 11,19 Conte and de Boor quantify the error in the differentiation of a cubic interpolation spline, and they specifically underscore the utility of such splines for differentiation of data (section 6.7 of Ref. 15). Richardson et al. 16 show the utility of a cubic interpolation spline for machine learning purposes with comparison to neural networks and fuzzy systems. Feng 17 discusses the utility of a cubic smoothing spline (for equally spaced data points) in the context of data differentiation and shows that a smoothing spline is "a lowpass filter with the maximum flatness property," a statement that is further explored in that work. Russell and Norvig 18 overview the use of nonparametric regression in the context of agent (or machine) learning, which requires the repeated use of feature recognition.
This short perspective on the differentiation of data, and the need to determine dx/dU for electrochemical systems, such as battery electrodes, motivate the application of nonparametric regression to fit U(x) based on (1) a windowed cubic polynomial estimator (including the use of a variable window length of data) and (2) [20][21][22] We also discuss and formulate methods to determine how many points should be included in the cubic window regression (cf. Eqs. A7 and A10 and for the determination of L for a constant window width and L i for an adaptive window width, respectively), and the smoothing parameter p for the smoothing spline (cf. Eq. B23).

Numerical Methods
We provide the numerical details for the implementation of a windowed cubic polynomial estimator (including the use of a variable window length of data), and in Appendix A, we show how to apply a cubic smoothing spline in Appendix B. Common to all methods examined in the work is the use of cubic polynomials combined with a penalty on curvature in the smoothing functions in a manner that scales with the noise in the measurements. For the windowed cubic polynomial estimator with a fixed length of data points L, Equation A7 provides a relationship to determine the window length consistent with the noise in the data σ. Equation A10 provides the corresponding relationship for the variable window length estimator. For the smoothing spline, Equation B23 provides a relationship to determine the smoothing parameter p consistent with the noise in the data σ. We shall describe in the next section of this paper how we estimate σ (cf. Figure 3 and the related discussion).
We are not aware of a publication treating a cubic polynomial estimator with a variable window length of data, including the head, middle, and tail regions of the data (cf. Algorithm 2 in Appendix A). For the smoothing spline, the incorporation of proper scaling has not, to our knowledge, been reported previously (cf. Eqs. B9 and B10).

Results and Discussion
Cubic polynomial regression over a constant-width moving window for lithiated graphite.-The application of the cubic fit with a constant-width window (cf. Algorithm 1 of Appendix A) is shown in   Figure 2 show blue curves for the first and last of L = 239 data points. We exclude data below 80 mV and above 250 mV for calculating the sum of squared residuals in Eq. A6, as the slope in the actual U(x) data is significant below 80 mV and above 250 mV, so we don't penalize curvature for those potential ranges. For the graphite data, there were 14,132 total data points, with 12,692 within the range 80 mV ≤ U ≤ 250 mV (see the shaded potential regions that are excluded in the inset plot of Figure 3); hence, N = 12,692 in the use of Eqs. A6 and A7 to determine the window length using the standard deviation σ = 0.17 mV (to be discussed below).
Shown in Figure 4 are expanded views of the potential traces along two of the potential plateaus of Figure 2, which occur near 89 and 215 mV. From this plot (and many others not shown in this publication), we find that the measurement error in our equipment is about 0.15 mV for the graphite experiments. Hence, we expect σ to be near this value. We differentiate the cubic regression to approximate dx/dU, dx/dU ≈ 1/(a 1 + 2a 2 x + 3a 3 x 2 ), and by plotting U(x) versus dx/dU, we obtain the result shown in Figure 5. We find that window lengths less than 2L + 1 = 479 data points did not provide  enough smoothing, with the result that dx/dU at 89 mV became positive, which is not physically possible (batteries do not increase in potential energy upon discharge, and dx/dU must always be negative for lithiated graphite). The inset plot of Figure 5 for 2L + 1 = 477 shows that dx/dU transitions abrupty to 0 at 89 mV. (The maximum in the abscissa of Figure 5 is zero, and the positive values for dx/dU are not depicted as plotted.) When 2L + 1 is set to 479 (i.e., L is set to 239), we find from Eq. A6 that σ = 0.17 mV. From this observation, and that of σ being near 0.15 mV per Figure 4, we conclude that a value of σ = 0.17 mV is appropriate. The influence of changing σ, and thus L, is depicted in the inset plot of Figure 5. Larger values of σ and L give rise to more smoothing and loss of peak and valley amplitudes. For σ greater than about 0.25 mV, one can no longer see the small peak associated with the uppermost arrow. For σ greater than about 0.75 mV, the smoothing is excessive, and aberrations evolve, such as the positive value for dx/dU at about 89 mV, which, as noted previously, is not physically reasonable.
Smoothing spline regression for lithiated graphite.-For treating lithiated graphite with a smoothing spline, we determine the appropriate level of smoothing using the same settings as was done for the windowed cubic polynomial regression, that is, (i) we use σ = 0.17 mV for the noise in the voltage measurements and (ii) we penalize curvature only for data in the potential range 80 mV ≤ U ≤ 250 mV, leading to N = 12,692. For that range, the scaling factor s from Eq. B10 evaluates to 17.92 mV, which with x N −x 1 = 1 and σ = 0.17 mV, . This happens to be close to 1, so we simplify Eq. B12 to as N (x N − x 1 ) 3 σ 2 s 2 is sufficiently close to 1 for this work. When we specify the value of p in the figures to be discussed, Eq. 2 is employed for λ, and the smoothing spline coefficients are calculated by means of Eqs. B18-B22.
The general behavior of the smoothing spline is shown in Figure  6. Different values of the smoothing parameter p, corresponding to different values of σ (cf. Eq. B23), are provided in the legend of Figure 6. Because p = 1 − (p less than but tending to 1) corresponds to no smoothing, that extreme gives an interpolation spline that passes through each data point, i.e., y i = a i = U(x i ) vs. x i . The opposite extreme of p = 0 + gives the least-squares best-fit line l(x i ), which enters Eq. B10 for the calculation of s = 17.92 mV, the value mentioned above. Curves for values of p between 0 and 1 smoothly transition between the two limiting cases. Figure 6. Impact of the parameter p on the smoothing spline. For p = 1 − (i.e., p tending to but less than 1, corresponding to λ tending to 0), a cubic interpolation spline results, which passes through each data point. For p = 0 + (λ growing to infinity), a least-squares fit of a line through the data results. As shown in the legend, a value of σ can be associated with each p (cf. Eq. B23). The implication of the p−σ relationship is that lower standard deviations in the measurements are to be associated with higher p values, with p = 1 corresponding to no noise in the measurement.
The plots in Figure 7 for the smoothing spline analysis of lithiated graphite are analogous to those of Figure 2 for the windowed cubic polynomial. For the smoothing spline, there are no end regions, in contrast to the treatment of the first and last L data points for the windowed cubic, but the errors are highest near the end points, as is evident in Figure 7 and Figure 8. The inset plot of Figure 8 shows an expanded view for both ordinates. Evident are the dashed lines for σ = ± 0.17 mV and the shaded regions for potential outside the range of 80 mV ≤ U ≤ 250 mV.
A differential voltage spectroscopy plot using the fit smoothing spline coefficient 1/b i (see line 2 of Eq. B2) is provided in Figure 9. For the smoothing spline applied to the graphite data, we find that for σ values below 0.1 mV, positive values of dx/dU result near U = 89 mV; the results for σ = 0.1 mV and σ = 0.17 mV (the latter plotted in Figure 9) are difficult to distinguish, as are the results for  the windowed cubic and smoothing spline employing σ = 0.17 mV, both of which are displayed in the inset plot of Figure 9. As with the windowed cubic, for σ = 0.75 mV, dx/dU turns positive near U = 215 mV, indicating excessive smoothing (see the green curve in the inset plot of Figure 9).
In general, the results for the windowed cubic polynomial (   To avoid penalizing curvature where it is significant in the data, we do not penalize curvature for potentials below 3.6 V vs. Li. Figure 11 provides plots of U vs. x and U vs. dx/dU. The sensitivity of the U vs. dx/dU regression to the value of σ (and thus p) is shown in the left plot of Figure 12. The plot on the right shows the error in the regressed U vs. x curve relative to the data. The inset plot of Figure 12 shows an expanded view for both ordinates. Evident are the dashed lines for σ = ± 0.25 mV and the shaded region for potentials below 3.6 V, which are not included in the calculation of the squared error term of Eq. B23.
A larger value for the standard deviation in the data σ was employed for the Ni 0.6 Mn 0.2 Co 0.2 O 2 material (0.25 mV) vs. that for the graphite material (0.17 mV), consistent with our finding slightly larger variations in the NMC experiments, which can be seen by comparing Figure 4 with Figure 13. In both cases, the values of σ are small insofar as the 5V full-scale measurement range has an error of about 0.17/5000 or 0.0034% for the graphite measurements and 0.25/5000 or 0.005% for the NMC measurements.

Automated reaction recognition enabled by an adaptive
window.-To fulfill the objective associated with the schematic in Figure 1, that is, to identify when reactions are taking place, we recognize that for a cubic regression: The quantity 1/ p ≈ dx/dU is always negative. Hence, to deduce when a maximum or minimum occurs in 1/ p ≈ dx/dU , we can track p ≈ d 2 U/dx 2 . A reaction is identified by a positive-slope crossing of d 2 U/dx 2 through the line d 2 U/dx 2 = 0, corresponding to a minimum (a reaction peak) in the dx/dU trace. A negative-slope crossing corresponds to a maximum (valley) in the dx/dU trace.
While the constant-width window for the cubic regression and the smoothing spline provide similar and useful results for representing U(x) and dx/dU, the smoothing afforded by these two methods is insufficient for the lithiated graphite system for p ≈ d 2 U/dx 2 in the vicinity of the two largest peaks, as is seen in the left plot of Figure 14. (The results for the constant-width window for the cubic regression is effectively identical to those plotted in Figure 14.) The values for p ≈ d 2 U/dx 2 are sufficiently smooth to treat the data for the Ni 0.6 Mn 0.2 Co 0.2 O 2 host material.    Figure 11). The noise is roughly twice that for the graphite trace (cf. Figure 4).
The unwanted noise in Figure 14 can be removed by going to the adaptive window width, corresponding to Algorithm 2 of Appendix A. Shown in Figure 15 are results for the adaptive-width windowed smoothing with a cubic polynomial. (The adaptive window regression yields results that are indistinguishable from those of the constant window regression for U(x) and dx/dU as plotted in Figure  2 and Figure 5, respectively.) Larger window widths result about the larger potential plateaus, corresponding to larger reaction peaks, which gives more smoothing about the reaction peaks. Because the smoothing can be done locally with the adaptive window, the smaller value in the standard deviation in the data (0.15 mV, consistent with Figure 4) could be employed, allowing for more detail to be captured in the data without excessive noise in determining p ≈ d 2 U/dx 2 , as is made clear by comparing and contrasting Figure 16 with Figure 14. Notice that in contrast to the fixed-width windowing and the smoothing spline algorithms, the adaptive-width approach does not require any restriction on the voltage range over which smoothing is determined, as the global error measures of Eqs. A6 or B8, for the fixed-width algorithm and the smoothing spline, respectively, are replaced by the local error measure of Eq. A9. Numerical values for the reaction characteristics extracted from the data using Algorithm 2 of Appendix A, without any other user judgement, are provided in Table I. In summary, the use of the adaptive window as described in Algorithm 2,   Figure 4). The dashed green curve is a 6 th order polynomial fit of the entire data set to represent the low-frequency trend in the data.  Table I. Regressed values corresponding to the symbols (circles) in Figure 15 and Figure 16. while computationally more difficult, does allow us to provide an algorithm that satisfies the objective outlined in Figure 1; i.e., the autonomous identification of reactions by means of a nonparametric regression routine.

Summary
The focus of this work is enabling accurate DVS analyses, which requires an accurate method to differentiate numerically experimental data. Techniques employed for nonparametric regression and feature recognition are used to identify when electrochemical reactions take place in two commercially important intercalation electrode materials, lithiated graphite and nickel-manganese-cobalt oxide. The differentiation challenge is particularly significant for intercalation electrodes that undergo staging (similar to phase changes), as the peaks in plots of dx/dU vs. U are very sharp, as is seen for lithiated graphite (cf. Figure 5 and Figure 9). We analyze a moving window of data that are fit with a cubic polynomial (Appendix A) and a cubic smoothing spline (Appendix B). For the cubic polynomial, we investigate constant and adaptive window lengths. The three techniques are shown to work well for constructing DVS plots. For all three methods, we show how the standard deviation in the measurement method and the sum of the squared differences between the data and the regression fit can be used to estimate how large the moving window frame should be (Eqs. A6 and A9) and what the spline smoothing parameter should be (Eq. B23).
For the smoothing spline, we provide a rational means to nondimensionalize and scale the equation that is minimized (see Eqs. B9-B12); these matters appear to have been overlooked in the literature associated with smoothing splines. Allowing the window length for the cubic polynomial regression to adapt and vary yields the best results of all methods examined, as the adaptive window length enables more smoothing in the data where needed (e.g., around reaction peaks in dx/dU vs. U plots). The adaptive window method also removes the necessity for the user to restrict the range of data used in determining the smoothing.
An open issue that we do not address in this work is that of constraining the regression procedures to ensure that the fit function always has a negative slope, dU/dx < 0, as required by thermodynamic principles (i.e., the potential of a lithium-ion battery must always decline on discharge). Constrained regressions could be employed to enforce a negative slope, but the additional computational complexity and cost are significant. Mammen et al. 31 discuss this topic in the general context of polynomial regression, kernel smoothing, and smoothing splines. Turlach 32 discusses the topic in detail for smoothing splines. More research accommodating the methods discussed in Refs. 31 and 32 in the context of differential voltage spectroscopy may prove beneficial. In the work presented here, rather than imposing a constraint on the slope, we use any violation of that constraint as a signal that more smoothing is necessary.

Appendix A. Windowed Polynomial Smoothing
This Appendix describes methods for approximating a smooth function f (x) given noisy data points where i , i = 1, . . . , N , are random errors. The methods will use polynomial approximations to subsets of the points to smooth out the random errors. We will begin by reviewing how to compute a least squares polynomial approximation to a set of points, and then describe how this can be used iteratively on a sliding window of data points to approximate a long sequence of points with a succession of low-order polynomials. For integer d ≥ 1, let be a polynomial of degree d with coefficient array a = [ a 0 · · · a d ] T . Given any d + 1 points (x i , y i ), for i = 0, . . . , d, all x i distinct, there exists a unique polynomial p(x) that interpolates the points. Due to effects related to the Runge phenomenon, b even for noise-free data points, this interpolating polynomial may be a poor approximation to f (x), and especially as N grows, p(x) may oscillate wildly. The problem of oscillations is exacerbated by noise and further amplified if one evaluates derivatives of the polynomial. A remedy for this situation is to use a polynomial of lower degree to approximate the data points, and possibly to use different low-order polynomials on different subsets of the points. As one option, one might opt to partition the data into intervals and construct a piecewise polynomial approximation with continuity conditions enforced where the intervals meet; spline functions are constructions of this type. In this section, we consider instead an approach that, for each abscissa x i , approximates f (x) at x = x i using a polynomial fit to the points within a sliding window containing x i . For interior points, this window will be symmetrically centered on x i , but for points near either extreme of the range of the data, the window must necessarily become asymmetric.
To describe this method, we begin by considering how to fit a polynomial to n points. Later we will address the problem of selecting for each abscissa x i which subset of points to use in constructing an approximation there.
For n > d distinct points, we may find the unique least squares approximant, that is, the polynomial p(x; a) that minimizes the sum of squares One may consider the optimal p(x; a) to approximate the first d + 1 terms of the Taylor series for f (x) centered at x = 0. The differences p(x i ; a) − y i are called the residuals of the fit.
Computation of the least squares polynomial is most easily described in matrix notation. First, we gather the data into arrays x = [ x 1 · · · x n ] T and y = [ y 1 · · · y n ] T and let Equating the partial derivatives of S with respect to a to zero, we obtain the following optimality criterion, often called the normal equation for least squares: 25 For d < n with at least (d + 1) of the x i distinct, it can be shown that square matrix V T V is nonsingular, 23 hence there is a unique set of least squares coefficients: In more detail, writing the normal equation as  19 and especially Berrut and Trefethen 24 for discussions of the Runge phenomenon. The phenomenon can be suppressed by careful spacing of the samples near the ends of the interval, such as in Chebyshev interpolation, but we seek a method not dependent on such spacing. Indeed, experimental data is often produced at uniform intervals for which the Runge phenomenon is severe.
We note that for numerical robustness, it is also a good idea to scale and translate the x-coordinates so that the transformed coordinates range over 0 to 1 or −1 to 1.
Special case: uniformly spaced points.-Suppose that the x-coordinates of samples are uniformly spaced, that is, x i = ih for some constant increment h. Suppose further that we elect to fit a polynomial to n = 2L + 1 consecutive points, L integer. Then, if we translate coordinates to place the origin at the middle point and re-index the points as x −L , x −L+1 , . . . , x L−1 , x L , we have a symmetrical arrangement where x −k = −x k , k = 1, . . . , L. As a consequence, by Eq. A4, for i + j equal to any odd number, M i j = 0. To take advantage of this, we rearrange the ordering of the coefficients in a to place the even terms first and followed by the odd ones; for example, for d = 3 (cubic polynomial), we use a = [ a 0 a 2 a 1 a 3 ] T . When the columns of V are rearranged to match, the matrix M becomes block diagonal: Rather than writing general expressions for M 1 and M 2 , we illustrate for the specific case of cubic polynomials, for which M and the associated right-hand vector b become where each sum is taken over i = −L , . . . , L. The block diagonal structure allows one to solve for the odd and even coefficients independently, thereby reducing the original 4 × 4 matrix solve to two 2 × 2 solves. The generalization for larger d is straightforward. Moreover, one may simplify the entries of M using x k i = h k i k . Because we invoked a translation of the origin to x mid , in the original coordinates the polynomial becomes p(x − x mid ; a).
Moving window smoothing.-It often happens that a single low-order polynomial cannot adequately approximate a given data set. For example, a polynomial of degree d has at most d − 1 stationary points (maxima, minima, and inflections) and the function to be approximated could have more than this. One approach is to break the data set into pieces and fit a polynomial on each piece separately. Enforcing continuity of the piecewise polynomial and some number of derivatives yields a spline function. The smoothing spline discussed in the next section falls into this class.
Moving window smoothing takes a related, but different, approach. For each abscissa x i , we propose to compute a polynomial p k 1 :kn (x) that is the least squares fit to a window of neighboring data points (x k , y k ), k = k 1 , . . . , k n , choosing k 1 and k n to include i, i.e., k 1 ≤ i ≤ k N . Then, evaluating this polynomial at x i gives a value f i = p k 1 :k N (x i ) that we take as the approximation to f (x i ). We may similarly use the derivatives of p k 1 :k N (x) evaluated at x i to approximate the derivatives of f (x) at x i .
One may develop a variety of smoothing algorithms following this same basic plan. The choices that must be made are the degree d of the polynomials and the window to use around each point. We have found that cubic polynomials (d = 3) work well in our applications. In what follows, we propose two different prescriptions for choosing windows around the points. The first is simpler to implement, while the second shows clear advantage on some more challenging data sets.
Algorithm 1: constant-width windowed smoothing.-For this approach, we use an odd number, n = 2L + 1, of successive points for each approximation. Assume the given data points are numbered from 1 to N . For points in the middle of the data set, specifically, for indexes L < i < N − L, we use a window centered on x i to approximate there, while the ends of the data set must use unbalanced windows. Specifically, we use [A5] This leaves just the window half-width, L, to be specified. We will determine L so as to make residuals in the fit consistent with our knowledge of the accuracy of the process that produced the data. Suppose that the errors in the data points are independent, zeromean, with a standard deviation of σ, that is, the errors i in Eq. A1 are independent with expectations E( i ) = 0 and E( 2 i ) = σ 2 . For a given L, we obtain a sum of squared residuals c [A6] We expect SS R ≈ N σ 2 . (For more on squared residuals and nonparametric fitting, see chapter 2 of Eubank. 11 ) If SS R(N /2) ≤ N σ 2 , that is, if our polynomial fit to the entire data set falls within the expected residual, then we accept that polynomial as the smoothed fit. Otherwise, some smaller width is appropriate. For 2L ≤ d, the polynomials in Eq. A5 c In the statistics literature, instead of SSR, the term residual sum of squares and abbreviation RSS are sometimes used to mean the same thing.
interpolate the data, thus giving SS R(L ≤ d/2) = 0. As L increases from its minimum size to its maximum size, at some point SS R(L) must cross its target value of N σ 2 , and we aim to choose the largest L such that SS R(L − 1) ≤ N σ 2 as the window width for the smoothed approximation.
To determine the appropriate value of L, it is not necessary to increment through all its possible values. Instead, one may approach the problem using any available numerical root finder, such as bisection or the secant method, to solve SS R(L) = N σ 2 . Of course, the iterates for L must be rounded off to integers, and the root finder must terminate upon finding an L such that [A7] If SS R(N /2) > N σ 2 , we know that such an L must exist, although it may not be unique. Ideally, if there is more than one such width, we would use the largest one, but our experience is that any L satisfying this criterion is acceptable. It is not necessary to pinpoint L meeting the criterion in Eq. A7, as a nearby value will give almost the same fit. According to Reinsch's recommendation for the smoothing spline, 14,19 one could terminate the search upon finding a window width such that This criterion is justified on the assumption that the errors i are independent and Gaussian, in which case the sum of squared errors is a chi-square distribution that has a mean of N σ 2 and a standard deviation of √ 2N σ 2 . More precisely, since we are solving for (d + 1) coefficients in each polynomial fit, it may be more appropriate to seek a value of L such that SS R(L) = (N − d − 1)σ 2 , but in our work N d, so this minor adjustment would be of no substantial consequence.
Immediately below, we provide a pseudo-code summary of Algorithm 1. For such a function, the flat regions can be well approximated with a wider window than can be used where curvature changes quickly, so it is inappropriate to use a fixed window width. Letting the window width vary and denoting its value at point i as L i , we may modify Eq. A5 to use f i (L i ) instead of a fixed L. More precisely, we will determine an initial window of width L 1 that is used for all f i , i ≤ L 1 + 1, and a final window of width L N that is used for all f i , i ≥ N − L N . In between these, the L i must respect the limits of the indexes, L i < i and L i < N − i, and also the lower limit, L i ≥ d/2, below which the local polynomial is an exact interpolant of all points in the window. For the sake of robustness, it can be useful to set a somewhat larger minimum width, and in fact, we used cubic polynomials, d = 3, and set L i ≥ 6 in this article.
To determine the window widths, let us define a local sum of squared residuals for the window centered on point i, named SS Ri to distinguish it from SS R above: This formula is only defined for i > L and i ≤ N − L, otherwise the indexes in the fit extend beyond the ends of the data set. As we did in Eq. A5 for the fixed window width algorithm, we will have to use an asymmetric set of points to approximate function values near the beginning and end of the data set. As in the fixed-width case, we seek a window width compatible with the assumed noise covariance, that is, we seek a width L i such that SS Ri(i, L i ) = (2L i + 1)σ 2 for all i. Since L i is an integer, we cannot generally obtain equality, so instead we seek a window width that simultaneously satisfies [A10] We expect the window widths for neighboring points to be close to each other. Accordingly, after finding the initial width for the head of the data set, for every point thereafter, we use L i as the initial guess for the search for L i+1 . Since the window width only changes ) unless CC License in place (see abstract). ecsdl.org/site/terms_use address. Redistribution subject to ECS terms of use (see 207.241.231.82 Downloaded on 2019-04-27 to IP gradually, it is effective to simply increment L i if SS Ri(i, L i ) is too small and decrement it if SS Ri(i, L i − 1) is too big, subject to the width limits discussed above.
Caveat: Any least squares process can be skewed by outlier data points. For the adaptive window smoother, the sum of squares criterion for the window width is based on a local window of points instead of the whole data set, so outliers may have a bigger impact on the result in the region nearby than it will for the fixed-width smoother, although on the positive side, that impact will be more localized. If possible, one should remove outliers before computing the final smoothed output. Outliers were not observed in the data sets treated here.
Immediately below, we provide a pseudo-code summary of Algorithm 2. Comment: all data falls into the first window Else Comment: find initial half-width by bisection is the applicable polynomial according to Eq. A5. Higher derivatives may be approximated similarly. We note that all of these approximations are computed only at the sample abscissas. If one requires a function to approximate f (x) between sample points, some kind of interpolation function must be employed. One option is a spline, say s(x), that interpolates (x i , f i ), i = 1, . . . , N . It is reasonable to expect that s (x i ) will be close to p i (x i ), but in general, these will not be exactly equal. Alternatively, one could find the cubic on each interval that matches the smoothed function value and derivative value at both ends of the interval, pushing any disagreement to the second derivative. For the plots shown in this article, we linearly interpolate the function values between samples and do the same for derivatives.

Appendix B. Smoothing Spline
Smoothing splines are an alternative to windowed polynomial regression. In a manner analogous to the development of the cubic polynomial regression, we consider the cubic spline function S i (x), as described by de Boor, 19 for the data (x i , y i ), i = 1 to N : The fitting function S i (x) is analogous to p(x) in Eq. A2. In what follows, it will be useful to recognize that where primes denote differentiation with respect to x. Continuity relations at each node (or knot) yield where h i = x i+1 − x i . Using the Equations in B3, we can eliminate the coefficients d i and b i by means of B4 and B5, respectively: Using continuity of the first derivative (second line of B2) and Eqs. B4 and B5, we can write which can be rewritten in matrix notation as where R is an (N − 2) × (N − 2) tridiagonal symmetric matrix, c is a vector of length N − 2, Q T is an (N − 2) × N tridiagonal symmetric matrix, and a is a vector of length N .

Equation B7
can also be written as ⎡ Smoothing and appropriate scaling.-To penalize curvature, the function P, which is to be minimized, is commonly employed: 19,26 where σ i is the standard deviation in the data for the i th point associated with many measurements (from repeated experiments of the same kind), and 0 < p < 1. The integration of the square of the second derivative of the function f(x) represents the accumulation of the squares of all local curvatures f (x) from x 1 to x N . As p nears 1, no smoothing results, and we obtain an interpolating polynomial, wherein the regression passes through each point. As p nears 0, smoothing is complete, no local curvature is allowed, and we obtain a least-squares linear regression of the data. There is a problem with this well-used formulation, however: the first term on the right side of Eq. B8 is dimensionless, whereas the second term has dimensions of (u f ) 2 /(u x ) 3 , where u f are the units of f (x) and u x are the units of x. In this work, x is dimensionless, but to fix Eq. B8 so that is has a more general applicability, we replace it with where both terms on the right side are dimensionless, even if x and y have dimensions associated with them, and we suggest employing that is, s the square root of the sum of the squared deviation of the data y i (x i ) from a best-fit line l(x i ) for the data, which, being a line, has no curvature, thus providing a reference to scale f (x). The units of s are those of the data y i and the function f (x). One can argue as to whether s formulated in Eq. B10 is the right scaling for f (x), and we shall revisit this topic in the Results and Discussion section. Equation B9 has an additional benefit insofar as one can then employ an ascending or descending series in the independent variable, as the sign(x N − x 1 ) 3 = sign(dx), where dx is the differential increment in the integral of Eq. B9. (For the treatments in Refs. 19 and 26 the independent variable must be in ascending order. This is not a problem for problems having ascending time as the independent variable.) It is common to assume a constant standard deviation in the data, σ i = σ, an assumption we employ in this work, and this allows us to recast Eq. B9 with the lone dimensionless group λ: Solving the matrix equation for the smoothing spline coefficients.-In discretized form, d Eq. B11 can be written as = (y − a) T (y − a) + 2 3 λc T Rc

[B13]
Differentiating P λ with respect to a and setting the result equal to zero leads to Eliminating a from Eq. B14 by means of Eq. B7 leads to where Q T Q is an (N − 2) × (N − 2) symmetric, pentadiagonal matrix. For notational convenience, we seek a solution to In expanded form, for N = 7, A can be written as the matrix sum x i+1 ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ η 1 y 1 − η 12 y 2 + η 2 y 3 η 2 y 2 − η 23 y 3 + η 3 y 4 η 3 y 3 − η 34 y 4 + η 4 y 5 η 4 y 4 − η 45 y 5 + η 5 y 6 η 5 y 5 − η 56 y 6 + η 6 y 7 More generally, the symmetric, pentadiagonal matrix A has elements and the vector β has elements Equation B16, Ac = β, can be solved by using efficient methods for pentadiagonal systems. From a storage perspective, A and β can be represented as No entries are shown in B21 for A i+1,i or A i+2,i , as A is symmetric (hence, A i,i+1 = A i+1,i and A i,i+2 = A i+2,i ). In this work, Ac = β (reflected by entries of B21) was solved using the algorithm described in Cheney and Kincaid. 27 Upon solving Eq. B16, one can use Eq. B14 to recover a: Having a and c, one can employ Eqs. B4 and B5 to determine the d i and b i coefficients, respectively.
Determination of the smoothing parameter p.-The same logic used to determine an appropriate value of L for polynomial regression can be applied to determine an appropriate value of p. Hence, an equation for the smoothing spline that is analogous to that of Eq. A9, with SS R = N σ 2 , is employed: This provides a (nonlinear) relation that can be used to determine an estimate for λ, and thus p, (cf. Eq. B12) once the standard deviation in the measurement σ is provided. Helpful expositions on Generalized Cross Validation devoted to smoothing splines can be found in Refs. 29 and 30; and the use of Eq. B23 represents a pragmatic method for addressing a complicated subject.