What is cross-correlation?
Cross-correlation measures the similarity between two time series as a function of the time lag applied to one of them. A high cross-correlation at lag k means that Series X at time t is strongly related to Series Y at time t+k. It is widely used in signal processing, econometrics, and neuroscience to detect lead-lag relationships.
What is the difference between cross-correlation and autocorrelation?
Autocorrelation is the correlation of a single time series with a lagged version of itself, measuring how past values predict future values in the same series. Cross-correlation extends this concept to two different series, measuring how one series relates to a lagged version of another.
How is the cross-correlation coefficient calculated?
At lag k, the raw cross-correlation is the sum of the products of the mean-centered values of X(t) and Y(t+k), taken over every t for which both values exist, divided by a normalization factor. With coefficient normalization, this sum is divided by the square root of the product of the two series' sums of squared deviations from their means (equivalently, N times the product of their population standard deviations), which bounds the output between -1 and +1.
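Written out as formulas, a sketch for a non-negative lag k, assuming N observations x_1, …, x_N and y_1, …, y_N with sample means \bar{x} and \bar{y}. The labels c, R^{biased}, R^{unbiased} and r are illustrative names, not notation taken from the calculator itself; negative lags follow by swapping the roles of the two series.

```latex
\begin{aligned}
c_{xy}(k) &= \sum_{t=1}^{N-k} (x_t - \bar{x})\,(y_{t+k} - \bar{y}), \qquad k \ge 0, \\
R^{\text{biased}}_{xy}(k) &= \frac{c_{xy}(k)}{N}, \\
R^{\text{unbiased}}_{xy}(k) &= \frac{c_{xy}(k)}{N - k}, \\
r_{xy}(k) &= \frac{c_{xy}(k)}{\sqrt{\sum_{t=1}^{N} (x_t - \bar{x})^2 \, \sum_{t=1}^{N} (y_t - \bar{y})^2}}, \\
c_{xy}(-k) &= c_{yx}(k).
\end{aligned}
```

By the Cauchy-Schwarz inequality the overlapping sum in the numerator of r can never exceed the full-length denominator in absolute value, which is why the coefficient stays within [-1, +1] at every lag.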
What does a negative cross-correlation mean?
A negative cross-correlation at a particular lag means the two series tend to move in opposite directions at that offset. For example, if X is high, Y tends to be low k periods later. Values closer to -1 indicate a stronger inverse relationship.
What normalization method should I use?
Use 'Coefficient' normalization if you want results bounded between -1 and +1 for easy comparison, similar to a Pearson correlation. 'Biased' (divide by N) is common in signal processing; it has lower variance than the unbiased estimate but shrinks values toward zero at large lags. 'Unbiased' (divide by N - |lag|) compensates for the fewer data point pairs available at large lags, but its estimates can have higher variance there.
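A minimal pure-Python sketch of the three normalizations, assuming equal-length series and the convention that a positive lag k pairs X(t) with Y(t+k). The function name `cross_correlation` and its `normalization` argument are hypothetical, chosen for illustration; they are not the API of this calculator or of any particular library.

```python
def cross_correlation(x, y, max_lag, normalization="coefficient"):
    """Cross-correlation of x and y for lags -max_lag..max_lag.

    A positive lag k pairs x[t] with y[t + k]. Returns {lag: value}.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]  # mean-centered X
    dy = [v - my for v in y]  # mean-centered Y
    # Square root of the product of the sums of squared deviations
    # (used only by the coefficient normalization).
    denom = (sum(d * d for d in dx) * sum(d * d for d in dy)) ** 0.5
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # Sum over every t where both x[t] and y[t + k] exist (n - |k| pairs).
        raw = sum(dx[t] * dy[t + k] for t in range(max(0, -k), min(n, n - k)))
        if normalization == "biased":
            result[k] = raw / n
        elif normalization == "unbiased":
            result[k] = raw / (n - abs(k))
        elif normalization == "coefficient":
            if denom == 0:
                raise ValueError("coefficient is undefined for a constant series")
            result[k] = raw / denom
        else:
            raise ValueError(f"unknown normalization: {normalization!r}")
    return result


x = [1, 2, 3, 4, 5]
for name in ("biased", "unbiased", "coefficient"):
    print(name, cross_correlation(x, x, 1, name)[1])
# biased 0.8
# unbiased 1.0
# coefficient 0.4
```

All three results at lag 1 come from the same raw sum (4.0); only the divisor differs (5, 4 and 10). That is why values are only comparable across series or lags when the same normalization is used throughout.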
How many lags should I compute?
A common rule of thumb is to compute lags up to about N/4, where N is the length of your series. Computing too many lags relative to series length reduces statistical reliability because fewer data point pairs are available at large lags.
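The arithmetic behind the rule of thumb can be sketched in a few lines, assuming N/4 is rounded down to a whole number of lags. Both helper names are hypothetical.

```python
def suggested_max_lag(n):
    """Rule-of-thumb maximum lag: about N/4, rounded down."""
    return n // 4


def pairs_at_lag(n, k):
    """Number of (x[t], y[t + k]) pairs that overlap at lag k."""
    return n - abs(k)


n = 100
max_lag = suggested_max_lag(n)
print(max_lag, pairs_at_lag(n, max_lag))  # 25 75
print(pairs_at_lag(n, 90))                # 10
```

For a 100-point series the rule stops at lag 25, where 75 pairs still contribute; pushing to lag 90 would leave an estimate based on only 10 pairs.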
Do both series need to be the same length?
Yes. For standard cross-correlation, both Series X and Series Y must contain the same number of observations. If they differ in length, you would need to truncate or pad one series before analysis.
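A minimal sketch of the truncation option, assuming both series start at the same time point so that the surplus observations at the end of the longer series can simply be dropped. `truncate_to_common_length` is a hypothetical helper and the sample values are illustrative only; if the series cover different time spans, a plain slice like this would misalign them.

```python
def truncate_to_common_length(x, y):
    """Trim the longer series so both have the same number of observations.

    Assumes x[0] and y[0] refer to the same time point, so the extra
    observations at the end of the longer series are discarded.
    """
    n = min(len(x), len(y))
    return x[:n], y[:n]


x = [3.1, 2.4, 5.0, 4.2, 3.8, 6.1]
y = [1.0, 0.7, 1.9, 1.5]
x_aligned, y_aligned = truncate_to_common_length(x, y)
print(len(x_aligned), len(y_aligned))  # 4 4
```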
What does the lag of maximum correlation tell me?
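The lag at which the cross-correlation function (CCF) reaches its maximum value indicates the time offset at which the two series are most closely related. With the convention above (X at time t paired with Y at time t+k), a positive lag means Series X leads Series Y by that many periods, a negative lag means Series Y leads Series X, and a lag of zero means the series are most correlated simultaneously.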
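A sketch of locating the peak, assuming coefficient normalization and the convention that a positive lag k pairs X(t) with Y(t+k). The series Y here is constructed (hypothetically) as X delayed by two periods, so X leads Y and the peak should land at lag +2. The function names `ccf_coefficient` and `peak_lag` are illustrative.

```python
def ccf_coefficient(x, y, max_lag):
    """Coefficient-normalized cross-correlation for lags -max_lag..max_lag.

    Assumes len(x) == len(y); a positive lag k pairs x[t] with y[t + k].
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = (sum(d * d for d in dx) * sum(d * d for d in dy)) ** 0.5
    return {
        k: sum(dx[t] * dy[t + k] for t in range(max(0, -k), min(n, n - k))) / denom
        for k in range(-max_lag, max_lag + 1)
    }


def peak_lag(ccf):
    """Return the lag with the largest cross-correlation value."""
    return max(ccf, key=ccf.get)


x = [0, 1, 0, 2, 5, 1, 0, 3, 1, 0]
y = [0, 0] + x[:-2]  # y[t] = x[t - 2]: Y trails X by two periods
ccf = ccf_coefficient(x, y, max_lag=4)
print(peak_lag(ccf))  # 2 (positive lag: X leads Y)
```

If strong inverse relationships also matter, ranking lags by absolute value (for example `max(ccf, key=lambda k: abs(ccf[k]))`) would pick up a large negative peak as well; which rule to use is an analysis choice.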