When dealing with radial velocity data sets, the behaviour of the star is pretty unambiguous when the sampling rate is convenient. For example, if a planet completes an orbit over the course of a year, and the star is observed weekly, then there’s a clear sinusoidal curve in the data that can be easily picked out and modeled. However, the sampling is often not convenient, and the interval between observations can even be longer than the period of the planet. Some data sets can essentially be great guessing games, where the periods are in the data but take considerable effort to find. This effort takes the form of tuning the orbital parameters of the Keplerian fit until it finally lines up with enough data points.

For example, consider this radial velocity data set for 54 Piscium given by Fischer et al.

It turns out there is a tool we can use to get a feel for what signals might be in the data. This tool, called a periodogram, was discussed earlier (here). Assume that we have a data set of velocities (or any variable, really). The radial velocity, *v*, is measured at a time *t*, resulting in a time series with *N*_{0} data points, {*v*(*t*_{j}), *j*=1,2,…,*N*_{0}}. A Fourier transform is performed on the data set as follows.

$$P_v(\omega) = \frac{1}{N_0}\left|\sum_{j=1}^{N_0} v(t_j)\, e^{-i\omega t_j}\right|^2$$

where *P*_{v}(ω) is the “power” at a given angular frequency ω, *N*_{0} is the number of data points, *v*(*t*_{j}) is the radial velocity at time *t*_{j}, and i is the imaginary unit.

With the above equation we can compute a basic periodogram for this data set. It is useful because if the data set contains a sinusoidal component with a frequency of ω_{0}, then for ω near ω_{0}, *v*(*t*) and exp(-i*ωt*) are in phase. This causes the terms of the sum in the equation to add up coherently to a large value. Otherwise, for values of ω that are far from any real sinusoidal signal in the data, the terms randomly fluctuate between positive and negative and therefore more or less cancel out to a small sum. Hence, if a sinusoidal signal of frequency ω exists in the data set, the value of *P*_{v}(*ω*) will be large. In a graph with periods on the *x*-axis, the presence of each period in the data set can be graphed as a “power” on the *y*-axis. Higher power values indicate that the periodicity is more strongly represented in the data.
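To make the idea concrete, here is a minimal Python sketch of a classical periodogram, assuming the usual form in which the squared magnitude of the sum is divided by the number of data points. The data are synthetic and made up for illustration (a pure 62-day sinusoid at uneven times), not the 54 Piscium measurements, and the function and variable names are hypothetical.

```python
import cmath
import math

def classical_periodogram(times, velocities, omegas):
    """Return |sum_j v_j * exp(-i*omega*t_j)|^2 / N0 for each trial omega."""
    n0 = len(times)
    powers = []
    for omega in omegas:
        total = sum(v * cmath.exp(-1j * omega * t)
                    for t, v in zip(times, velocities))
        powers.append(abs(total) ** 2 / n0)
    return powers

# Synthetic example (assumed values): a 62-day sinusoid, unevenly sampled.
true_period = 62.0  # days
times = [3.7 * k + 1.3 * math.sin(k) for k in range(120)]  # uneven epochs
velocities = [10.0 * math.sin(2 * math.pi * t / true_period) for t in times]

# Trial periods from 20 to 200 days in half-day steps.
trial_periods = [20.0 + 0.5 * k for k in range(361)]
omegas = [2 * math.pi / p for p in trial_periods]

powers = classical_periodogram(times, velocities, omegas)
best_period = trial_periods[powers.index(max(powers))]
print("Strongest trial period (days):", best_period)
```

With noiseless data like this, the strongest trial period should land close to the injected one. Real data sets add noise and a more awkward sampling pattern on top.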

With the periodogram, while we can technically look at the power of any of an infinite number of frequencies, it would be helpful to narrow our search to a finite set of frequencies. We must decide what frequencies those are. Since the periodogram of the data is symmetric (that is, *P*(-ω) = *P*(ω)), all of its information is contained in the non-negative frequencies, *n*=0,1,2,…,*N* (=*N*_{0}/2). Roughly half of the information in the data is discarded when calculating the periodogram because the absolute value part of the equation eliminates any distinction between positive and negative frequencies. As such, we need only look at *N* = *N*_{0}/2 evenly spaced frequencies.
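As a rough sketch of what such a grid might look like, the snippet below builds evenly spaced angular frequencies ω_{n} = 2π*n*/*T*, where *T* is the total time span of the observations. That spacing rule is an assumption carried over from the evenly sampled case, the zero frequency is skipped because it corresponds to an infinite period, and the observation times are made up.

```python
import math

def natural_frequencies(times):
    """Evenly spaced angular frequencies 2*pi*n/T for n = 1 .. N0 // 2."""
    n0 = len(times)
    span = max(times) - min(times)  # total baseline T (assumed spacing rule)
    return [2 * math.pi * n / span for n in range(1, n0 // 2 + 1)]

times = [0.0, 2.0, 5.0, 6.0, 11.0, 13.0, 18.0, 20.0]  # made-up epochs, in days
omegas = natural_frequencies(times)
periods = [2 * math.pi / w for w in omegas]  # approximately 20, 10, 6.67, 5 days
print(periods)
```

In practice it is common to search a much finer grid than this, since a real peak can easily fall between two of these frequencies.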

It is not necessary to understand how the periodogram works, but what it means is important for this discussion. The periodogram takes your data set and tells you what periodicities are in it. For example, a periodogram of the above 54 Piscium data set would look like the following.

A strong peak in the periodogram is found at ~62 days, corresponding to the orbital period of 54 Piscium b. We can fold the data to the period of the planet and see a reasonable fit by a ~Saturn-mass planet in a 0.284 AU orbit with an eccentricity of 0.63 and a longitude of periapsis of 235 degrees.
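Folding just means replacing each observation time with its orbital phase, so that every cycle is stacked on top of the others. A minimal sketch, with hypothetical function names and made-up observations rather than the real data set:

```python
def fold(times, period, epoch=0.0):
    """Return the orbital phase, in [0, 1), of each observation time."""
    return [((t - epoch) / period) % 1.0 for t in times]

def fold_and_sort(times, velocities, period, epoch=0.0):
    """Pair each phase with its velocity and sort by phase for plotting."""
    return sorted(zip(fold(times, period, epoch), velocities))

# Made-up epochs (days) and velocities (m/s), folded on a 62-day period.
times = [0.0, 31.0, 62.0, 93.0, 15.5]
velocities = [1.0, -1.0, 1.1, -0.9, 0.2]
phases = fold(times, 62.0)
folded = fold_and_sort(times, velocities, 62.0)
print(folded)
```

If the trial period is right, points from different orbits line up along a single curve in the folded plot. If it is wrong, they smear out into a scatter.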

The periodogram, while useful, suffers from a couple of problems. One, which is probably fairly obvious, is that it is noisy. A plethora of small peaks are seen that don’t correspond to anything physically real. The periodogram is noisy even when the actual data itself is only slightly noisy. Furthermore, the noise does not diminish with increasing sample size. This may be a bit counter-intuitive, as any fake signals would seem to be discouraged by adding new data, but it turns out that adding new data also adds new frequencies on top of the old, discouraged ones. As such, the noise is not actually averaged out by adding more data. However, while the noise does not decrease with increasing sample size, real signals do stand out more strongly against it. Specifically, the signal-to-noise ratio of a real signal in the data increases with the square root of the total number of observations.

Another major problem, and a further contribution to noise, is leakage. Signals tend to “leak” into other periodicities. In the 54 Psc periodogram above, you can see smaller peaks, called “sidelobes”, straddling the real signal of the planet. This happens because trial frequencies close to the real one stay nearly in phase with it over the span of the data, so they also end up with an elevated *P*(ω) value. Such leakage to nearby periods is caused by the finite total interval over which the data is sampled.

Sidelobes are fairly easy to recognise and ignore, but a more dangerous sort of leakage occurs due to the finite size of the interval between measurements, a phenomenon called aliasing. In particular, periods that are integer ratios of the planet’s signal can be represented in the data. For a real signal of length *S*, aliased signals can exist at *S*/2, *S*/3, and so on. When you see a car driving down the road and the tires look like they are spinning very slowly, perhaps even in the opposite direction, your eyes and brain are being fooled by an alias of the real signal, the actual rotation of the tires. Variable sampling rates are usually sufficient to eliminate the aliases that are most likely to be problematic.
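The tire illusion can be reproduced in a few lines. The sketch below assumes perfectly even sampling, which is the worst case for aliasing, and uses made-up numbers: a signal at 0.9 cycles per day observed once per day is indistinguishable from one at 0.9 - 1 = -0.1 cycles per day, a slow rotation in the opposite direction.

```python
import math

def sample(freq, interval, count):
    """Sample sin(2*pi*freq*t) at evenly spaced times t = k * interval."""
    return [math.sin(2 * math.pi * freq * k * interval) for k in range(count)]

interval = 1.0    # days between observations (assumed, perfectly regular)
true_freq = 0.9   # cycles per day: the "fast tire"
alias_freq = true_freq - 1.0 / interval  # about -0.1: slow and backwards

fast = sample(true_freq, interval, 50)
slow = sample(alias_freq, interval, 50)

# The two sets of samples agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
print("Largest difference between the two sample sets:", max_diff)
```

Because the two frequencies differ by exactly one cycle per sampling interval, no amount of evenly spaced data can tell them apart. Jittering the observation times breaks that exact coincidence.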

As you can see, there are helpful statistical tools to take data sets and extract information out of them, but their interpretation is often unclear or ambiguous and great care must be taken when using them. As for the overarching point that has been hinted at numerous times on this blog, detecting extrasolar planets is *hard*.