The Periodogram

When dealing with radial velocity data sets, the behaviour of the star is pretty unambiguous for a convenient sampling rate. For example, if a planet completes one orbit over the course of a year, and the star is observed weekly, then there’s a clear sinusoidal curve in the data that can be easily picked out and modeled. However, the sampling is often not convenient, and the interval between observations can even be longer than the period of the planet. Some data sets can essentially be great guessing games, where the periods are in the data but take considerable effort to find. This effort takes the form of tuning the orbital parameters of the Keplerian fit until it finally lines up with enough data points.

For example, consider this radial velocity data set for 54 Piscium given by Fischer et al.

Radial velocity data set for 54 Piscium

It turns out there is a tool we can use to get a feel for what signals might be in the data. This tool, called a periodogram, was discussed earlier (here). Assume that we have a data set of velocities (or any variable, really). The radial velocity, v, is measured at a time t, resulting in a time series with N0 data points, {v(tj), j=1,2,…,N0}. The periodogram is built from the Fourier transform of the data set as follows.

$\displaystyle \begin{aligned} P_v(\omega)&=\frac{1}{N_0}|\text{FT}_v(\omega)|^2\\ &=\frac{1}{N_0}\left|\sum_{j=1}^{N_0}v(t_j)\exp(-\text{i}\omega t_j)\right|^2\\ &=\frac{1}{N_0}\left[\left(\sum_jv_j\cos(\omega t_j)\right)^2+\left(\sum_jv_j\sin(\omega t_j)\right)^2\right] \end{aligned}$

where Pv(ω) is the “power” at a given angular frequency ω (corresponding to a period of 2π/ω), N0 is the number of data points, vj = v(tj) is the radial velocity at time tj, and i is the imaginary unit.

With the above equation we can compute a basic periodogram for this data set. It is useful because if the data set contains a sinusoidal component with a frequency of ω0, then for ω near ω0, v(t) and exp(-iωt) are in phase. The terms in the sums then add up coherently, so the signal contributes significantly to the sums in the equation. For values of ω that are far from any real sinusoidal signal in the data, the terms randomly fluctuate between positive and negative and therefore more or less cancel out to a small sum. Hence, if a sinusoidal signal of frequency ω exists in the data set, the value of Pv(ω) will be large. The result can be graphed with period (2π/ω) on the x-axis and the “power” of that period in the data set on the y-axis. Higher power values indicate that the periodicity is more strongly represented in the data.
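The equation above translates almost line for line into code. Here is a minimal sketch in Python with NumPy. The function name `periodogram`, the synthetic sample times and the 40 m/s amplitude are all made up for illustration; this is not the 54 Piscium data or the code used to make the plots in this post.

```python
import numpy as np

def periodogram(t, v, omegas):
    """Classical periodogram P_v(omega) for sample times t (may be uneven)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    # Outer product gives omega * t_j for every trial frequency and sample.
    phase = np.outer(omegas, t)
    c = (v * np.cos(phase)).sum(axis=1)  # sum_j v_j cos(omega t_j)
    s = (v * np.sin(phase)).sum(axis=1)  # sum_j v_j sin(omega t_j)
    return (c**2 + s**2) / t.size

# Synthetic, noiseless example (assumed numbers, purely illustrative).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1000.0, size=150))  # days, unevenly sampled
true_period = 62.0                               # days
v = 40.0 * np.sin(2 * np.pi * t / true_period)   # m/s

periods = np.linspace(20.0, 200.0, 2000)         # trial periods in days
power = periodogram(t, v, 2 * np.pi / periods)
best_period = periods[np.argmax(power)]
print(best_period)
```

The highest peak should land close to the injected period, because that is where the cosine and sine sums add up coherently instead of cancelling.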

With the periodogram, while we can technically look at the power at any of an infinite number of frequencies, it would be helpful to narrow our search to a finite set of frequencies. We must decide what frequencies those are. Since the periodogram of the data is symmetric (that is, P(-ω) = P(ω)), all of its information is contained in the non-negative frequencies, n=0,1,2,…,N (=N0/2). Roughly half of the information in the data is discarded when calculating the periodogram because taking the absolute value in the equation eliminates any distinction between positive and negative frequencies. As such, we need only look at

$\displaystyle N=N_0/2$

evenly spaced frequencies.
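To make the frequency grid concrete, here is a small sketch that assumes evenly spaced observations, which real radial velocity data rarely are. The values of `n0` and `dt` and the toy signal are hypothetical. It builds the non-negative frequencies from 0 up to N0/2 and checks the symmetry P(-ω) = P(ω) numerically.

```python
import numpy as np

def periodogram(t, v, omegas):
    """Classical periodogram, same form as the equation in the text."""
    phase = np.outer(omegas, t)
    c = (v * np.cos(phase)).sum(axis=1)
    s = (v * np.sin(phase)).sum(axis=1)
    return (c**2 + s**2) / len(t)

n0 = 64                      # number of data points (hypothetical)
dt = 1.0                     # spacing between samples in days (assumed even)
t = np.arange(n0) * dt
v = np.cos(2 * np.pi * 5 * t / (n0 * dt))  # exactly 5 cycles over the span

# Non-negative frequencies 0, 1/(n0*dt), ..., 1/(2*dt), in cycles per day.
freqs = np.fft.rfftfreq(n0, d=dt)
omegas = 2 * np.pi * freqs

p_pos = periodogram(t, v, omegas)
p_neg = periodogram(t, v, -omegas)
symmetric = np.allclose(p_pos, p_neg)
strongest = int(np.argmax(p_pos))  # index of the strongest frequency bin
print(len(freqs), symmetric, strongest)
```

`np.fft.rfftfreq` returns N0/2 + 1 values because it includes both the zero frequency and the highest one, matching the n=0,…,N0/2 range in the text.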

It is not necessary to understand how the periodogram works, but what it means is important for this discussion. The periodogram takes your data set and tells you what periodicities are in it. For example, a periodogram of the above 54 Piscium data set would look like the following.

54 Piscium RV periodogram

A strong peak in the periodogram is found at ~62 days, corresponding to the orbital period of 54 Piscium b. We can fold the data to the period of the planet and see a reasonable fit by a ~Saturn-mass planet in a 0.284 AU orbit with an eccentricity of 0.63 and a longitude of periapsis of 235 degrees.

54 Psc b fit
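Folding the data simply means replacing each observation time with its orbital phase, so that every orbit gets stacked on top of the others. Here is a rough sketch. The helper `fold`, the observation times and the circular toy signal are all assumptions for illustration; the real fit is an eccentric Keplerian orbit.

```python
import numpy as np

def fold(t, period, t0=0.0):
    """Return the orbital phase in [0, 1) for each observation time."""
    return ((np.asarray(t, dtype=float) - t0) / period) % 1.0

period = 62.0  # days, roughly the periodogram peak quoted in the text
t = np.array([0.0, 10.0, 31.0, 62.0, 72.0, 155.0])  # hypothetical times (days)
v = np.sin(2 * np.pi * t / period)                  # toy circular signal

phase = fold(t, period)
order = np.argsort(phase)          # sort by phase for plotting
folded_phase, folded_v = phase[order], v[order]
print(folded_phase)
```

Observations taken one or more whole orbits apart end up at the same phase, which is why a correct period collapses the scattered points onto a single curve.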

The periodogram, while useful, suffers from a couple of problems. One, which is probably fairly obvious, is that it is noisy. A plethora of small peaks are seen that don’t correspond to anything physically real. The periodogram is noisy even when the actual data itself is only slightly noisy. Furthermore, the noise does not diminish with increasing sample size. This may be a bit counter-intuitive, as any fake signals would seem to be discouraged by adding new data, but it turns out that adding new data also adds new frequencies on top of the old, discouraged ones. As such, the noise is not actually averaged out by adding more data. However, while the noise does not decrease with increasing sample size, real signals in the data stand out more strongly against it. Specifically, the signal-to-noise ratio of a real signal in the data increases with the square root of the total number of observations.
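A quick numerical experiment illustrates this behaviour. The sketch below uses made-up numbers throughout: a unit-amplitude sinusoid, unit Gaussian noise and random observation times. With the 1/N0 normalisation from the equation earlier, the median power away from the signal should stay roughly flat as observations are added, while the power at the true period keeps climbing. The helper `peak_and_floor` is hypothetical.

```python
import numpy as np

def periodogram(t, v, omegas):
    """Classical periodogram, same form as the equation in the text."""
    phase = np.outer(omegas, t)
    c = (v * np.cos(phase)).sum(axis=1)
    s = (v * np.sin(phase)).sum(axis=1)
    return (c**2 + s**2) / len(t)

def peak_and_floor(n_obs, rng, period=62.0, amp=1.0, sigma=1.0, span=2000.0):
    """Power at the injected period and median power well away from it."""
    t = np.sort(rng.uniform(0.0, span, size=n_obs))
    noise = rng.normal(0.0, sigma, size=n_obs)
    v = amp * np.sin(2 * np.pi * t / period) + noise
    peak = periodogram(t, v, np.array([2 * np.pi / period]))[0]
    # Trial periods far from the injected signal sample the noise floor.
    off_periods = np.linspace(5.0, 40.0, 500)
    floor = np.median(periodogram(t, v, 2 * np.pi / off_periods))
    return peak, floor

rng = np.random.default_rng(1)
for n_obs in (50, 200, 800):
    peak, floor = peak_and_floor(n_obs, rng)
    print(n_obs, round(peak, 1), round(floor, 2))
```

In power, the peak grows roughly in proportion to the number of observations. In amplitude terms that is the square-root growth in signal-to-noise described above.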

Another major problem and contributor to noise is leakage. Signals tend to “leak” into other periodicities. In the 54 Psc periodogram above, you can see smaller peaks, called “sidelobes”, straddling the real signal of the planet. This is caused by nearby trial periods staying sufficiently in phase with the real signal over the span of the data that they end up with a high P(ω) value as well. Such leakage to nearby periods is caused by the finite total interval over which the data is sampled.

Sidelobes are fairly easy to recognise and ignore, but a more dangerous sort of leakage occurs due to the finite size of the interval between measurements, a phenomenon called aliasing. In particular, periods that are integer fractions of the planet’s period can be represented in the data. For a real signal of period S, aliased signals can exist at S/2, S/3, and so on. When you see a car driving down the road and the tires look like they are actually spinning very slowly, perhaps even in the opposite direction, your eyes and brain, with their finite sampling rate, are being fooled by an alias of the real signal, the actual rotation of the tires. Variable sampling rates are usually sufficient to eliminate the aliases that are most likely to be problematic.
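The spinning-tire illusion can be reproduced in a few lines. In this sketch every number is hypothetical: a wheel turning 0.9 times per frame is sampled once per frame, and the samples are compared with those of a wheel turning 0.1 times per frame backwards. This is the textbook case for evenly spaced samples, where shifting the frequency by the sampling rate leaves the samples unchanged. Jittering the sample times breaks the coincidence.

```python
import numpy as np

dt = 1.0                      # regular sampling interval (hypothetical units)
f_true = 0.9                  # true frequency, cycles per unit time
f_alias = f_true - 1.0 / dt   # shifted by the sampling rate: slow and backwards

# Evenly spaced samples: the two sinusoids give the same values.
t_even = np.arange(40) * dt
same_when_even = np.allclose(np.sin(2 * np.pi * f_true * t_even),
                             np.sin(2 * np.pi * f_alias * t_even))

# Jitter the sample times so the spacing is no longer regular.
rng = np.random.default_rng(2)
t_uneven = t_even + rng.uniform(-0.3, 0.3, size=t_even.size)
same_when_uneven = np.allclose(np.sin(2 * np.pi * f_true * t_uneven),
                               np.sin(2 * np.pi * f_alias * t_uneven))
print(same_when_even, same_when_uneven)
```

With regular spacing the data cannot tell the two frequencies apart, whereas the irregular spacing separates them. That is the sense in which variable sampling helps against aliases.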

As you can see, there are helpful statistical tools to take data sets and extract information out of them, but their interpretation is often unclear or ambiguous and great care must be taken when using them. As for the overarching point that has been hinted at numerous times on this blog: detecting extrasolar planets is hard.

A Thousand Planets

Depending on where you get your information from and how much weight you lend it, we have reached a thousand known planets.

Some of the semi-official sites like exoplanet.eu and more official sites like NASA’s Exoplanet Archive show fewer than this number. In the case of the latter, this appears to be because they only accept planets that have made it past peer review, which is a reasonable, if not high, standard. In the case of exoplanet.eu, while it has been a valuable asset since 1995, it has missed a few planets here and there as time has gone on (especially during a recent overhaul of the site). There are a number of other anomalies there, but it’s a site run by a guy in his spare time so there’s a limit to how much you can expect of it. That being said, it’s still a very valuable resource.

There exists a fairly small group of people, myself shamelessly included, who keep tabs on extrasolar planet news and developments nearly religiously. The count varies from person to person, but I am not alone in asserting that there are now 1,000 known planets. By my count, we passed that mark a couple of months ago, but I’ve decided to give it more time to help cover some margin for error in the planet count.

Where does this margin of error arise? There are a number of planets whose disposition is not very clear. They have been proposed and later disputed, but not fully disproven. There are planets that are unconfirmed, but secure enough that they can be talked about as real planets. And lastly there are Kepler candidates that have been determined to be planets, but in some cases have not even been included in a preprint on arXiv yet. As such, it is not possible for me or anyone to point to a specific planet and say “this is the thousandth known planet.”

In the big picture, humanity’s first thousand planets are only the top layer of H2O molecules of the iceberg of the planet population in the Galaxy. The sample is severely plagued by biases in favour of short-period and/or high-mass planets due to the nature of our detection methods and the completeness of our detection surveys. We have found many hot Jupiters, but we know full well that they are a minority (less than 1% of stars have a hot Jupiter). It’s clear that small planets are more prevalent; it’s just a matter of detecting them.

Recently, it was announced that the nearby M dwarf GJ 667C hosts three super-Earths in its habitable zone. Taken together with the two habitable planet candidates at Kepler-62 and single habitable planet candidates in other systems, we have about a dozen targets for a search for life. Some of these planets are better candidates than others, and I won’t encourage any undue optimism: to be outright about it, some of them appear pretty unlikely candidates, and a few of them look like we’re scraping the bottom of the barrel in desperate hope (I’m looking at you, HD 40307 g, GJ 163 c, Kepler-22 b, GJ 581 d).

Still, the fact that our first thousand planets include at least a few where it’s not impossible for life to exist is encouraging, especially when considering how biased our detection methods are against them. Combined with Kepler data that tells us that habitable planets are ubiquitous in the Galaxy, I am actually quite optimistic about the odds for there being a second biosphere in the solar neighbourhood.

We have learned so much in the first thousand planets, detected at a slow rate at first, but growing to over a hundred per year. It has taken us 20 years to detect the first thousand exoplanets. I would not be surprised if the next thousand come in only five years and feature many more habitable planet candidates.

Lastly, I have been dealing with some events in my “personal life” that have kept me busy, and so I have had less time to focus on extrasolar planet science and writing about it here. This is partly why this post doesn’t have a lot of meat to it. I look forward to writing more enlightening posts in the near future.