## 2012 Review

An Earth-mass planet orbiting Alpha Centauri B. Credit:ESO

2012 brought us yet another remarkable year of extrasolar planet science. While the planet catch for 2012 was a little less than last year’s, the quality and importance of planets revealed this year was amazing. By far the most major results have been the discovery of an ~Earth-mass planetary companion orbiting the secondary component of the nearest star system to our own, Alpha Centauri (see here), and evidence for a system of planets around the nearby star Tau Ceti (see here). I hesitate to draw conclusions from a small amount of data, but the discovery of a terrestrial planet at none other than our nearest neighbour seems to really emphasize the point that terrestrial planets are likely as common as dirt.

A nice system of planets was reported at Gliese 676A consisting of super-Earths and Jovian planets, HATnet and SuperWASP produced more hot Jupiters, and interestingly, a couple sub-Earths may have been found around the nearby star Gliese 436. Spitzer provided us with the first detection of thermal radiation from a super-Earth (see here). A pair of M giants also became the first known to have planets, with planets reported around HD 208527 and HD 220074.

Circumbinary planets were announced around RR Cae, NSVS 14256825, Kepler-34 and Kepler-35 and Kepler-38, which is notable as the first Neptune-sized circumbinary planet.

Kepler results picked up en masse this year. At first it started out nice and slow, with small groups of planets being announced in batches (See here, here, here and here), followed by dozens and dozens of planets.

Interesting Kepler results included Kepler-64, the first quadruple-star system with a planet. The planet is a circumbinary planet, no less. But easily the most important circumbinary planet find was Kepler-47, the first transiting multi-planet circumbinary system. Multi-planet circumbinary systems have been found before but this is the first to have multiple planets transiting. This allows not only for their existence to be much more certain (non-transiting circumbinary planets still suffer from the mass-inclination degeneracy), but allows us to test for coplanarity. The Kepler-47 system demonstrates conclusively that short-period binary stars can host full systems of planets. Another pair of planets with very close orbits to each other, yet very dissimilar densities were reported at Kepler-36. The orbits of the planets in the Kepler-30 system were shown to be well-aligned with their host star’s equator, showing us that systems of planets are, like ours, often neatly arranged and not chaotically scattered.

Good news and bad news about the Kepler spacecraft. The good news is that the mission is extended for another three years. The bad news is that unfortunately, a reaction wheel on the Kepler spacecraft failed, and the mission’s continued usefulness now rests on all of the other reaction wheels remaining operational.

Kepler also unveiled a system of three sub-Earth planets huddled around a dim red dwarf, Kepler-42, which is very similar to Barnard’s Star, as well as a possible small terrestrial planet being evaporated away due to the heat from its star (see here). One of these three planets is Mars-sized(!).

We gained more evidence that the Galaxy is just drowning in planets both from continued Kepler results, HARPS results, and from gravitational microlensing data. Kepler showed us that hot Jupiter systems are frequently lacking in additional planets.

Last but not least, habitable planet candidates were reported around Gliese 163 and HD 40307, with unconfirmed habitable planet candidates reported at Tau Ceti and Gliese 667 C – with two more planets possibly occupying the star’s habitable zone. If GJ 667 Ce is confirmed, then it would be the most promising habitable zone candidate to date, based on its low mass.

At the end of 2011, I gave some wild guesses as to how the extrasolar planet landscape would look like at the end of 2012. Here we are and how have those predictions held up?

The Extrasolar Planets Encyclopaedia lists 854 planets as of the time of this writing, however it is missing quite a few. My own count has us at 899 planets.

• The discovery of a ring system around a transiting planet

There are hints of ring systems (or perhaps rather circumplanetary disk systems) around Fomalhaut b, β Pictoris b, and 1SWASP J140747.93-394542.6 b (see here) but none of these are confirmed. So I’m calling it a missed prediction.

• More low-mass planets in the habitable zone from both radial velocity and transit

Two new habitable planet candidates from radial velocity, none from transit.

• Confirmation of obvious extrasolar planet atmospheric variability (cloud rotations, etc).

I was counting on continued monitoring of the HR 8799 planets to search for atmospheric variability, but it simply didn’t happen (or rather, if it did happen, the results are still pending). So I’m calling this a miss.

2013 could be a very interesting year, especially for Kepler. It seems we are on the verge of finding a true Earth analogue. The detection rate of candidate habitable planets is picking up and we’re really starting to get a list of targets to follow-up in the next decade. Here’s some more brave guesses for the end of 2013:

• 1200 Confirmed planets and planet candidates
• A satellite of an extrasolar planet (an “exomoon”)
• A confirmed ring system around an extrasolar planet
• Phase curve mapping of a sub-Jovian planet

## The Real Ones

On the last post, we looked at recovering a periodic signal from a radial velocity plot and interpreting it as a planet. Now let’s look at a few of the complications involved in this.

A powerful statistical tool used to get an idea of what kind of periodic signals are in your radial velocity data set is a Lomb–Scargle periodogram (the mathematical details for the interested reader may be found here, but it is sufficiently complex to warrant skipping over in the interests of maintaining reader attention and reasonable post length). In the interests of brevity, further references to the Lomb-Scargle periodogram will be shortened to simply “periodogram.”

The purpose of this periodogram is to give an indication of how likely an arbitrary periodicity is in a data set whose data points need not be equally spaced (as is frequently the case in astronomy for a variety of reasons). Periodicities that are strongly represented in the data are assigned a higher “power,” where periodicities that are not present or only weakly present are given a lower power.

Let’s look at an example using a radial velocity data set for BD-08 2823 (source) If we calculate a periodogram for the data set, we come up with this

BD-08 2823 RV Data Periodogram

The dashed line represents a 0.1% false alarm probability (FAP). A clear, obvious peak is seen at 1 day, 230 days and ~700 days, implying that periodicities of 1 day, 230 days, and ~700 days are present in the data. Creating a one-planet model with a Saturn-mass planet at 238 days produces a nice fit. After subtracting this signal from the data, we’re left with the residuals. Now we may run up a periodogram of the residuals and see what’s left in the data.

BD-08 2823 Periodogram of Residuals

We see three noteworthy things. First and foremost is the emergence of a new peak in the periodogram that was not strongly present before at 5.6 days. We also see that the peak at 1 day remains. Lastly we see that the peak toward 700 days has weakened and moved further out. It would seem to suggest the 700-day signal is perhaps not real, or was an artifact of the 238-day signal.

Why was the 5.6-day signal not present in the first periodogram? The answer may lie in it’s mass: the planet has a mere 14 Earth-masses. It’s RV signal is completely dominated by the Saturn-mass planet. The giant planet forces the shape of the RV diagram and the signal of the second planet is just dragged along, superimposed on the larger signal.

On the radial velocity data plot, the two-planet fit we have come to looks like this:

BD-08 2823 Two-Planet Fit

It is important to realise that the obvious sine curve is not necessarily a bold line, but there is a second periodicity in there going up and down frantically, once every 5.6 days, compared to the Saturn-mass planet, at 237 days.

The fit has a reduced chi-squared of χ2 = 3.2, and a scatter of σO-C = 4.3 m s-1. There’s no obvious structure to the residuals and the scatter is not terribly bad, so any new signals will likely indicate planets of low mass. Let’s check in on the periodogram of the residuals to the two-planet fit and see what may be left in the data.

Periodogram of Residuals to 2-Planet Fit

That signal out toward a thousand days is stubbornly refusing to go away, despite a low χ2. It may either not be real, or it may be indicative of a low-amplitude signal with a rather long period.

Also noteworthy is that the periodicity at one day continues to exist, rather strongly. This periodicity is what’s known as an alias. Because the telescope observes only at night, the observations are roughly evenly spaced – there are (on average) twelve hour gaps between each data point. Therefore a sine curve with a period of 24 hours can be made to fit the data. To illustrate this, consider this (completely made up) data set:

Fake Signal

There’s no doubt that the data is well-fitted by the sine curve, but there is no real evidence that the periodicity proposed by it arises from a real, physical origin. What’s more, a sine curve with half this period could also equally well fit the data. So could a sine curve with a third of this period, and so on. There are mathematically an infinite number of aliases at ever-shortening periods that can be fit to this data.

Generally, if you observe a system with a frequency of $f_o$, and there exists a true signal with a frequency of $f_t$, then aliases will exist at frequencies $f_{t+i} * f_o$, where i is an integer.

Therefore we see that these aliases are caused by the sampling rate. If we could get data between the data points already available, if we could double our observation frequency, we could break this degeneracy. But the problem for telescopes on Earth is that the star is not actually up in the sky more than half the day, and a given portion of the time it is up could be during daylight hours. Therefore the radial velocity data sets of most stars can be plagued with short-period aliases since there is typically a small window of a few hours to observe any given star. It must be noted that as the seasons change and the stars are in different places in the sky at night, that window of availability will shift around a bit, allowing one some leverage in breaking these degeneracies. Ultimately, telescopes in multiple locations around the world (or one in space) would sufficiently break these degeneracies.

A real example of aliases exist in this example from an Alpha Arietis data set. In this case, the alias is not nearly so straightforward. Two signals of periods 0.445 days and 0.571 days can be modelled to fit the data.

Alpha Arietis RV Alias

So which of these two signals correspond to an actual planet? It turns out neither of them do: these radial velocity variations are caused by pulsations on the star – contracting and expansion of the star produces Keplerian-like signals in radial velocity data, too. That’s yet another thing to watch out for. This can be detected with simultaneous photometry of the star. If there is a photometric periodicity that is equivalent to your radial velocity periodicity, avoid claiming a planet at this period as if your academic credibility depends on it.

Additional observations could easily break this degeneracy, provided they are planned at times where the two signals do not overlap.

We see therefore that it is important to keep in mind that a low FAP speaks only to whether or not the signal is real, and not where or what it actually came from. The one-day periodicity is surely present in the data, but it is not of physical origin. It can also be extremely hard to tell whether or not a signal at a given period is actually an alias of another, more real period. There are times when the peak of an alias in the periodogram can be higher than the actual, real period. For reasons that include these, radial velocity fits must be considered fairly preliminary. New data may provide drastic revisions to the orbital periods of proposed planets if signals are exposed to be aliases.

Confusion over aliases have occurred before in literature. HD 156668 b and 55 Cnc e have both had their orbital periods considerably revised after it was realised that their published periods were, in fact, aliases. In the case of 55 Cnc e, the new, de-aliased orbital period ended up being vindicated after transits were detected). The GJ 581 data set, for example, is severely limited by sampling aliases that have spawned controversies over the possible existence of additional planets in that system.

In summary, periodograms are a useful tool to provide the user with a starting point when fitting Keplerian signals to radial velocity data, but they cannot distinguish real signals from aliases. Many observations with a diverse sampling rate are necessary to disentangle aliases from true planetary signals. Ultimately, a cautious approach to fitting signals to radial velocity data works best.

## Making Waves

Photo by Mila Zinkova, 2007

We looked at how the Doppler effect can tell you about the radial velocity of a star (here), and how you can use this information to detect an orbiting companion (here). Let’s now look a bit more at some of the later parts of this process. You’ve already finished your observations, you’re done with the telescope time and you have a collection of data points. What do you do with them?

The first thing you will want to do is to create a model that fits the data. This is done using the equations in the post referred to in the above paragraph. The model will simply manifest itself as a plot of the radial velocity behaviour you would expect for a planet with a given mass and orbit. The closer your model fits the data, the more likely your model is a good representation of reality. For example, let’s look at radial velocity data of the giant star Iota Draconis (Credit: ESO).

Iota Draconis (HR 810) Radial Velocity Data

The individual points are the actual data, whereas the sine wave is the model of a 2.26 MJ planet with a 320-day period. We see that the proposed model fits the data well (being very close to the data), and is thus likely representative of reality. There are a few stray points and the model is not a perfect match to the data, but the data points can be prone to intrinsic noise, such as flares or granulation on the star’s surface, or instrumental noise with the spectrometer.

We see that the radial velocity data set covers several orbits of the planet around the star. As the planet continues to orbit the star (as it will do so for aeons) and the radial velocity data continues to come in (as we would hope would occur, more data is more opportunities to discover something new), the graph could become extremely long and complicated. For example, consider a set of radial velocity data for HD 192263 (source).

Unfolded HD 192263 RV data

What we could do would be to fold the entire data set by a certain period of time, specifically the orbital period (taking the time of the measurement modulo the orbital period). Since the model is periodic, that is, repeating each orbital period, the data points (provided they do have the same periodicity that you’re folding by) will fold up into a nice, neat, singular sine curve.

HD 192263 Folded RV data

Here it is quite convincing that the model well fits the data. But how do we quantify how good of a fit to the data our model is? A value exists called $\chi^2$(read “chi squared”).

$\displaystyle \chi^2 = \sum \frac{(O-C)^2}{\sigma^2}$

That is, calculate the difference between each observed data point and the model (or “observed minus calculated”), divided by $\sigma^2$, which represents the variance of the data. Then total up each of these. The higher the $\chi^2$, the poorer the model fits the data. The variance, $\sigma^2$,can be determined with

$\displaystyle \sigma^2 = \frac{1}{N} \sum_{i=1}^{N}(v_i - \bar{v})^2$

Which is simply an average of the differences of each point, $v$, from the average values of the points $\bar{v}$. To put this in other terms (and this is more of a discussion about general statistics than it is specific to radial velocity data), all of the radial velocity measurements can be averaged together. The average of the differences of each point from that average radial velocity value is the variance of the data, $\sigma^2$, and is a representation of how scattered the data is. Our $\chi^2$value is simply the total of each point’s difference from our model, while dividing it by the variance to compensate for highly scattered data. Models that are not rigidly affixed to the data points (high total OC) are given more forgiveness of the data itself has a lot of variance ($\sigma^2$) anyway. If there is low variance, then only models that are closer to the datapoints (low OC) are treated well, or given a low $\chi^2$ value.

We can obtain another value called the “reduced chi-squared” which takes into account the number of freely adjustable variables (or, the “degrees of freedom”). This is obtained simply by dividing $\chi^2$by the degrees of freedom, $d$.

$\displaystyle \chi^2_{red} = \frac{1}{Nd} \sum \frac{(O-C)^2}{\sigma^2} = \frac{\chi^2}{d}$

The need to do this stems from the fact that a quantitative analysis of data must be undertaken to seriously distinguish between two models whose goodness of fit to the data may not be as easily discriminated between in our “chi-by-eye” technique of just looking at them.

Large reduced chi-squared values indicate a poor model fit, and a value of 1 is typically sought but can be extremely difficult or impossible to achieve due to instrumental or intrinsic noise that are out of one’s ability to control. Values of the reduced chi squared that are less than 1 are typically a sign that you are “over-fitting” the model to the data. This can result from trying too hard to fit the model to noise, or overestimating the variance.

HD192263 RV Residuals

Above are the residuals to the HD 192263 dataset. This is keeping the x-axis (the time of the measurement) and plotting the O-C on the y-axis, instead of the radial velocity value for that data point. In this example, there is a trend in the data that has been modelled out which could potentially be due to a second orbiting companion that is far enough away that only a very small section of its orbit has been observed.

Other than the trend, there is no apparent structure in the residuals, and so there is no reason to suspect the existence of additional orbiting bodies.

One can calculate the standard deviation, or variance, of the residuals to get a measure of how scattered the residuals are. This is typically represented with $\sigma$ or written as ‘rms’ in literature, and is an algebraically obvious transformation of one of the above equations,

$\displaystyle \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N}(v_i - \bar{v})^2}$

The radial velocity jitter of a star scales with both its mass and chromospheric activity. A quiet M dwarf may have a jitter as low as 2 to 3 metres per second, whereas a sun-like star may have a jitter on the order of 3 – 5 metres per second. F-type stars have jitters close to 10 metres per second, and the jitter sky-rockets from there, where A-type stars are nearly unusable for using radial velocity to detect extrasolar planets. As of the time of this writing, no planets have ever been discovered around an A-type star through the use of the radial velocity method. So there is an amount of variance in the residuals we should expect to have given the star we’re studying. If the variance is much higher than expected for the star, then it could be evidence of either unusually high stellar chromospheric activity, or as-yet undetected planets with small amplitude signals.

The systems we have looked at thus far throughout this blog in the context of radial velocity have been systems whose radial velocity data graphs are dominated by a single periodicity. Indeed, our own solar system is like this, with Jupiter dominating the sun’s radial velocity profile. But not all planetary systems come so conveniently organised.

I leave you with an image of how daunting this can be and why we are all thankful that computers are here to help us: An image of the 2008 three-planet fit for HD 40307 (since discovered to have three more planets).

HD 40307 Three-Planet Fit

Update (15 Apr 2013): I have corrected a math error in the equation for variance and standard deviation. In both cases the $(v_i-\bar{v})$ term should have been squared.