# The Real Ones

In the last post, we looked at recovering a periodic signal from a radial velocity plot and interpreting it as a planet. Now let’s look at a few of the complications involved in this.

A powerful statistical tool for getting an idea of what kinds of periodic signals are in your radial velocity data set is the Lomb–Scargle periodogram (the mathematical details may be found here for the interested reader, but they are complex enough to skip over in the interests of maintaining reader attention and a reasonable post length). For brevity, further references to the Lomb–Scargle periodogram will be shortened to simply “periodogram.”

The purpose of this periodogram is to give an indication of how likely an arbitrary periodicity is in a data set whose data points need not be equally spaced (as is frequently the case in astronomy for a variety of reasons). Periodicities that are strongly represented in the data are assigned a higher “power,” whereas periodicities that are absent or only weakly present are given a lower power.
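To make the idea of “power” concrete, here is a minimal sketch of the classic Lomb–Scargle formula in plain Python, applied to a completely made-up data set (80 randomly timed points containing a 40-day, 10 m/s sinusoid plus noise). The function name, the trial-period grid and all the numbers are assumptions chosen for illustration, not anything taken from a real data set. In practice a library routine such as `astropy.timeseries.LombScargle` or `scipy.signal.lombscargle` would be the usual choice.

```python
import math
import random

def lomb_scargle(times, values, periods):
    """Return the Lomb-Scargle power at each trial period (same units as times)."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    variance = sum(r * r for r in resid) / (n - 1)
    powers = []
    for period in periods:
        omega = 2.0 * math.pi / period
        # The offset tau makes the power independent of where t = 0 is placed.
        tau = math.atan2(sum(math.sin(2 * omega * t) for t in times),
                         sum(math.cos(2 * omega * t) for t in times)) / (2 * omega)
        c = [math.cos(omega * (t - tau)) for t in times]
        s = [math.sin(omega * (t - tau)) for t in times]
        yc = sum(r * ci for r, ci in zip(resid, c))
        ys = sum(r * si for r, si in zip(resid, s))
        powers.append((yc ** 2 / sum(ci * ci for ci in c)
                       + ys ** 2 / sum(si * si for si in s)) / (2.0 * variance))
    return powers

# Made-up data: 80 unevenly spaced observations of a 40-day, 10 m/s signal.
rng = random.Random(1)
times = sorted(rng.uniform(0.0, 600.0) for _ in range(80))
values = [10.0 * math.sin(2 * math.pi * t / 40.0) + rng.gauss(0.0, 2.0)
          for t in times]

# Trial periods, log-spaced between 5 and 300 days (an arbitrary choice).
periods = [5.0 * (300.0 / 5.0) ** (k / 399) for k in range(400)]
powers = lomb_scargle(times, values, periods)
best_period = periods[powers.index(max(powers))]
print(f"strongest periodicity near {best_period:.1f} days")
```

Note that nothing in the calculation requires the observation times to be evenly spaced, which is exactly why this tool suits astronomical data.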

Let’s look at an example using a radial velocity data set for BD-08 2823 (source). If we calculate a periodogram for the data set, we come up with this:

BD-08 2823 RV Data Periodogram

The dashed line represents a 0.1% false alarm probability (FAP). Clear, obvious peaks are seen at 1 day, 230 days and ~700 days, implying that periodicities of 1 day, 230 days, and ~700 days are present in the data. Creating a one-planet model with a Saturn-mass planet at 238 days produces a nice fit. After subtracting this signal from the data, we’re left with the residuals. Now we may run a periodogram of the residuals and see what’s left in the data.
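For a rough feel of what a FAP line means, here is a sketch of one common textbook estimate. It assumes the periodogram power is normalised by the data variance, that pure Gaussian noise then gives exponentially distributed power at each frequency, and that there are `n_independent` effectively independent frequencies; that last number is a judgment call, and the value of 1000 below is purely illustrative. How the 0.1% line in the plot above was actually computed is not stated, so this is not necessarily the same method.

```python
import math

def false_alarm_probability(power, n_independent):
    """Chance that pure noise gives a peak at least this high at any frequency."""
    if power <= 0.0:
        return 1.0
    # Per-frequency chance is exp(-power); FAP = 1 - (1 - exp(-power))**M.
    return -math.expm1(n_independent * math.log1p(-math.exp(-power)))

def power_threshold(fap, n_independent):
    """Power level whose false alarm probability equals `fap`."""
    return -math.log(-math.expm1(math.log1p(-fap) / n_independent))

# Hypothetical example: the power a peak must exceed for a 0.1% FAP.
threshold = power_threshold(0.001, n_independent=1000)
print(f"0.1% FAP corresponds to a power of about {threshold:.1f}")
```

Any peak poking above such a line is unlikely to be a noise fluctuation, which says nothing yet about what actually produced it.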

BD-08 2823 Periodogram of Residuals

We see three noteworthy things. First and foremost is the emergence of a new peak at 5.6 days that was not strongly present in the periodogram before. We also see that the peak at 1 day remains. Lastly, we see that the peak toward 700 days has weakened and moved further out. This suggests that the 700-day signal is perhaps not real, or was an artifact of the 238-day signal.

Why was the 5.6-day signal not present in the first periodogram? The answer may lie in its mass: the planet has a mass of a mere 14 Earth masses. Its RV signal is completely dominated by that of the Saturn-mass planet. The giant planet forces the shape of the RV diagram and the signal of the second planet is just dragged along, superimposed on the larger signal.
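The subtract-and-look-again step can be sketched with a toy model. The code below assumes circular orbits (plain sinusoids rather than full Keplerian curves), periods that are already known, and made-up amplitudes of 14 m/s and 4 m/s with 1.5 m/s of noise; none of these numbers come from the real BD-08 2823 fit. The helper names `solve3`, `fit_sinusoid` and `rms` are likewise just for illustration.

```python
import math
import random

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def fit_sinusoid(times, values, period):
    """Least-squares fit of a*sin + b*cos + offset at a fixed period."""
    omega = 2.0 * math.pi / period
    basis = [[math.sin(omega * t), math.cos(omega * t), 1.0] for t in times]
    normal = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
              for i in range(3)]
    rhs = [sum(row[i] * v for row, v in zip(basis, values)) for i in range(3)]
    coeffs = solve3(normal, rhs)
    model = [sum(c * f for c, f in zip(coeffs, row)) for row in basis]
    return math.hypot(coeffs[0], coeffs[1]), model

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# Made-up data: a large 238-day signal hiding a small 5.6-day one.
rng = random.Random(2)
times = sorted(rng.uniform(0.0, 900.0) for _ in range(120))
values = [14.0 * math.sin(2 * math.pi * t / 238.0)
          + 4.0 * math.sin(2 * math.pi * t / 5.6 + 1.0)
          + rng.gauss(0.0, 1.5) for t in times]

# Fit and remove the dominant signal, then fit what is left over.
amp_big, model_big = fit_sinusoid(times, values, 238.0)
residuals = [v - m for v, m in zip(values, model_big)]
amp_small, _ = fit_sinusoid(times, residuals, 5.6)
print(f"large signal: {amp_big:.1f} m/s, scatter before removal {rms(values):.1f} m/s")
print(f"small signal: {amp_small:.1f} m/s, scatter after removal {rms(residuals):.1f} m/s")
```

While the large signal is still in the data it accounts for most of the variance, so the small one barely registers; once it is removed, the small signal is the biggest thing left.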

On the radial velocity data plot, the two-planet fit we have come to looks like this:

BD-08 2823 Two-Planet Fit

It is important to realise that the obvious sine curve is not simply drawn as a bold line: there is a second periodicity in there, going up and down frantically once every 5.6 days, superimposed on the 237-day signal of the Saturn-mass planet.

The fit has a reduced chi-squared of $\chi^2 = 3.2$, and a scatter of $\sigma_{O-C} = 4.3$ m s$^{-1}$. There’s no obvious structure to the residuals and the scatter is not terribly bad, so any new signals will likely indicate planets of low mass. Let’s check in on the periodogram of the residuals to the two-planet fit and see what may be left in the data.
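For reference, these two fit statistics can be computed along the following lines. This is a sketch that assumes the scatter is the root-mean-square of the observed-minus-computed (O-C) residuals, which is a common convention but is not spelled out here; the function name and the toy numbers are made up.

```python
import math

def fit_quality(observed, computed, errors, n_free_params):
    """Return (reduced chi-squared, RMS scatter of the O-C residuals)."""
    residuals = [o - c for o, c in zip(observed, computed)]
    chi2 = sum((r / e) ** 2 for r, e in zip(residuals, errors))
    dof = len(observed) - n_free_params  # degrees of freedom
    scatter = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return chi2 / dof, scatter

# Toy example: four velocities (m/s), a flat model, unit errors, two parameters.
reduced_chi2, scatter = fit_quality(
    observed=[1.0, -1.0, 2.0, -2.0],
    computed=[0.0, 0.0, 0.0, 0.0],
    errors=[1.0, 1.0, 1.0, 1.0],
    n_free_params=2,
)
print(f"reduced chi-squared = {reduced_chi2:.2f}, scatter = {scatter:.2f} m/s")
```

A reduced chi-squared well above 1 hints that either the quoted measurement errors are too small or there is still unmodelled signal in the data.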

Periodogram of Residuals to 2-Planet Fit

That signal out toward a thousand days is stubbornly refusing to go away, despite a low χ2. It may either not be real, or it may be indicative of a low-amplitude signal with a rather long period.

Also noteworthy is that the periodicity at one day continues to exist, rather strongly. This periodicity is what’s known as an alias. Because the telescope observes only at night, the observations are not spread randomly in time: they fall in roughly the same part of each 24-hour cycle, with daytime gaps in between. Therefore a sine curve with a period of 24 hours can be made to fit the data. To illustrate this, consider this (completely made up) data set:

Fake Signal

There’s no doubt that the data is well-fitted by the sine curve, but there is no real evidence that the periodicity proposed by it arises from a real, physical origin. What’s more, a sine curve with half this period could also equally well fit the data. So could a sine curve with a third of this period, and so on. There are mathematically an infinite number of aliases at ever-shortening periods that can be fit to this data.

Generally, if you observe a system with a frequency of $f_o$, and there exists a true signal with a frequency of $f_t$, then aliases will exist at frequencies $f_t + i \, f_o$, where $i$ is an integer.
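As a quick numerical check of this relation, the sketch below lists a few alias periods for a hypothetical 5.6-day signal sampled exactly once per day, and confirms that the true sinusoid and its $i = +1$ alias take identical values at every sampling time. Perfectly regular nightly sampling is an idealisation, and the function name is made up for illustration.

```python
import math

def alias_periods(true_period, sampling_period, orders=(-2, -1, 1, 2)):
    """Periods of the aliases at |f_t + i * f_o| for the given integers i."""
    f_true = 1.0 / true_period
    f_obs = 1.0 / sampling_period
    return {i: 1.0 / abs(f_true + i * f_obs) for i in orders}

aliases = alias_periods(true_period=5.6, sampling_period=1.0)
for order, period in aliases.items():
    print(f"i = {order:+d}: alias period {period:.3f} days")

# Evaluate the true signal and its i = +1 alias at 30 consecutive nightly epochs.
nights = range(30)
true_curve = [math.sin(2 * math.pi * t / 5.6) for t in nights]
alias_curve = [math.sin(2 * math.pi * t / aliases[1]) for t in nights]
max_mismatch = max(abs(a - b) for a, b in zip(true_curve, alias_curve))
print(f"largest difference at the sampled times: {max_mismatch:.1e}")
```

The two curves are very different between the samples, but with only one point per day there is nothing in the data to tell them apart.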

Therefore we see that these aliases are caused by the sampling rate. If we could get data between the data points already available, if we could double our observation frequency, we could break this degeneracy. But the problem for telescopes on Earth is that the star is not actually up in the sky more than half the day, and a given portion of the time it is up could be during daylight hours. Therefore the radial velocity data sets of most stars can be plagued with short-period aliases since there is typically a small window of a few hours to observe any given star. It must be noted that as the seasons change and the stars are in different places in the sky at night, that window of availability will shift around a bit, allowing one some leverage in breaking these degeneracies. Ultimately, telescopes in multiple locations around the world (or one in space) would sufficiently break these degeneracies.

A real example of aliasing comes from an Alpha Arietis data set. In this case, the alias is not nearly so straightforward. Two signals, with periods of 0.445 days and 0.571 days, can be modelled to fit the data.

Alpha Arietis RV Alias

So which of these two signals corresponds to an actual planet? It turns out neither of them does: these radial velocity variations are caused by pulsations of the star – contraction and expansion of the star produces Keplerian-like signals in radial velocity data, too. That’s yet another thing to watch out for. This can be detected with simultaneous photometry of the star. If there is a photometric periodicity that is equivalent to your radial velocity periodicity, avoid claiming a planet at this period as if your academic credibility depends on it.

Additional observations could easily break the degeneracy between the two candidate periods, provided they are planned for times when the two model curves predict clearly different velocities.
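One way to plan such observations is to evaluate both candidate models on a grid of possible observing times and pick the times where their predictions differ most. The sketch below does this for the two periods quoted above, but with hypothetical unit amplitudes and zero phases, so the specific times it prints are only illustrative; a real plan would use the fitted amplitudes and phases and would be restricted to times when the star is actually observable.

```python
import math

def model(t, period, amplitude=1.0, phase=0.0):
    """Sinusoidal velocity model; amplitude and phase here are placeholders."""
    return amplitude * math.cos(2 * math.pi * t / period + phase)

# Candidate observing times: every 6 minutes over one day (in days).
candidate_times = [k / 240.0 for k in range(241)]

# Predicted velocity difference between the two competing models.
separation = [abs(model(t, 0.445) - model(t, 0.571)) for t in candidate_times]

# Rank the candidate times from most to least discriminating.
ranked = sorted(zip(separation, candidate_times), reverse=True)
best_separation, best_time = ranked[0]
print(f"most discriminating time: t = {best_time:.3f} d "
      f"(models differ by {best_separation:.2f} amplitude units)")
```

A measurement taken where the two curves nearly coincide adds almost no information about which period is correct, however precise it is.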

We see therefore that it is important to keep in mind that a low FAP speaks only to whether or not a periodicity is really present in the data, and not to where or what it actually came from. The one-day periodicity is surely present in the data, but it is not of physical origin. It can also be extremely hard to tell whether or not a signal at a given period is actually an alias of another, true period. There are times when the peak of an alias in the periodogram can be higher than the peak of the true period. For reasons that include these, radial velocity fits must be considered fairly preliminary. New data may provide drastic revisions to the orbital periods of proposed planets if signals are exposed to be aliases.

Confusion over aliases has occurred before in the literature. HD 156668 b and 55 Cnc e have both had their orbital periods considerably revised after it was realised that their published periods were, in fact, aliases (in the case of 55 Cnc e, the new, de-aliased orbital period ended up being vindicated after transits were detected). The GJ 581 data set is likewise severely limited by sampling aliases, which have spawned controversies over the possible existence of additional planets in that system.

In summary, periodograms are a useful tool to provide the user with a starting point when fitting Keplerian signals to radial velocity data, but they cannot distinguish real signals from aliases. Many observations with varied sampling are necessary to disentangle aliases from true planetary signals. Ultimately, a cautious approach to fitting signals to radial velocity data works best.