Quantifying uncertainty in (differential) rhythmicity • limorhyde2

Introduction

Here we show how to use limorhyde2 to quantify uncertainty in rhythmicity and differential rhythmicity. This step is not essential and can be computationally expensive, but can provide additional information. As limorhyde2 is a Bayesian approach focused on effect sizes rather than statistical significance, quantifying uncertainty relies on the concepts of posterior probability and credible intervals.

The data are based on liver samples from wild-type and Rev-erb\(\alpha/\beta\) double-knockout mice (Cho et al. 2012 and GSE34018).

Load packages

library('cowplot')
library('data.table')
library('ggplot2')
library('limorhyde2')
library('qs')

# doParallel::registerDoParallel() # register a parallel backend to minimize runtime
theme_set(theme_bw())

Load the data

The expression data are in a matrix with one row per gene and one column per sample. The metadata are in a table with one row per sample. To save time and space, the expression data include only a subset of genes.

y = GSE34018$y
y[1:5, 1:5]
#>       GSM840516 GSM840517 GSM840518 GSM840519 GSM840520
#> 12686 11.962830 11.923338 11.098814 10.958933  9.256413
#> 13170  8.989743  9.132606 12.381036 12.441759 14.766070
#> 26897 11.515292 11.625519 10.579969 10.601969 11.096489
#> 11287  7.985859  7.930935  7.674688  7.899531  7.768563
#> 12046  8.024084  7.856703  7.942198  8.172695  7.981340

metadata = GSE34018$metadata
metadata
#>        sample      cond time
#>  1: GSM840516 wild-type    0
#>  2: GSM840517 wild-type    0
#>  3: GSM840518 wild-type    4
#>  4: GSM840519 wild-type    4
#>  5: GSM840520 wild-type    8
#>  6: GSM840521 wild-type    8
#>  7: GSM840522 wild-type   12
#>  8: GSM840523 wild-type   12
#>  9: GSM840524 wild-type   16
#> 10: GSM840525 wild-type   16
#> 11: GSM840526 wild-type   20
#> 12: GSM840527 wild-type   20
#> 13: GSM840504  knockout    0
#> 14: GSM840505  knockout    0
#> 15: GSM840506  knockout    4
#> 16: GSM840507  knockout    4
#> 17: GSM840508  knockout    8
#> 18: GSM840509  knockout    8
#> 19: GSM840510  knockout   12
#> 20: GSM840511  knockout   12
#> 21: GSM840512  knockout   16
#> 22: GSM840513  knockout   16
#> 23: GSM840514  knockout   20
#> 24: GSM840515  knockout   20
#>        sample      cond time

Fit linear models and compute posterior fits

Because the samples were acquired at relatively low temporal resolution (every 4 h), we use three knots instead of the default four, which reduces the flexibility of the spline curves. We specify condColname so getModelFit() knows to fit a differential rhythmicity model.

fit = getModelFit(y, metadata, nKnots = 3L, condColname = 'cond')
fit = getPosteriorFit(fit)

Draw samples from the posterior fits

The posterior fits consist of not just a single set of model coefficients (the posterior means), but distributions of model coefficients. Sampling from these distributions is the first step to quantifying uncertainty in the fits. Here we generate 50 posterior samples, although an actual analysis would require more to accurately estimate the credible intervals.

fit = getPosteriorSamples(fit, nPosteriorSamples = 50L)

Get fitted time-courses

We can use the posterior samples to quantify uncertainty in the expected measurements, i.e., the fitted curves, by specifying the fitType argument. Here we focus on three genes.

genes = data.table(
  id = c('13170', '12686', '26897'),
  symbol = c('Dbp', 'Elovl3', 'Acot1'))
times = seq(0, 24, 0.5)

measFitSamps = getExpectedMeas(
  fit, times = times, fitType = 'posterior_samples', features = genes$id)
measFitSamps[genes, symbol := i.symbol, on = .(feature = id)]
print(measFitSamps, nrows = 10L)
#>        time      cond feature     value posterior_sample symbol
#>     1:    0 wild-type   13170  9.274814                1    Dbp
#>     2:    0 wild-type   12686 11.863308                1 Elovl3
#>     3:    0 wild-type   26897 11.566155                1  Acot1
#>     4:    0  knockout   13170 11.906920                1    Dbp
#>     5:    0  knockout   12686  9.816829                1 Elovl3
#>    ---                                                         
#> 14696:   24 wild-type   12686 11.824787               50 Elovl3
#> 14697:   24 wild-type   26897 11.662587               50  Acot1
#> 14698:   24  knockout   13170 11.843255               50    Dbp
#> 14699:   24  knockout   12686  9.893764               50 Elovl3
#> 14700:   24  knockout   26897  9.105940               50  Acot1

Given the expected measurements from the posterior samples, we can compute the lower and upper bounds of the credible interval for each combination of feature, condition, and time-point. By default, getExpectedMeasIntervals() calculates the 90% equal-tailed interval.

measFitInts = getExpectedMeasIntervals(measFitSamps)
print(measFitInts, nrows = 10L)
#>      time      cond feature symbol     lower     upper
#>   1:    0 wild-type   13170    Dbp  9.264877  9.538147
#>   2:    0 wild-type   12686 Elovl3 11.712048 11.940182
#>   3:    0 wild-type   26897  Acot1 11.433505 11.669591
#>   4:    0  knockout   13170    Dbp 11.808008 12.109973
#>   5:    0  knockout   12686 Elovl3  9.723500  9.904368
#>  ---                                                  
#> 290:   24 wild-type   12686 Elovl3 11.712048 11.940182
#> 291:   24 wild-type   26897  Acot1 11.433505 11.669591
#> 292:   24  knockout   13170    Dbp 11.808008 12.109973
#> 293:   24  knockout   12686 Elovl3  9.723500  9.904368
#> 294:   24  knockout   26897  Acot1  9.036087  9.230412

It’s always a good idea to also calculate the posterior mean fitted curves.

measFitMean = getExpectedMeas(fit, times = times, features = genes$id)
measFitMean[genes, symbol := i.symbol, on = .(feature = id)]

Now we can plot the results. In the first row, each curve corresponds to a posterior sample. In the second row, the ribbons indicate the credible intervals.

timeBreaks = seq(0, 24, 4)
pal = 'Dark2'

p1 = ggplot(measFitSamps) +
  facet_wrap(vars(symbol), scales = 'fixed', nrow = 1) +
  geom_line(aes(x = time, y = value, color = cond,
                group = interaction(cond, posterior_sample)), alpha = 0.2) +
  labs(x = 'Circadian time (h)', y = 'Expression (norm.)', color = 'Condition') +
  scale_x_continuous(breaks = timeBreaks) +
  scale_color_brewer(palette = pal) +
  theme(legend.position = 'none')

p2 = ggplot() +
  facet_wrap(vars(symbol), scales = 'fixed', nrow = 1) +
  geom_ribbon(aes(x = time, ymin = lower, ymax = upper, fill = cond),
              alpha = 0.3, data = measFitInts) +
  geom_line(aes(x = time, y = value, color = cond), data = measFitMean) +
  labs(x = 'Circadian time (h)', y = 'Expression (norm.)', color = 'Condition',
       fill = 'Condition') +
  scale_x_continuous(breaks = timeBreaks) +
  scale_fill_brewer(palette = pal) +
  scale_color_brewer(palette = pal) +
  theme(legend.position = 'bottom')

plot_grid(p1, p2, ncol = 1, rel_heights = c(1, 1.25))

Get rhythm and differential rhythm statistics

We can also use the posterior samples to quantify uncertainty in the statistics, again by specifying the fitType argument.

rhyStatsSamps = getRhythmStats(fit, features = genes$id, fitType = 'posterior_samples')
diffRhyStatsSamps = getDiffRhythmStats(fit, rhyStatsSamps)
diffRhyStatsSamps[genes, symbol := i.symbol, on = .(feature = id)]
print(diffRhyStatsSamps, nrows = 10L)
#>      feature posterior_sample     cond1    cond2 mean_mesor
#>   1:   12686                1 wild-type knockout   10.15766
#>   2:   12686                2 wild-type knockout   10.15766
#>   3:   12686                3 wild-type knockout   10.15766
#>   4:   12686                4 wild-type knockout   10.15766
#>   5:   12686                5 wild-type knockout   10.15766
#>  ---                                                       
#> 146:   26897               46 wild-type knockout   10.47346
#> 147:   26897               47 wild-type knockout   10.47346
#> 148:   26897               48 wild-type knockout   10.47346
#> 149:   26897               49 wild-type knockout   10.47346
#> 150:   26897               50 wild-type knockout   10.47346
#>      mean_peak_trough_amp diff_mesor diff_peak_trough_amp diff_peak_phase
#>   1:            1.7633818 -0.4215416           -2.6201893      -5.6887563
#>   2:            1.5702244 -0.4215416           -2.1313997      -6.1659571
#>   3:            1.7466211 -0.4215416           -2.5481985      -5.9954759
#>   4:            1.8630660 -0.4215416           -2.5796118      -5.6400564
#>   5:            1.7071846 -0.4215416           -2.4134397      -6.7763651
#>  ---                                                                     
#> 146:            1.1597560 -2.4048158           -1.5707881       0.2460203
#> 147:            0.9817103 -2.4048158           -0.8025853      -0.7843197
#> 148:            0.9179664 -2.4048158           -1.2098345      -1.1798541
#> 149:            0.8594796 -2.4048158           -0.9957860      -0.1180031
#> 150:            1.0290923 -2.4048158           -1.1370100      -2.2364984
#>      diff_trough_phase diff_rhy_dist symbol
#>   1:        0.82009394     3.0700062 Elovl3
#>   2:       -0.06900037     2.7052109 Elovl3
#>   3:        0.33848327     3.0569016 Elovl3
#>   4:       -0.14269700     3.1510468 Elovl3
#>   5:       -9.81410512     3.0545110 Elovl3
#>  ---                                       
#> 146:        6.78635282     1.5717491  Acot1
#> 147:       -2.94075970     0.8233279  Acot1
#> 148:       -3.24661986     1.2283422  Acot1
#> 149:        7.04463249     0.9960211  Acot1
#> 150:       -3.59999383     1.2401330  Acot1

In the plots below, each point represents a posterior sample.

p1 = ggplot(diffRhyStatsSamps) +
  facet_wrap(vars(symbol), nrow = 1) +
  geom_point(aes(x = diff_peak_trough_amp, y = diff_mesor), alpha = 0.2) +
  labs(x = bquote(Delta * 'amplitude (norm.)'), y = bquote(Delta * 'mesor (norm.)'))

p2 = ggplot(diffRhyStatsSamps) +
  facet_wrap(vars(symbol), nrow = 1) +
  geom_point(aes(x = diff_peak_trough_amp, y = diff_peak_phase), alpha = 0.2) +
  labs(x = bquote(Delta * 'amplitude (norm.)'), y = bquote(Delta * 'phase (h)'))

plot_grid(p1, p2, ncol = 1)

Finally, we can compute credible intervals for the rhythm and differential rhythm statistics. Again, by default these are 90% equal-tailed intervals. Currently getStatsIntervals() does not calculate intervals for phase-based statistics, since phase and phase difference are circular quantities.

rhyStatsInts = getStatsIntervals(rhyStatsSamps)
print(rhyStatsInts, nrows = 10L)
#>          cond feature       statistic      lower      upper
#>  1: wild-type   13170      peak_value 14.9173520 15.2200466
#>  2: wild-type   12686      peak_value 11.7151501 11.9406772
#>  3: wild-type   26897      peak_value 12.2457363 12.5654355
#>  4:  knockout   13170      peak_value 14.1594493 14.4428868
#>  5:  knockout   12686      peak_value 10.1737541 10.4395112
#> ---                                                        
#> 14: wild-type   12686 peak_trough_amp  2.7919175  3.1746157
#> 15: wild-type   26897 peak_trough_amp  1.3339754  2.0103134
#> 16:  knockout   13170 peak_trough_amp  2.3729890  2.8259867
#> 17:  knockout   12686 peak_trough_amp  0.4218476  0.7843328
#> 18:  knockout   26897 peak_trough_amp  0.3025411  0.6903593

diffRhyStatsInts = getStatsIntervals(diffRhyStatsSamps)
print(diffRhyStatsInts, nrows = 10L)
#>        cond1    cond2 feature            statistic      lower      upper
#> 1: wild-type knockout   12686           diff_mesor -0.4215416 -0.4215416
#> 2: wild-type knockout   13170           diff_mesor  1.1950262  1.1950262
#> 3: wild-type knockout   26897           diff_mesor -2.4048158 -2.4048158
#> 4: wild-type knockout   12686 diff_peak_trough_amp -2.6906488 -2.1714818
#> 5: wild-type knockout   13170 diff_peak_trough_amp -3.7882613 -2.9900856
#> 6: wild-type knockout   26897 diff_peak_trough_amp -1.5602198 -0.8128997