On Thursday, September 15th, from 9:00 until 10:30, there will be a Panel Discussion of statistical issues relevant to Particle Physics, Astrophysics and Cosmology. The members of the Panel are Bob Cousins, David Cox, Jerry Friedman and Bernard Silverman. Questions submitted by participants are listed below.

The format will be that Panel members are invited to comment on a specific question; after discussion of that topic by the other members of the Panel, members of the audience will be able to contribute to the discussion. Then we move on to another Panel member, and so on until time is up.

Do the statisticians on the Panel consider that there are any techniques we should be using, but currently are not?

What properties should confidence intervals respect? For example: frequentist coverage? Short, but not too short? No empty intervals? Invariance with respect to reparametrisation? Robustness? A consistent approach to incorporating nuisance parameters? ...

Particle Physicists like to use frequentist approaches. Is it practically possible to use a Neyman construction in several dimensions, i.e. in data and/or parameters? And if only one of the parameters is a physics parameter and the rest are nuisance parameters, so that the multi-parameter confidence region must be projected in order to obtain the region for the single physics parameter, what ordering rule will give optimal behaviour for the resulting one-dimensional intervals?

An experiment is looking for some rare or perhaps non-existent process. It involves simply counting events. There is an uninteresting background b which also contributes to the counting rate. The number of observed events N is expected to be Poisson distributed with mean b, which has been measured in a subsidiary experiment as b_0 ± sigma_b. (This can be thought of as being determined as c/r, where c is the number of events in a situation which is sensitive only to the background, and r is a scale factor which typically could be 5 and is accurately specified, with zero uncertainty. A larger r results in a smaller error sigma_b.) We want to calculate the p-value of the null hypothesis (background only) for observing at least N events. To be specific, we could take N = 9 and b_0 ± sigma_b = 3.1 ± 0.4.

What is the recommended statistical technique? It is desirable that it could easily be extended to a larger number of nuisance parameters.
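
As one concrete illustration (a sketch, not a panel recommendation) for the specific numbers above, the problem can be treated as an "on/off" ratio of Poisson means: conditional on the total count N + c, the on-source count is binomial, so the uncertainty on b is absorbed exactly. The values of c and r below are reconstructed from b_0 = c/r and sigma_b = sqrt(c)/r.

```python
from scipy import stats

N, b0, sigma_b = 9, 3.1, 0.4

# Reconstruct the subsidiary-experiment count c and scale factor r
# from b0 = c/r and sigma_b = sqrt(c)/r:
r = b0 / sigma_b**2          # ~19.4
c = b0 * r                   # ~60 background-only events

# (a) Naive p-value, ignoring the uncertainty on b:
p_naive = stats.poisson.sf(N - 1, b0)     # P(n >= N | b = b0)

# (b) Conditional test for a ratio of Poisson means: given N + c total
# counts, the on-source count is Binomial(N + c, 1/(1 + r)) under the
# background-only hypothesis, so b is eliminated exactly.
p_cond = stats.binom.sf(N - 1, int(round(N + c)), 1.0 / (1.0 + r))
```

The conditional p-value comes out somewhat larger than the naive one, quantifying how much the uncertainty on b dilutes the significance.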

If I understand correctly, eliminating a nuisance parameter by using the profile likelihood is equivalent to using a delta-function prior for the nuisance parameter in the Bayesian philosophy. More refined likelihood methods can probably also be interpreted in a Bayesian way. If this is correct, why doesn't one use Bayesian methods directly?

In a Bayesian approach to parameter estimation, it is straightforward to include nuisance parameters in a Markov Chain Monte Carlo run and then marginalize over them, in order to recover high-probability regions for the parameters of interest which include the effect of our imperfect knowledge of the nuisance parameters.
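
A toy illustration of that workflow (the model and numbers are made up: a single datum x with unit Gaussian error, a parameter of interest mu with a flat prior, and a nuisance offset delta with a Gaussian prior), using a minimal random-walk Metropolis sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: x ~ N(mu + delta, 1), prior delta ~ N(0, 0.5),
# flat prior on the parameter of interest mu.
x_obs = 1.2

def log_post(mu, delta):
    return -0.5 * (x_obs - mu - delta) ** 2 - 0.5 * (delta / 0.5) ** 2

# Random-walk Metropolis in (mu, delta)
n_steps, step = 50_000, 0.8
chain = np.empty((n_steps, 2))
cur = np.array([0.0, 0.0])
cur_lp = log_post(*cur)
for i in range(n_steps):
    prop = cur + step * rng.normal(size=2)
    prop_lp = log_post(*prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:
        cur, cur_lp = prop, prop_lp
    chain[i] = cur

# Marginalizing over delta amounts to simply ignoring the delta column:
mu_samples = chain[n_steps // 2:, 0]      # second half, after burn-in
mu_mean, mu_std = mu_samples.mean(), mu_samples.std()
```

In this toy case the marginal posterior for mu is analytically N(1.2, sqrt(1 + 0.5^2)); the widened mu_std shows the nuisance-parameter uncertainty being propagated automatically.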

I am uncertain about the proper way to treat systematic errors in this context: an example could be the uncertainty associated with numerical inaccuracies of the code used, or the error induced by the fact that some second-order physical processes have been neglected in the code. This results in an uncertainty associated with the output of the code itself (which I'd classify as "systematic"), rather than a statistical uncertainty associated with the data used. A common way to deal with this is to add the statistical and the estimated systematic errors in quadrature (or linearly, if one wants to be conservative), and then use this new artificial error on the data at hand.

I would like to know whether there are more satisfactory ways of dealing with systematic errors of the kind described above, and in particular methods which recognise the different nature of statistical and systematic errors.

Setting the prior correctly by making use of all available information is a central problem of Bayesian model selection, where the result is strongly dependent on the prior scale, and this dependence does not disappear with better data (as is the case for parameter estimation).

I am getting interested in ways of setting the prior by maximum-entropy arguments. I would like to hear the opinion of the Panel regarding this method, and in particular whether this way of determining priors is now well accepted in the (Bayesian) community. I would be interested in comments about the applicability and limitations of maximum-entropy priors, if possible with examples illustrating situations where this kind of argument has proven successful (or has failed for a clear reason).
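
As a small concrete example of the maximum-entropy mechanics (an illustration only; the support and target mean are arbitrary choices): on a discrete support with a single mean constraint, the maximum-entropy distribution takes the exponential form p_k proportional to exp(-lambda*k), and the Lagrange multiplier can be solved for numerically.

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy prior on k = 0..10, subject only to the constraint
# E[k] = 3 (an arbitrary illustrative target mean).
ks = np.arange(11)
target_mean = 3.0

def mean_given(lam):
    w = np.exp(-lam * ks)
    return (ks * w).sum() / w.sum()

# Solve for the Lagrange multiplier that satisfies the mean constraint.
lam = brentq(lambda l: mean_given(l) - target_mean, -10.0, 10.0)
p = np.exp(-lam * ks)
p /= p.sum()                 # normalised maximum-entropy prior
```

Adding further constraints simply adds further multipliers to the exponent, which is why the method is attractive when the available information is exactly a set of expectation values.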

To what extent are hypothesis testing and parameter determination equivalent? Are there simple examples to illustrate when they are equivalent and when they are not?

Analyzing HEP data, physicists more and more often deploy multivariate classification methods. We have seen a number of HEP publications where analysts separate signal and background by training a neural net on 10 or more input variables. Byron Roe and his associates, in their recent work on particle identification at MiniBooNE, used 100 input variables for classification by boosted decision trees; this seems to set a record for the dimensionality used in HEP analysis. At the same time, there are a number of conservative physicists who refuse to adopt such multivariate methods with many input dimensions. They argue that it is very hard to assess how well the Monte Carlo models the data in so many dimensions, especially if one needs to take various systematic effects into account. Is there a generic prescription that relates the maximal reasonable dimensionality to the size of the available Monte Carlo and data samples, in the context of a specific classification method? Can professional statisticians recommend good literature on variable selection, perhaps for two different problems - a) statistics-dominated and b) systematics-dominated analyses? Should we attempt multidimensional analysis with dozens of input variables only when systematic effects do not matter much - for example, in rare signal searches - and stick to more robust and simple-minded techniques when systematics are important?

I have a data set where each entry is described by 18 variables. I want to reduce the dimensionality from 18 to, say, 4. Are there well-understood techniques for doing this, such that some measure of information loss is minimised?
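
One widely used linear answer is principal component analysis, where the discarded variance provides exactly such a measure of information loss. A minimal sketch on synthetic data (the shapes and numbers below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 500 entries x 18 variables, with most of the structure
# concentrated in 4 latent directions plus a little noise.
X = (rng.normal(size=(500, 4)) @ rng.normal(size=(4, 18))
     + 0.1 * rng.normal(size=(500, 18)))

# PCA via the SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 4
X_reduced = Xc @ Vt[:k].T                        # 500 entries x 4 variables
explained = (s[:k] ** 2).sum() / (s ** 2).sum()  # retained variance fraction
```

PCA minimises the squared reconstruction error among linear projections; nonlinear alternatives exist, but the variance bookkeeping above is the standard first diagnostic.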

Is there a good method of using the Kolmogorov-Smirnov goodness-of-fit test with multi-dimensional data?

What do you do when you unblind your analysis and find stuff in there that wasn't predicted by either the background or signal estimates? This is usually the case where there is no off-source data region to estimate the background, and the background estimate is purely simulation-based. You predict the background, open the box and find a big excess, but then see that the excess lies in a region of the event observables that doesn't look like either the signal or the background predictions. You guess it's an unsimulated background, but then what? I would have thought that maybe you could cut it away and recalculate, but then I learned from Kath Rawlins that LIGO saw a similar thing in their analysis, and she showed that coverage gets badly distorted if one removes events post-unblinding - even if one could tell for sure that the extra events were a background from some source you hadn't considered, and that, had you known about this class in the first place, you would have designed cuts never to allow them into the final analysis.

If we have a complicated parameter-estimation technique, we may want to
use Monte Carlo simulation to check whether the procedure is behaving
sensibly. One way to do this is to look at the distribution of

pull = (p_f - p_t)/sigma_f

where p_f is the fitted
parameter, p_t is its true value, and sigma_f is the estimated error in
p_f. The distribution is calculated for repeated simulations.
Asymptotically and if all is well, we expect the distribution of pull to
be Gaussian, centered on zero with unit width. However, there are simple
non-asymptotic examples where this is not so. How do I know, for
small-sample simulations, whether deviations from standard Gaussianity are
cause for worry or not?
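
A minimal version of such a pull study, for a case where the asymptotics should work well (estimating a Poisson mean from 100 events; the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p_true, n_events, n_toys = 5.0, 100, 2000

pulls = np.empty(n_toys)
for i in range(n_toys):
    data = rng.poisson(p_true, size=n_events)
    p_fit = data.mean()                       # MLE of the Poisson mean
    sigma_fit = np.sqrt(p_fit / n_events)     # estimated error on p_fit
    pulls[i] = (p_fit - p_true) / sigma_fit

# If all is well, the pulls should be approximately N(0, 1):
pull_mean, pull_width = pulls.mean(), pulls.std()
```

Repeating this with much smaller n_events shows the kind of non-Gaussian pull distribution the question is asking about.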

Are there recognised methods for dealing with asymmetric errors? These often arise when estimating parameters in low-statistics experiments. For example, we may measure a lifetime as 1.6 + 0.6 - 0.3 picoseconds. Then we might be interested in combining it with another measurement (e.g. taking the ratio of it with another lifetime); in incorporating another contribution to the error, possibly also asymmetric; or in combining this result with another to obtain a weighted average.
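
There is no unique prescription, but one approach (sketched here with made-up numbers, modelling each measurement's log-likelihood as a bifurcated Gaussian - one choice among several) is to combine at the likelihood level rather than by naive quadrature:

```python
import numpy as np

# Hypothetical measurements: (value, sigma_minus, sigma_plus), e.g. the
# lifetime 1.6 + 0.6 - 0.3 ps combined with an invented second measurement.
measurements = [(1.6, 0.3, 0.6), (1.9, 0.4, 0.4)]

def nll(x):
    """Sum of bifurcated-Gaussian negative log-likelihoods."""
    total = 0.0
    for val, sig_minus, sig_plus in measurements:
        sig = sig_plus if x > val else sig_minus
        total += 0.5 * ((x - val) / sig) ** 2
    return total

xs = np.linspace(0.5, 3.5, 3001)
nlls = np.array([nll(x) for x in xs])
x_hat = xs[nlls.argmin()]                 # combined estimate

# Asymmetric interval from Delta(NLL) = 0.5:
inside = xs[nlls <= nlls.min() + 0.5]
lo, hi = inside[0], inside[-1]
```

The resulting interval is generally asymmetric about x_hat, which naive symmetric error propagation cannot reproduce.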

I assume we have 11 variables per event. The reason this is of interest is the observation that Dzero obtained its most precise measurement of the top quark mass using the so-called matrix element method, a method that CDF is working on also. In the matrix element method one writes an explicit formula for the N-D differential density, based on one's knowledge of the squared matrix element and the mapping from partons to observed objects. The premise is that if one knew this N-D density, one need look no further in the search for new variables; one would just use the density directly. So the question is this: given p(x), where x is N-dimensional, and given q(y), where y is M-dimensional and y = f(x) (and perhaps M > N!), can the use of q(y) yield better signal/background discrimination than the use of p(x)? We spend a lot of time constructing y = f(x), by hand! Is this necessary, if we have p(x)?

Has there been, or can there be, progress in improving the convergence of asymptotic techniques in the tails? In particular, the modified and adjusted profile likelihood techniques attempt to improve convergence of the first and second moments, but for a 5-sigma test we are more interested in describing the tails.

Under what circumstances do the improved profile likelihood methods help? We have a range of N from 10 to 10,000 events, likelihoods ranging from Gaussian to highly non-Gaussian, and we are interested in a range of significance levels from 2 to 5 sigma. Under the circumstances where they do help, how much do we stand to gain?
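
One concrete way to see the tail problem (a toy check with arbitrary numbers, not a statement about the modified methods): compare the asymptotic chi-square tail of a likelihood-ratio statistic with its actual distribution from toy experiments, in a simple Poisson counting test with known background.

```python
import numpy as np
from scipy import stats

# Toy Poisson counting test with known background b; the likelihood-ratio
# statistic for an excess is q0 = 2 ln [ L(s = n - b) / L(s = 0) ],
# and q0 = 0 when there is no excess.
b = 3.1

def q0(n):
    if n <= b:
        return 0.0
    return 2.0 * (n * np.log(n / b) - (n - b))

q_obs = q0(9)

# Asymptotic tail: half a chi-square with 1 dof (s = 0 is on the boundary).
p_asym = 0.5 * stats.chi2.sf(q_obs, df=1)

# Actual tail from toy experiments under the background-only hypothesis.
rng = np.random.default_rng(3)
toys = rng.poisson(b, size=100_000)
qs = np.array([q0(n) for n in toys])
p_toys = float((qs >= q_obs).mean())
```

Even in this simple case the two tail probabilities differ by tens of percent at the few-per-mille level, which is exactly the regime that matters for a discovery claim.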

There exist other shrinkage estimators that could be helpful for high-dimensional problems like Supersymmetry. These estimators are biased, which will alarm most physicists and be an obstacle to the methods' acceptance; however, they can significantly improve the mean-squared error. Do you have any words of wisdom regarding these estimators: when are they a good idea, and when are they a bad idea?
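
For a concrete feel (a toy numerical check, not advice): the classic James-Stein estimator of a d-dimensional Gaussian mean is biased toward zero, yet dominates the unbiased maximum-likelihood estimate in total mean-squared error for d >= 3.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_trials = 10, 5000
theta = rng.normal(size=d)          # arbitrary true parameter vector

sse_mle = sse_js = 0.0
for _ in range(n_trials):
    x = theta + rng.normal(size=d)  # one unit-variance observation per component
    # Positive-part James-Stein: shrink x toward zero.
    shrink = max(0.0, 1.0 - (d - 2) / (x @ x))
    sse_mle += ((x - theta) ** 2).sum()
    sse_js += ((shrink * x - theta) ** 2).sum()

mse_mle = sse_mle / n_trials        # ~ d for the unbiased MLE
mse_js = sse_js / n_trials          # smaller on average, despite the bias
```

The gain is largest when the true parameter vector is near the shrinkage target and shrivels (but never becomes a loss, in total MSE) as it moves away - which is the trade-off behind the "when is it a good idea" question.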