(University of Essex)
Dynamic panel-data modelling with structural nested mean models
Panel data are vitally important for studying time-varying social processes and the effect that one process, such as employment, has on another, such as health. While the temporal ordering of longitudinal outcomes often allows us to rule out reverse causation, the problem of unobserved confounding must still be addressed. Standard approaches such as multilevel (random-effects) models or marginal (“GEE”) models are valid only if there is no unobserved confounding and other constraints on the underlying process can be shown to hold (e.g. Robins et al. 1999).
Dynamic panel-data models are widely used in economics to estimate causal effects from panel data in the presence of unobserved confounding. These are semi-parametric conditional models in which previous outcomes are hypothesised to have causal ‘lag effects’ on the current outcome. An efficient estimation framework based on the generalized method of moments (GMM) was developed 20 years ago in a series of pioneering papers (e.g. Arellano and Bond 1991). These estimators depend primarily on linearity of the conditional model and on Markov assumptions limiting the influence of prior outcomes on the current outcome.
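To make the estimation problem concrete, the following Python sketch simulates a hypothetical panel with a single lag effect and an unobserved individual effect, y_it = α y_i,t-1 + η_i + e_it. It contrasts the within (fixed-effects) OLS estimator, which is biased when the number of waves is small, with the Anderson-Hsiao instrumental-variables estimator on first differences, which instruments Δy_i,t-1 with the level y_i,t-2. This is not the Arellano-Bond GMM estimator itself, only its simpler just-identified precursor; the data-generating process, parameter values and function names are illustrative assumptions.

```python
import numpy as np

def simulate_panel(n=50_000, t=6, alpha=0.5, seed=0):
    """Simulate y_it = alpha * y_{i,t-1} + eta_i + e_it (hypothetical data-generating process)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)                          # unobserved individual effect
    y = np.empty((n, t))
    y[:, 0] = eta / (1 - alpha) + rng.normal(size=n)  # start near the long-run mean
    for s in range(1, t):
        y[:, s] = alpha * y[:, s - 1] + eta + rng.normal(size=n)
    return y

def within_ols(y):
    """Fixed-effects (within) OLS of y_t on y_{t-1}; biased for short panels (Nickell bias)."""
    cur, lag = y[:, 1:], y[:, :-1]
    cur_d = cur - cur.mean(axis=1, keepdims=True)     # remove individual means
    lag_d = lag - lag.mean(axis=1, keepdims=True)
    return float((lag_d * cur_d).sum() / (lag_d ** 2).sum())

def anderson_hsiao(y):
    """IV on first differences: instrument dy_{t-1} with the level y_{t-2}."""
    dy = np.diff(y, axis=1)   # dy[:, k] = y[:, k+1] - y[:, k]; eta_i drops out
    d_cur = dy[:, 1:]         # dy_t
    d_lag = dy[:, :-1]        # dy_{t-1}, correlated with the differenced error
    z = y[:, :-2]             # y_{t-2}: a valid instrument if e_it is serially uncorrelated
    return float((z * d_cur).sum() / (z * d_lag).sum())
```

With these settings the within estimate should fall well below the true α = 0.5, while the differenced IV estimate should land close to it. Arellano-Bond GMM gains efficiency by using all available earlier lags as instruments rather than a single one.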
In this talk, I consider an alternative from the biostatistics literature called structural nested mean models (SNMMs). SNMMs are semi-parametric marginal models for repeated potential outcomes that are parameterised directly in terms of causal effects. Linear and log-linear SNMMs, together with semi-parametrically efficient ‘g-estimation’ for both, were developed by Robins (1994). Allowing for lag effects, I first compare linear SNMMs with dynamic panel-data models, and then consider the limitations of nonlinear SNMMs when lag effects are present. The comparison is illustrated by applying these models to mental health and employment status using data from the British Household Panel Survey.
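A minimal Python sketch of g-estimation for a linear SNMM with two treatment times may help fix ideas. It assumes no unmeasured confounding given the measured history and treats the treatment probabilities as known (in practice they would be estimated, e.g. by logistic regression). The simulated confounder L1 is affected by the earlier treatment A0, which is exactly the setting where an ordinary regression that adjusts for L1 blocks part of the effect of A0. The blip parameters are estimated backwards in time: ψ1 from the last treatment, then ψ0 after ‘blipping down’ the later treatment. This simple version is not the semi-parametrically efficient estimator of Robins (1994), and all names and parameter values are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(n=200_000, seed=1):
    """Hypothetical two-period study with a time-varying confounder L1 affected by A0."""
    rng = np.random.default_rng(seed)
    L0 = rng.normal(size=n)
    p0 = expit(0.5 * L0)                          # P(A0 = 1 | L0), known by design here
    A0 = rng.binomial(1, p0)
    L1 = 0.8 * L0 + 1.0 * A0 + rng.normal(size=n)
    p1 = expit(0.5 * L1 - 0.3 * A0)               # P(A1 = 1 | L0, A0, L1)
    A1 = rng.binomial(1, p1)
    Y = 2.0 * A0 + 1.5 * A1 + L1 + 0.5 * L0 + rng.normal(size=n)
    return L0, A0, p0, L1, A1, p1, Y

def g_estimate(A0, p0, A1, p1, Y):
    """Closed-form g-estimation for constant blips psi0 * a0 and psi1 * a1."""
    # Last time point: solve sum (A1 - p1) * (Y - psi1 * A1) = 0.
    r1 = A1 - p1
    psi1 = np.sum(r1 * Y) / np.sum(r1 * A1)
    # Earlier time point: remove the estimated effect of A1, then solve for psi0.
    r0 = A0 - p0
    psi0 = np.sum(r0 * (Y - psi1 * A1)) / np.sum(r0 * A0)
    return float(psi0), float(psi1)

def naive_ols(L0, A0, L1, A1, Y):
    """Ordinary regression adjusting for both confounders; returns the A0 coefficient."""
    X = np.column_stack([np.ones_like(Y), A0, A1, L0, L1])
    return float(np.linalg.lstsq(X, Y, rcond=None)[0][1])
```

In this design the true blip for A0 is 3.0 (a direct effect of 2.0 plus 1.0 transmitted through L1), and the blip for A1 is 1.5. The naive regression coefficient on A0 should instead be close to 2.0, because adjusting for L1 removes the part of the A0 effect that passes through it.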
Arellano, M. and Bond, S.R. (1991) Some tests of specification for panel data. Review of Economic Studies, 58, 277-297
Robins, J.M. (1994) Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics, 23, 2379-2412
Robins, J.M., Greenland, S. and Hu, F.C. (1999) Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome (with discussion). Journal of the American Statistical Association, 94, 687-700
(CentERdata / Tilburg University)
Innovation in online longitudinal data collection for scientific research
There are compelling reasons to expect that Internet interviewing will become the dominant survey mode in the social sciences in the next 10–20 years, largely replacing written, face-to-face, and telephone interviewing. Internet penetration is increasing rapidly all over the world and among all socioeconomic groups. Technological developments not only make Internet interviewing cost-effective but also create opportunities for innovative ways of asking questions and for collecting data by means other than survey questions.
Building on earlier experience with an online scientific panel, an advanced data collection environment for the social sciences was started in the Netherlands in 2007: “An Advanced Multi-Disciplinary Facility for Measurement and Experimentation in the Social Sciences” (MESS). MESS is an innovative data collection facility intended to boost and integrate research in various disciplines, such as economics, social sciences, life sciences and behavioural sciences, in the Netherlands and abroad. The central element of the facility, the Longitudinal Internet Studies for the Social Sciences (LISS) panel, is a representative panel of about 5,000 households who complete monthly interviews over the Internet. The LISS panel is based on a probability sample drawn from population registers; households that could not otherwise participate are given a computer and broadband access. Other key elements of the MESS project are: (1) a longitudinal core questionnaire and experimental modules proposed by researchers from all over the world; (2) synchronization of content with other major social and economic data collection efforts; (3) innovative forms of data collection (such as Internet-linked weighing scales, accelerometers, and smartphones); (4) linkage with administrative data; and (5) surveys of special groups that are often underrepresented in socioeconomic surveys. The facility is embedded in a strong global network of researchers, with close cooperation with the American Life Panel (ALP), a similar facility in the US, and contacts with similar initiatives in several other European countries.
The presentation will introduce the set-up of the LISS panel (including issues such as representativeness and attrition), give a brief overview of the other key elements of the MESS project, and discuss the opportunities it opens up for new research questions in longitudinal data analysis.
Bianca De Stavola
(London School of Hygiene and Tropical Medicine)
Mediation and life course epidemiology: Challenges and examples
In life course research we often wish to disentangle the processes that link early life factors to certain distal outcomes, i.e. to study how the effect of a particular exposure on the outcome is mediated by intermediate factors.
The study of mediation has a long tradition in the social and behavioural sciences and a more recent one in epidemiology. The former is linked to path analysis and structural equation models (SEMs), while the latter is mostly linked to methods developed within the potential outcomes approach to causal inference. In both strands there is broad agreement that any causal interpretation of estimated mediating effects should be made with great caution because of the strength of the assumptions required to establish causality. Such caution is particularly relevant for life course epidemiology, where complex webs of exposures and mediators are at play. However, the investigation of causal pathways is at the core of life course research.
This talk will present the definitions used by the two schools, acknowledge the contribution of the SEM literature, but also stress the greater rigour and generality permitted by the potential outcomes framework. Two examples taken from perinatal epidemiology and childhood psychiatry will illustrate issues, assumptions, and estimation methods.
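As a hedged illustration of the potential outcomes definitions, the Python sketch below simulates a hypothetical exposure A, mediator M, outcome Y and baseline confounder C from linear models without exposure-mediator interaction. Because the data are simulated, the natural direct and indirect effects can be computed from the potential outcomes Y(a, M(a*)) themselves and compared with the regression-based ‘product of coefficients’ estimates familiar from the SEM literature. The two agree only under the assumptions built into this simulation (linearity, no interaction, no unmeasured confounding, and no mediator-outcome confounder affected by A); the parameter values and function names are illustrative.

```python
import numpy as np

def simulate(n=100_000, seed=2):
    """Hypothetical linear mediation model with baseline confounder C and no A x M interaction."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=n)
    A = rng.binomial(1, 1.0 / (1.0 + np.exp(-C)))   # exposure depends on C
    e_m = rng.normal(size=n)
    e_y = rng.normal(size=n)

    def m_of(a):                                    # potential mediator M(a)
        return 0.7 * a + 0.4 * C + e_m

    def y_of(a, m):                                 # potential outcome Y(a, m)
        return 1.0 * a + 0.9 * m + 0.5 * C + e_y

    M = m_of(A)
    Y = y_of(A, M)
    # True natural effects from the potential outcomes (only possible in a simulation).
    nde_true = float(np.mean(y_of(1, m_of(0)) - y_of(0, m_of(0))))
    nie_true = float(np.mean(y_of(1, m_of(1)) - y_of(1, m_of(0))))
    return C, A, M, Y, nde_true, nie_true

def ols(X, y):
    X = np.column_stack([np.ones(len(y)), X])       # add an intercept column
    return np.linalg.lstsq(X, y, rcond=None)[0]

def product_method(C, A, M, Y):
    """Regression-based NDE and NIE (product of coefficients)."""
    beta = ols(np.column_stack([A, C]), M)          # M ~ A + C
    theta = ols(np.column_stack([A, M, C]), Y)      # Y ~ A + M + C
    return float(theta[1]), float(beta[1] * theta[2])
```

Adding an A×M interaction to `y_of` would, in general, break the equivalence between the product of coefficients and the natural indirect effect, which is one way to see why the potential outcomes definitions are the more general ones.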
(University of Bristol)
Modelling repeated measures growth data by aligning significant growth events and modelling changes in within-individual variability over time
In many applications of multilevel models the assumption of a constant level-1 variance may be unrealistic. Following earlier work, the talk discusses how we can elaborate a function for the variance structure at the lowest level of a data hierarchy that allows for random coefficients across higher-level units. An illustrative example applies the methodology to an existing random-effects growth curve model that includes random effects for the timing of adolescent growth, giving a more flexible structure that includes more realistic modelling of the level-1 variance. An MCMC algorithm will be described, with an application to longitudinal data on growth in height. It is also argued that the form of the lowest-level variance function may be of substantive interest in its own right, beyond providing a better-specified model for the other parameters.
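The following Python sketch shows only the core idea of a log-linear level-1 variance function, var(e_ij) = exp(α0 + α1 t_ij), on simulated growth data. For brevity it is single-level and uses a swapped-in technique: the mean is fitted by OLS and the variance function by a gamma GLM with log link on the squared residuals, fitted by iteratively reweighted least squares. It is not the MCMC algorithm of the talk and does not let the variance-function coefficients vary across higher-level units; the parameter values and function names are hypothetical.

```python
import numpy as np

def simulate_growth(n_child=2000, n_occ=8, seed=3):
    """Hypothetical growth data whose level-1 variance increases with (rescaled) age t."""
    rng = np.random.default_rng(seed)
    t = np.tile(np.linspace(0.0, 1.0, n_occ), n_child)
    log_var = -1.0 + 1.5 * t                        # true log-variance function
    y = 100.0 + 20.0 * t + rng.normal(size=t.size) * np.exp(0.5 * log_var)
    return t, y

def fit_log_linear_variance(t, y, n_iter=50):
    """OLS for the mean, then a gamma GLM (log link) on squared residuals via IRLS."""
    X = np.column_stack([np.ones_like(t), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # mean model: y = b0 + b1 * t
    r2 = (y - X @ beta) ** 2                        # squared residuals
    alpha = np.array([np.log(r2.mean()), 0.0])      # start from a constant variance
    for _ in range(n_iter):
        eta = X @ alpha
        mu = np.exp(eta)
        z = eta + (r2 - mu) / mu                    # working response; IRLS weights are all 1
        alpha = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta, alpha
```

For a gamma GLM with log link the IRLS weights are constant, so each iteration is an ordinary least-squares fit to the working response. Its estimating equations coincide with the normal-likelihood score equations for the variance parameters given the mean.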
(Katholieke Universiteit Leuven)
A flexible modeling framework for over-dispersed, hierarchical data of a joint nature
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance relationship, and (2) the accommodation of hierarchical structure in the data, stemming from clustering which, in turn, may result from repeatedly measuring the outcome, from measuring various members of the same family, etc. The first issue is dealt with through a variety of overdispersion models, for example the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects, which are conventionally, though not always, assumed to be normally distributed. While both phenomena may occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count, and time-to-event cases are given particular emphasis; these can be modeled separately as well as jointly. Connections between joint modeling and a variety of areas are emphasized: group sequential trials, clusters with informative size, and incomplete data.
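A small Python simulation can make the combined model concrete for counts. Given the random effects, Y_ij is Poisson with mean θ_ij exp(x_ij′β + b_i), where θ_ij is a gamma (conjugate) effect with mean 1 and variance 1/α, and b_i is a normal random intercept with variance σ². The sketch compares simulated marginal means and variances with closed forms obtained by iterated expectations: E(Y_ij) = exp(x_ij′β + σ²/2) and var(Y_ij) = E(Y_ij) + exp(2x_ij′β)[(1 + 1/α)exp(2σ²) − exp(σ²)]. The covariate (time only), parameter values and function names are assumptions for illustration.

```python
import numpy as np

def simulate_pnc(n=200_000, times=(0.0, 1.0, 2.0), beta=(0.5, 0.2),
                 sigma2=0.3, alpha=2.0, seed=4):
    """Poisson counts with a gamma (conjugate) effect per observation and a normal intercept per subject."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times)
    eta = beta[0] + beta[1] * t                                # fixed-effects linear predictor
    b = rng.normal(0.0, np.sqrt(sigma2), size=(n, 1))          # normal random intercept, shared over time
    theta = rng.gamma(alpha, 1.0 / alpha, size=(n, t.size))    # gamma effect with mean 1, variance 1/alpha
    y = rng.poisson(theta * np.exp(eta + b))
    return eta, y

def marginal_moments(eta, sigma2, alpha):
    """Closed-form marginal mean and variance at each time point."""
    mean = np.exp(eta + sigma2 / 2)
    var = mean + np.exp(2 * eta) * ((1 + 1 / alpha) * np.exp(2 * sigma2) - np.exp(sigma2))
    return mean, var
```

Setting σ² = 0 recovers the negative-binomial variance μ + μ²/α, and letting α grow large recovers the Poisson-normal model, which shows how the two sets of random effects address overdispersion and clustering separately.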
Collaborators: Geert Verbeke (K.U.Leuven, Belgium), Michael G. Kenward (London School of Hygiene and Tropical Medicine, U.K.), Clarice G.B. Demétrio (ESALQ, Piracicaba, SP, Brazil), Edmund Njeru Njagi (UHasselt, Belgium)
Molenberghs, G., Verbeke, G., and Demétrio, C. (2007) An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime Data Analysis, 13, 513-531.
Molenberghs, G., Verbeke, G., Demétrio, C.G.B., and Vieira, A. (2010) A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science, 25, 325-347.
(Norwegian Institute of Public Health)
Protective estimation of panel models when data are not missing at random
We discuss estimation of multilevel or mixed models for longitudinal data when missing responses are not missing at random. We consider a typology of missingness mechanisms that includes missingness dependent on current outcomes (whether actually observed or missing), lagged outcomes, and subject-specific effects. When responses are not missing at random, valid estimation by random-effects approaches generally requires correct modelling of the missingness mechanism, which hinges on unverifiable assumptions. We show that fixed-effects approaches can be protective, in the sense that they are valid for monotone or intermittent missing data under a range of missingness mechanisms. This approach relies neither on correct modelling of the missingness mechanism nor on refreshment samples, and it is straightforward to implement in standard software.
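The protective property can be illustrated in its simplest case with a minimal Python sketch: two waves of a binary outcome from a random-intercept logistic model, a covariate correlated with the subject-specific effect b_i, and a hypothetical mechanism in which the wave-2 response is missing with a probability that depends on b_i. Conditional maximum likelihood conditions on each subject's total y_i1 + y_i2 and so eliminates b_i; for two waves it reduces to a logistic regression of y_i2 on x_i2 − x_i1, without intercept, among complete and discordant pairs. This covers only one mechanism from the typology (missingness depending on subject-specific effects) and is a simplified illustration, not a reproduction of Skrondal and Rabe-Hesketh (2014); all values and function names are assumptions.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit_newton(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression (no intercept unless X contains one)."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ coef)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def simulate(n=100_000, beta=1.0, seed=5):
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, 1.5, size=n)                   # subject-specific effect
    x = 0.5 * b[:, None] + rng.normal(size=(n, 2))     # covariate correlated with b
    y = rng.binomial(1, expit(b[:, None] + beta * x))
    # Hypothetical NMAR mechanism: wave-2 response missing with probability depending on b.
    miss2 = rng.random(n) < expit(-1.0 + 1.5 * b)
    return x, y, miss2

def conditional_logit_t2(x, y, miss2):
    """Conditional ML for two waves: only complete, discordant pairs are informative."""
    keep = (~miss2) & (y[:, 0] != y[:, 1])
    dx = (x[keep, 1] - x[keep, 0])[:, None]
    return float(logit_newton(dx, y[keep, 1].astype(float))[0])

def pooled_logit(x, y, miss2):
    """Naive pooled logistic regression on all observed responses, ignoring b."""
    xs = np.concatenate([x[:, 0], x[~miss2, 1]])
    ys = np.concatenate([y[:, 0], y[~miss2, 1]]).astype(float)
    X = np.column_stack([np.ones_like(xs), xs])
    return float(logit_newton(X, ys)[1])
```

Subjects missing at wave 2 contribute nothing to the conditional likelihood, and because selection depends only on b_i, which the conditioning removes, the remaining pairs still identify β. The naive pooled estimate is expected to be biased here, since it ignores b_i altogether.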
Collaborator: Sophia Rabe-Hesketh (University of California, Berkeley)
Skrondal, A. and Rabe-Hesketh, S. (2014) Protective estimation of mixed-effects logistic regression when data are not missing at random. Biometrika, 101, 175-188
(University of Oxford, University of Groningen)
Longitudinal methods for using panel data of networks and behaviour to assess peer influence
Assessing peer influence is difficult because of what Manski (1993) called the “reflection problem”: it is hard to tell whether peers are alike because they selected each other based on similarity of attributes and behaviour (social selection), or because they influenced each other (social influence). Such questions arise, for example, in studies of adolescent development involving behaviours such as smoking, drinking, and school attitudes. As a first step, the researcher must decide whom to consider as peers. A social network approach suggests treating as peers those who regard each other as subjectively relevant interaction partners, and studying entire peer networks in groups with a natural network boundary, such as school cohorts. Panel data on the relational network and relevant behavioural variables in such groups can help to obtain evidence for both social selection and social influence. When analysing and modelling such data, the dependence structures that characterize social network data pose a major challenge.
This presentation treats stochastic actor-oriented models, a class of continuous-time Markov chain models that can represent the interdependent dynamics of networks and individual attributes for data collected in a panel design. Tie and attribute changes are modelled as the results of “choices” made by the nodes in the network, governed by multinomial logistic regression models, and a large number of such changes may take place unobserved between successive panel observations. This allows much flexibility in representing dependencies between ties as well as dependence on covariates. The models can be characterized as generalized linear models with a large amount of missing data. They are implemented in the R package RSiena. An overview of stochastic actor-oriented models and their estimation procedures will be given, illustrated by examples.
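The ‘choice’ mechanism can be sketched in Python as a single ministep: a randomly selected actor i either toggles one of its outgoing ties or makes no change, with probabilities given by a multinomial logit in an evaluation function of the resulting network. Repeating ministeps at exponentially distributed waiting times yields the continuous-time Markov chain between panel waves. The evaluation function here uses only two illustrative effects (outdegree and reciprocity) with hypothetical parameter values and a constant rate for every actor; this is not RSiena's implementation or interface, and the estimation step (fitting the parameters to observed waves) is not shown.

```python
import numpy as np

def objective(x, i, beta_out, beta_rec):
    """Hypothetical evaluation function for actor i: outdegree and reciprocity effects."""
    out = x[i].sum()                  # number of outgoing ties of i
    rec = (x[i] * x[:, i]).sum()      # number of reciprocated ties of i
    return beta_out * out + beta_rec * rec

def choice_probabilities(x, i, beta_out, beta_rec):
    """Multinomial logit over 'toggle tie i->j' for each j != i, plus 'no change' (j == i)."""
    n = x.shape[0]
    scores = np.empty(n)
    for j in range(n):
        x_new = x.copy()
        if j != i:
            x_new[i, j] = 1 - x_new[i, j]         # toggle one outgoing tie
        scores[j] = objective(x_new, i, beta_out, beta_rec)
    scores -= scores.max()                        # numerical stability
    w = np.exp(scores)
    return w / w.sum()

def simulate_period(x, rate, beta_out, beta_rec, rng):
    """Simulate ministeps over one period of length 1 between panel waves."""
    x = x.copy()
    n = x.shape[0]
    t = rng.exponential(1.0 / (n * rate))         # waiting time to the next ministep
    while t < 1.0:
        i = int(rng.integers(n))                  # equal rates, so each actor is equally likely
        j = rng.choice(n, p=choice_probabilities(x, i, beta_out, beta_rec))
        if j != i:
            x[i, j] = 1 - x[i, j]
        t += rng.exponential(1.0 / (n * rate))
    return x
```

For example, if actor 1 already nominates actor 0, then with `beta_out = -1` and `beta_rec = 2` the most probable choice for actor 0 is to reciprocate that tie.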