Research

NOTE:  A lot of work by me and others in our Department is done in collaboration with students and faculty who are part of the Bioinformatics and Genomics ("Bx") program here, and you might find that program of interest:   http://www.huck.psu.edu/education/bioinformatics-and-genomics

====> Notice software available for simulating evolutionary genetics and genetic epidemiology (see ForSim and simQTL sections, below)

My research interests are easy to see by checking our blog, named The Mermaid's Tale after our recent book. Comments and discussions take place there that you may find interesting (we hope so, at least!).

NOTE: While I am on graduate student doctoral committees and teach graduate students, I am not taking new students into my group.

My primary interest is in the evolution of complex traits, evolutionary principles generally, and issues in the nature of our knowledge in the life sciences, including bioethical and societal aspects. I'm also interested in human variation, how it got here, how much there is, and what it means in regard to complex traits. My work is largely in genetics and human polymorphisms and the amount of variation in genes related to human phenotypes, including disease-related traits, but with a concentration on the evolutionary processes that generate the genetic architecture of complex traits, and issues involved in finding that variation from genetic data. I take bioinformatic and simulation approaches to this subject. Programming and data analysis skills are important, and there are and will be great job opportunities in this area for the foreseeable future.

A major thrust of my work is in developmental genetics and the evolution of developmental processes that control complex traits. We are currently working on the genetic basis of morphological traits that have been important in vertebrate evolution and constitute the fossil record, with particular attention to craniofacial and dental morphology and its evolution. Our current model systems include collaborative work with a baboon genealogy in San Antonio (Texas Biomedical Research Institute), and various experimental mouse models. We have extensive collaborations with Drs Richtsmeier, Jablonski, Ryan, and Buchanan in our Department, with Jeff Rogers at Baylor College of Medicine in Houston, with Jim Cheverud at Washington University, and with various other collaborators.

Tying these areas together is an interest in an over-arching view of evolution as we know it from the science itself, as well as the way the post-Darwin world has adopted Darwinian concepts. These concepts connect genes, history, origins, and the process of change as well as biological development. But the same general ideas of competitive natural selection are extended to many other areas of science and philosophy, including physics, the assumed nature of extra-terrestrial life, and human societies generally (see Bioethics, below). Often this is done uncritically or even based on misunderstandings about what we know about life and evolution. That can lead to misunderstandings both within biology itself and in the general public. Darwinian ideas have also often been upsetting to some religions, as we know, and we're in a time when that sensitivity is high in this country.

I collaborate with Joe Terwilliger and Joe Lee, both at Columbia University in New York. Joe T has also developed many tools for genetic inference in the disease context. His slightly wacky website (linkage.cpmc.columbia.edu/index2.html) shows why that is interesting to do! We co-teach a mini-course called Logical Reasoning in Human Genetics, in which we discuss the amount and evolutionary origin of human variation in the context of human disease, and approaches to find and characterize that variation. The next scheduled offering is likely in late 2013 or early 2014 but has not yet been set; contact me or J Terwilliger for details.

A major collaboration with the two Joes is a broad-purpose, forward evolutionary simulation program package, developed for us by Brian Lambert, to examine how evolution works and the issues we face in inferring the underlying genetic architecture of complex traits. In real life, we can never be sure we know the whole truth, but with simulated data you can be. Or you may wish to hypothesize a particular genetic situation that would be obtained in a sample designed to infer genetic causation (as, say, of a disease) and want some help in choosing what type and size of study would have adequate power to detect the causation you think is occurring. The program is called ForSim. More details are given below.

You can generally find our published papers by searching on PubMed or this lab web page.

::::Please note that while I try to keep this webpage reasonably current, I can't guarantee its precise accuracy. Things change as grants, interests, and personnel come and go. I do my best to keep it updated, so for questions about my research, please contact me: kenweiss@psu.edu

**New Mouse Under Development**

Human variation and disease

Craniofacial development and evolution

ForSim and simQTL: Forward Evolutionary Simulation and Genetic Epidemiological Study Design simulation

Bioethics and Biocosmology

Ethics Links

 


 

Development and patterning of the mammalian dentition

A major project here is led by Kazuhiko Kawasaki in my lab. See the Publications tab on my web page for publication references, of which there are several. Gene duplication creates evolutionary novelties by using older tools in new ways. We have identified evidence that the genes for enamel matrix proteins (EMPs), milk caseins, and salivary proteins comprise a family called the SCPP genes (secretory calcium-binding phosphoproteins) that are descended from a common ancestral gene called SPARCL1 by tandem gene duplication. These genes remain linked, except for one EMP gene, amelogenin. The SCPP genes show common structural features and are expressed in ontogenetically similar tissues. Many of these genes encode secretory Ca-binding phosphoproteins, which regulate the Ca-phosphate concentration of the extracellular environment. By exploiting this fundamental property, these genes have subsequently diversified to serve specialized adaptive functions. Casein makes milk supersaturated with Ca-phosphate, which was critical to the successful mammalian divergence. The innovation of enamel led to a mineralized feeding apparatus, which enabled active predation by early vertebrates. The EMP genes comprise a subfamily not identified previously. A set of genes for dentine and bone extracellular matrix proteins constitutes an additional cluster distal to the EMP gene cluster, with similar structural features to EMP genes. The duplication and diversification of the primordial genes for enamel/dentine/bone extracellular matrix may have been important in core vertebrate feeding adaptations, the mineralized skeleton, the evolution of saliva, and, eventually, lactation. The order of duplication events may help delineate early events in mineralized skeletal formation, which is a major characteristic of vertebrates. We are also exploring the similar evolutionary history of other gene families involved in biomineralization, in particular genes related to collagen.

The SCPP family of genes provides an example of phenogenetic drift, in which a trait (the mineralized nature of teeth, for example) is retained by natural selection while its genetic basis changes. We are currently concentrating on early vertebrate evolution, including sharks, agnathans (lamprey), and amphibians. My Publications page lists other SCPP papers by Kazz and myself; search also in PubMed under Kawasaki K as author for papers he has authored on his own on this interesting subject.

Craniofacial development and its evolution

This work is done in collaboration with Joan Richtsmeier, Nina Jablonski, Tim Ryan, Anne Buchanan, and students and post-doctoral fellows in our Department, Jim Cheverud at Washington University, and Jeffrey Rogers at Baylor College of Medicine in Houston, Texas. These are studies of the genetic basis and evolution of the shape of the head in primate evolution. This is part of the NSF Hominid, or Human Origins, program and has NIH funding as well. Our project involves gene mapping in baboons and mice and various aspects of informatics and experimental mouse genetics, based on morphometric findings on CT-scanned individuals from a cross between large and small mice and about 900 baboons from a large known research pedigree at the Texas Biomedical Research Institute, in San Antonio, Texas. In addition, the project is using the human, chimpanzee, macaque, and mouse (and other) whole-genome sequences, and bioinformatic (comparative DNA sequence analysis) approaches, to identify genes or regulatory sequence elements that might have undergone natural selection or similar evolutionary processes during the evolution of the uniquely shaped human head and face. We have already identified some interesting candidate regions in the baboon data.

In this project we are doing mapping analysis on roughly 1200 skulls from a 34th-generation intercross between Lg and Sm inbred mice, which differ in body size. This is a resource developed by Jim Cheverud in St Louis, and the mapping results are being analyzed at the present time. Candidate regions will be followed up in various ways, including gene expression and other experimental studies of the role of candidate genes in craniofacial development in mouse embryos. For more information about this project and its many facets, go here: www.hominid.psu.edu. For fine-mapping, we are using whole-genome sequence data from a Lg and a Sm mouse. We are using ForSim computer simulation to examine the nature of variation that we can document with the existing resources and to estimate how much new variation may have arisen during the breeding process.

Human Variation and Disease

I'm interested in the molecular genetic investigation of the amount of human variation, its geographic distribution, and the relationship of that variation to risk of common, complex conditions like cardiovascular disease (CVD) and diabetes. The first main aim was to elucidate the full nature of standing variation as it occurs within and among human populations, both to relate observed variation to the effects of demographic factors and, where applicable, of natural selection, and to define more clearly the 'normal' variation whose perturbations may be associated with disease. Currently I am not directly working with persons affected by disease. Rather, I'm interested in the evolutionary and population processes that generate the variation in our species, including disease-associated variation. Our work on this at present is limited to simulation (ForSim, see below) and to comparing simulated data with empirical data from genomic studies of complex traits.

ForSim: A Forward Evolutionary Computer Simulation

Population genetics theory provides vital tools to understand many aspects of genetic change over the long and the short term. There are many excellent backward (coalescent) simulation programs that take a set of sequences sampled today and work backwards in time to reconstruct their common ancestral sequence. Some of these programs can handle recombination and, in rather restricted ways, natural selection. But life is really lived forward, and to understand many aspects of evolution we need to be able to simulate the actual processes of genetic change as they happen forward in time. Forward evolutionary simulations work the way nature does, screening on phenotypes and only indirectly on genotypes. Forward simulation is brute-force simulation, rather than resting on elegant theory, but for the same reason it is much more flexible.
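To make the "forward" idea concrete, here is a minimal sketch of the simplest possible forward simulation: one allele drifting neutrally in a constant-size population, generation by generation. This is not ForSim code; the function name, parameters, and the Wright-Fisher sampling scheme are assumptions chosen only for illustration.

```python
import random

def wright_fisher_forward(n_individuals, start_freq, n_generations, seed=0):
    """Track one allele's frequency forward in time under pure drift (no selection)."""
    rng = random.Random(seed)
    n_copies = 2 * n_individuals              # diploid population
    count = round(start_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(n_generations):
        freq = count / n_copies
        # Each gene copy in the next generation is drawn at random
        # from the current generation's gene pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count / n_copies)
        if count in (0, n_copies):            # allele lost or fixed: nothing more can change
            break
    return trajectory

if __name__ == "__main__":
    traj = wright_fisher_forward(n_individuals=100, start_freq=0.5, n_generations=200)
    print(f"generations simulated: {len(traj) - 1}, final frequency: {traj[-1]:.3f}")
```

Everything is computed by brute force, one generation after another, which is why adding selection, migration, or changing population size is only a matter of adding steps inside the loop.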

With Brian Lambert, I have developed a forward evolutionary simulation program called ForSim that is phenogenetic rather than genetic in nature, an attempt at full-fledged evolutionary simulation. A phenogenetic simulation generates genetic variation, but then rather than having that evolve directly, translates that variation (as real organisms do) into phenotypes, and it is those that are subject to various modes of natural selection, migration, and mate choice, in finite populations of various types, sizes, structures, and dynamics. Populations grow, divide, die out, send migrants to each other, and so on. Modeling such phenomena is important for understanding the genetic architecture that results from the evolution of real biological traits. For example, migration can be based on phenotypes, such as those more suitable to a new environment, which carries relevant genotypes along with it. The figure presents results that suggest just a taste of what the program package can do. ForSim is exceedingly flexible, and allows users to specify many different aspects of a simulation, but because of that, the output may require analytic scripting for analysis. At the end, and/or at user-specified points during the simulation, the entire data at that time (or user-specified subsets) can be saved for post-run analysis. At the end, the population, case-control samples for a specified phenotype, and user-specified numbers of multi-generation pedigrees are saved, along with many figures displaying conditions during the run (e.g., population size, heritability, losses due to selection, etc.).
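The phenogenetic idea, that selection screens phenotypes and reaches genotypes only indirectly, can be sketched in a few lines. The example below is a toy under stated assumptions, not ForSim's implementation: haploid genomes coded as lists of 0/1 alleles, an additive trait with Gaussian environmental noise, Gaussian stabilizing selection, and free recombination. All function names and default values are hypothetical.

```python
import math
import random

def phenotype(genotype, effects, rng, env_sd=1.0):
    """Translate a genotype into a phenotype: additive genetic value plus environmental noise."""
    genetic_value = sum(allele * effect for allele, effect in zip(genotype, effects))
    return genetic_value + rng.gauss(0.0, env_sd)

def fitness(pheno, optimum=0.0, width=2.0):
    """Gaussian stabilizing selection; fitness is a function of the phenotype only."""
    return math.exp(-((pheno - optimum) ** 2) / (2.0 * width ** 2))

def next_generation(population, effects, rng, mut_rate=0.01):
    """One generation: score phenotypes, choose parents by fitness, recombine, mutate."""
    weights = [fitness(phenotype(g, effects, rng)) for g in population]
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.choices(population, weights=weights, k=2)
        # Free recombination: each locus is inherited from either parent.
        child = [rng.choice(pair) for pair in zip(mother, father)]
        # Mutation flips an allele between 0 and 1.
        child = [1 - a if rng.random() < mut_rate else a for a in child]
        offspring.append(child)
    return offspring

if __name__ == "__main__":
    rng = random.Random(1)
    effects = [1.0] * 5 + [0.0] * 5           # 5 causal loci, 5 neutral loci
    population = [[rng.randint(0, 1) for _ in effects] for _ in range(200)]
    for _ in range(50):
        population = next_generation(population, effects, rng)
    mean_value = sum(sum(a * e for a, e in zip(g, effects)) for g in population) / len(population)
    print(f"mean genetic value after 50 generations: {mean_value:.2f}")
```

Note that the fitness function never sees a genotype. Loci with zero effect evolve neutrally, while loci that contribute to the trait are screened only through the phenotypes they help produce.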

The figure shows the haplotypes and linkage disequilibrium in two source populations (A and B) that evolved separately for 2,500 generations, after splitting from a single population that had evolved for 7,500 generations, and then formed an admixed population (C) that then evolved for 10 generations to the 'present' (this is like many admixed human populations in the US today). The figure shows the LD pattern differences between the populations as resolved by the Haploview program (on which HapMap was based, but applied to our simulated data). Of the 10 simulated genes shown here, 5 affected a trait under weak selection, identified by arrows, while the other genes did not affect a trait and evolved neutrally. The admixture and selection effects are not great, as is usually the case, but can be seen, and these data are like real human data on most genome regions in admixed populations.
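For readers unfamiliar with why admixture leaves a mark on LD, the following sketch works through the standard two-locus arithmetic. It is an illustration, not the analysis behind the figure: the allele frequencies are made up, the function names are hypothetical, and it computes the common measures D and r² rather than whatever statistics a given plotting program displays.

```python
def equilibrium_population(p_a, p_b):
    """Haplotype frequencies for two loci in linkage equilibrium (D = 0)."""
    return {
        "AB": p_a * p_b,
        "Ab": p_a * (1 - p_b),
        "aB": (1 - p_a) * p_b,
        "ab": (1 - p_a) * (1 - p_b),
    }

def admix(pop1, pop2, fraction_from_pop1):
    """Mix two populations' haplotype pools in the given proportion."""
    m = fraction_from_pop1
    return {h: m * pop1[h] + (1 - m) * pop2[h] for h in pop1}

def ld_stats(haplotype_freqs):
    """Return (D, r_squared) for two biallelic loci."""
    p_a = haplotype_freqs["AB"] + haplotype_freqs["Ab"]
    p_b = haplotype_freqs["AB"] + haplotype_freqs["aB"]
    d = haplotype_freqs["AB"] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared

if __name__ == "__main__":
    pop_a = equilibrium_population(0.9, 0.9)   # hypothetical source population A
    pop_b = equilibrium_population(0.1, 0.1)   # hypothetical source population B
    pop_c = admix(pop_a, pop_b, 0.5)           # freshly admixed population C
    for name, pop in (("A", pop_a), ("B", pop_b), ("C", pop_c)):
        d, r2 = ld_stats(pop)
        print(f"population {name}: D = {d:.3f}, r^2 = {r2:.3f}")
```

With these made-up frequencies, neither source population has any LD, yet the 50/50 mixture has D = 0.16 and r² of about 0.41, purely because the two sources differ in allele frequency at both loci. Recombination then erodes this over the following generations.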

It is clear that human geneticists are having difficulty understanding the genetic architecture and specific contributing genes that underlie variation in important complex traits like those responsible for human diseases. This is a major impetus for the program, and ForSim is being written in collaboration with Joe Terwilliger and Joe Lee at Columbia, who are interested in how study design and analytic strategy may be optimized in searches for disease genes, by mapping and other means. The examples shown here reflect that biomedical interest. At the end of a simulation, pedigree data are saved that can be tested in biostatistical inference packages. But the applications are not limited to humans, nor to very short-term population history; the program can be applied to longer-term evolution as well.

ForSim accommodates multiple loci, chromosomes, and populations, and a range of mating, gene flow, phenogenetic, and selection models with parameters that can be changed at any point during a simulation, and it produces copious output data during the run (as specified by the user) and at the end. Version 1.0 includes the ability to specify multiple genes and phenotypes, with each gene able to affect different phenotypes; phenotypes determined by user-written functions of genes and environmental effects that can vary over time or among populations; fitness functions of similar nature and complexity that can be based on multiple phenotypes; individual and family-specific environments; phenotype-based assortative mating; multiple populations; and gene x environment interaction. Users can address (and alter) many of the variables, in various ways and at chosen points during the run, by so specifying in the input specification file, in ForSim's own block-structured input scripting language. So there is extensive freedom for users to specify problems of their own interest, as free as possible from program-based restrictions. The program is fast given its flexibility; for example, ForSim can simulate a 100 Mb chromosome with 10 genes of 40 kb each, evolving neutrally with mutation and recombination in a population of 10,000 (roughly the effective population size of the human species) for 10,000 generations (roughly the age of the human species), in only a few minutes on a desktop computer (it runs on Linux and Mac, and probably on Windows with Cygwin). Zooming in on the first figure above, you can see the relative time demands of the various program functional branches (based on tracking C++ class calls during the run).

ForSim is not as fast as (more restrictive) coalescent simulations are, and is not designed to generate 100,000 replications of a situation to generate detailed probability distributions (unless you have access to a CPU farm). But it can generate a reasonable number of reps, sufficient for understanding basic genetic architecture and its heterogeneity for interestingly complex situations, and by using multiple unlinked genes and traits, and multiple populations, it can generate repetitions of multiple events during the same stochastic evolutionary history. Even for realistically complex scenarios, as CPU speed and memory increase exponentially, and with increasingly available and affordable high-speed CPU farms, large numbers of replications will increasingly become possible. Long-term evolutionary events can be understood, even to the point of generating 'ancient DNA', that is, directly saving the entire genetic population data at points during a run.

ForSim is intended for simulation of the generation of genetic variation within a user-specified genetic etiology (number of genes and their general mode of action and interaction, etc.), on a scale of up to millions of years (technically, open-ended), so that we can better understand the amount and nature of causal variation for complex traits today. There is extensive output, and some knowledge of programming (e.g., scripting in Perl, Ruby, or Python) is important in parsing the data for specific uses, because so much data is made available. Many population geneticists and genetic epidemiologists have written ad hoc forward simulations of one kind or another. We think that ForSim is, in general, more flexible than they are, more natural in terms of modeling evolution by phenotype, and less dependent on theoretical assumptions. However, we did not develop ForSim to be in a contest with other programs, and each has its own uses and unique value.

A brief introduction to ForSim is published (Lambert et al., Bioinformatics, 24: 1821-1822, 2008). The ForSim user's Manual and the program version (C++ source code and other wrapper scripts and resources) are available on emailed request from us at no cost (conditional on agreeing to an open-source non-commercialization license; contact me at kenweiss@psu.edu). The Manual describes the program and how to set up its input file that specifies the many different user-variable run conditions, along with samples of output text and graphics. The features, output data, and so on are continually being updated and improved. We will want users to register with us so we can notify you of bugs and changes, and to agree on how to credit the program and describe any changes you make to the code in any resulting publications, watermarking the modified source code appropriately. The posted Manual may not be the most current; we are always updating it to make explanations clearer (we hope!) and as we modify the program or find errors.

New features have regularly been added, and that may continue. That's because collaboration on this program is active. While Brian Lambert has moved on to his next job, Joe Terwilliger and a colleague, Tero Hiekkalinna, in Helsinki, Finland, will be maintaining it. If you have a serious interest in using ForSim, contact me (kmw4@psu.edu) for the latest versions of the program itself and the user's Manual. NOTE: there have been some recent changes in the input file syntax and descriptions/documentation for the program, so if you're interested, get and use the latest version!

The program generates files that can be used in common genetic epidemiological software packages, such as PLINK, LINKAGE, and others.
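For orientation, this is roughly what such files look like in the case of PLINK's text format: a .ped line holds six leading columns (family ID, individual ID, father ID, mother ID, sex, phenotype) followed by two alleles per marker, and a .map line holds chromosome, marker ID, genetic distance, and base-pair position. The sketch below only formats such lines from made-up data. It is not ForSim's output code, and the helper names are hypothetical.

```python
def ped_line(family_id, indiv_id, father_id, mother_id, sex, affected, genotypes):
    """Format one PLINK .ped line.

    sex: 1 = male, 2 = female; affected: 1 = unaffected, 2 = affected.
    genotypes: one (allele1, allele2) pair per marker, in .map order.
    """
    alleles = " ".join(f"{a1} {a2}" for a1, a2 in genotypes)
    return f"{family_id} {indiv_id} {father_id} {mother_id} {sex} {affected} {alleles}"

def map_line(chromosome, marker_id, position_bp, genetic_distance=0):
    """Format one PLINK .map line: chromosome, marker ID, genetic distance, bp position."""
    return f"{chromosome} {marker_id} {genetic_distance} {position_bp}"

if __name__ == "__main__":
    # Made-up trio: two founders (parent IDs of 0) and one affected child.
    print(ped_line("F1", "dad", "0", "0", 1, 1, [("A", "G"), ("C", "C")]))
    print(ped_line("F1", "mom", "0", "0", 2, 1, [("A", "A"), ("C", "T")]))
    print(ped_line("F1", "kid", "dad", "mom", 1, 2, [("G", "A"), ("C", "T")]))
    print(map_line(1, "snp1", 12345))
    print(map_line(1, "snp2", 67890))
```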

Bioethics

With the completion of the Human Genome Project (HGP) and many other genomic resources rapidly expanding, leading to accelerating use of genetic information in diagnosis and risk assessment, it is important to take stock of the larger social and ethical implications of population genetics research. Several relevant areas are currently of interest to me. I explore the empirical and epistemological limits of genetic information, to highlight the current tendency toward what I think is excessive genetic determinism that goes beyond what we can actually say with data (or, often, directly against what we already know). It is important that people have a fuller understanding of the genetic and environmental contributions to complex traits and how difficult such traits, and their evolution, are to understand. Another major research interest is the implications of population-specific (or as some would insist on calling it, 'race' specific) analysis, attempting to balance the likely benefits of increased knowledge regarding disease etiology and risk against potential harms of stigmatization, discrimination, and categorical treatment of quantitative variation. I am also interested in the ways in which vested interests, ranging from private biotechnology firms to representatives of communities being investigated, interact to determine research priorities and hence scientific outcomes and even our views of the science itself.

These subjects are connected through the over-arching theory of evolution, and genes as the underlying 'atoms' of this worldview. The powerful and perceptive ideas of genes and natural selection and historical evolution are extended, sometimes uncritically but with important implications for society and human well-being, to economics, politics, behavior, social relations and many other areas. I'm interested in how this happens and what it means--especially because even biologists seem sometimes unaware of its limitations within biology itself.

Bioethics Links

Rock Ethics Institute at Penn State:
http://rockethics.psu.edu/

Science, Medicine, and Technology in Culture program at Penn State:
http://rockethics.psu.edu/smtc/

ELSI Research Program of the National Human Genome Research Institute:
http://www.genome.gov/10001618

Bioethics Resources (from the NIH):
http://www.nih.gov/sigs/bioethics/index.html

Bioethics.net:
http://www.ajobonline.com/

New Mouse Under Development

"The Kawasaki"

"Easy to handle, but extremely difficult to breed" says Chief of Mouse Development Kazu Kawasaki of his latest mouse.



Consisting of only a head and tail, the Kawasaki Mouse eliminates the fuss and muss of internal organs* for researchers interested in tooth, brain, or tail development. An ancillary advantage is the ability to study flagellar locomotion in mammals. That work is in progress.

Look for this handy research tool to be available soon!