Using R’s model formula notation

Various commands in R accept a notation called a model formula, or simply a formula. The simplest form of a formula is

y ~ x

where x and y are two variables. You can read this as ‘y is explained by x’. The dependent or response variable goes to the left of the tilde ‘~’ and the explanatory or independent variables go to the right. This formula roughly corresponds to the linear equation

y = ax + b
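As a minimal sketch of this correspondence, the formula can be passed to lm() on simulated data. The data and the true values a = 2 and b = 1 are assumptions made up for illustration; they are not from the original text.

```r
# Simulated data: the slope 2 and intercept 1 are arbitrary illustrative values
set.seed(1)
x <- 1:20
y <- 2 * x + 1 + rnorm(20, sd = 0.5)

fit <- lm(y ~ x)   # fits y = a*x + b
coef(fit)          # "(Intercept)" estimates b, "x" estimates a
```

The fitted coefficients should land close to the values used to generate the data.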

The interpretation is slightly different if the variables are categorical. Note that the intercept, b, is implicit in the model formula. If you like, you can make it explicit by adding + 1. If you want to exclude it, e.g., to force the regression line through the origin, add - 1 instead. If you have multiple explanatory variables, you can include them using the same notation. For example, with two explanatory variables x1 and x2, the formula is:

y ~ x1 + x2

The linear equation that corresponds to this notation is y = a1x1 + a2x2 + b.
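The three ways of handling the intercept can be compared side by side. This is only a sketch on simulated data; the variable values and the object names fit_default, fit_explicit and fit_origin are hypothetical.

```r
# Simulated data with two explanatory variables (all values are made up)
set.seed(2)
x1 <- runif(30)
x2 <- runif(30)
y  <- 3 * x1 - 2 * x2 + 5 + rnorm(30, sd = 0.1)

fit_default  <- lm(y ~ x1 + x2)       # intercept included implicitly
fit_explicit <- lm(y ~ x1 + x2 + 1)   # same model, intercept written out
fit_origin   <- lm(y ~ x1 + x2 - 1)   # intercept removed

names(coef(fit_default))   # includes "(Intercept)"
names(coef(fit_origin))    # only "x1" and "x2"
all.equal(coef(fit_default), coef(fit_explicit))
```

The first two fits describe the same model, while the third has no intercept term among its coefficients.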

As you may have figured out already, arithmetic operators such as + and - have different meanings in a formula. So, if the variable you are interested in is a combination of R variables, you need a special notation. For example, you might want to fit a linear model where y is explained by the sum of x1 and x2. That is, the equation you want to describe is y = a × (x1 + x2) + b. In such cases you need to use a special function, I(), to protect the arithmetic operation from being interpreted as part of the formula. For our example, the correct formula notation is

y ~ I(x1 + x2)
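The effect of I() is easiest to see by comparing the coefficients of the two fits. The sketch below uses simulated data; the values and the names fit_sep and fit_sum are assumptions for illustration.

```r
# Simulated data where y really depends on the sum x1 + x2 (values made up)
set.seed(3)
x1 <- runif(30)
x2 <- runif(30)
y  <- 4 * (x1 + x2) + 1 + rnorm(30, sd = 0.1)

fit_sep <- lm(y ~ x1 + x2)      # '+' adds terms: one slope per variable
fit_sum <- lm(y ~ I(x1 + x2))   # I() protects '+': a single slope for the sum

names(coef(fit_sep))
names(coef(fit_sum))
```

Without I() the model has three coefficients (intercept, x1, x2); with I() it has two (intercept and one slope for the summed variable).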

If your explanatory variables are categorical, as in ANOVA, you may want to fit a model where the interaction of the variables is important. An interaction is expressed in a formula as a term in which the variable names are joined by colon(s). For example, the formula

y ~ x1 + x2 + x1:x2

expresses a model where the interaction of x1 and x2 is also included in the model fit. For two variables, there is only one possible interaction. If you have many variables and want to include all interaction terms, it may be a hassle to type them all separately. For example, the interactions of three variables x1, x2 and x3 consist of the two-way interactions x1:x2, x1:x3, x2:x3 and the three-way interaction x1:x2:x3. To include all interactions, you can use ‘*’ instead of ‘+’. For example, to include three variables and all their interactions in a model formula, we simply type y ~ x1 * x2 * x3.
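One way to check what ‘*’ expands to is to ask R for the term labels of the formula. This sketch uses terms() from the stats package, which needs no data for this purpose; the names expanded and spelled_out are hypothetical.

```r
# Term labels of the shorthand formula
expanded <- attr(terms(y ~ x1 * x2 * x3), "term.labels")
expanded

# The same model with every term typed out by hand
spelled_out <- attr(
  terms(y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3),
  "term.labels"
)
identical(expanded, spelled_out)
```

Both formulas yield the three main effects, the three two-way interactions and the single three-way interaction.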

The usual mathematical operators do not do what you may think inside a formula. Here are a few possibilities that will suffice for these notes. Suppose the variables are generically named Y, X1 and X2.

formula              meaning
Y ~ X1               Y is modeled by X1
Y ~ X1 + X2          Y is modeled by X1 and X2, as in multiple regression
Y ~ X1 * X2          Y is modeled by X1, X2 and the interaction X1:X2
Y ~ (X1 + X2)^2      Y is modeled by X1, X2 and their two-way interaction; ^ is not the usual power here
Y ~ X1 + I(X2^2)     Y is modeled by X1 and the square of X2
Y ~ X1 | X2          Y is modeled by X1 conditioned on X2
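The rows involving ‘*’, ‘^’ and I() can be verified by listing the terms each formula produces. The helper labels_of() below is a hypothetical convenience function written for this sketch, not part of R.

```r
# Hypothetical helper: return the term labels a formula expands to
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(Y ~ X1 * X2)         # main effects plus the X1:X2 interaction
labels_of(Y ~ (X1 + X2)^2)     # same terms: '^' expands interactions up to order 2
labels_of(Y ~ X1 + I(X2^2))    # inside I(), '^' is ordinary exponentiation
```

The first two formulas expand to the same set of terms, while the third keeps I(X2^2) as a single term representing the squared variable.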

Q:
How should I define a model formula in R when one (or more) exact linear restrictions binding the coefficients are available?

Equation: y = b1*x1 + b2*x1

where y = b1*x1 for t < t1 and y = b2*x1 for t > t1

A:
Just create two new vectors:

x2 <- ifelse(t<t1, x1, 0)
x3 <- ifelse(t<t1, 0, x1)

Now you can simply fit y ~ x2 + x3 - 1. The coefficient of x2 then estimates b1 and the coefficient of x3 estimates b2.
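The recipe can be tried on simulated data. Everything in this sketch is assumed for illustration: the break point t1 = 21, the true slopes 2 and 5, and the noise level. Observations at t equal to t1 fall into the second regime here, because the condition is t < t1.

```r
# Simulated series with a slope change at a hypothetical break point
set.seed(4)
t  <- 1:40
t1 <- 21                          # assumed break point
x1 <- runif(40, 1, 10)
b  <- ifelse(t < t1, 2, 5)        # true slopes: 2 before the break, 5 from it on
y  <- b * x1 + rnorm(40, sd = 0.2)

# Split x1 into two regressors, one per regime
x2 <- ifelse(t < t1, x1, 0)
x3 <- ifelse(t < t1, 0, x1)

fit <- lm(y ~ x2 + x3 - 1)        # no intercept, one slope per regime
coef(fit)
```

The two estimated coefficients should be close to the slopes used to generate the data.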
