For information on using this system, please visit this page.

Bug 17616 - Anomaly with contrast functions
Summary: Anomaly with contrast functions
Status: CLOSED FIXED
Alias: None
Product: R
Classification: Unclassified
Component: Models (show other bugs)
Version: R 3.6.xx
Hardware: All All
: P5 normal
Assignee: R-core
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-27 09:06 UTC by Peter Dalgaard
Modified: 2022-06-27 08:31 UTC (History)
3 users (show)

See Also:


Attachments
test case (389 bytes, text/plain)
2022-05-28 18:37 UTC, Long Qu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Dalgaard 2019-09-27 09:06:17 UTC
If you supply a contrast function to a factor, results depend on whether you pass the name or the actual function. This applies to C(), contrasts()<-, as well as lm(...., contrasts=list()). 

Example:

> lm(angle~C(recipe, "contr.treatment"), cake)

Call:
lm(formula = angle ~ C(recipe, "contr.treatment"), data = cake)

Coefficients:
                  (Intercept)  C(recipe, "contr.treatment")B  
                       33.122                         -1.478  
C(recipe, "contr.treatment")C  
                       -1.522  

> lm(angle~C(recipe, contr.treatment), cake)

Call:
lm(formula = angle ~ C(recipe, contr.treatment), data = cake)

Coefficients:
                (Intercept)  C(recipe, contr.treatment)2  
                     33.122                       -1.478  
C(recipe, contr.treatment)3  
                     -1.522  


Notice that the former uses level names and the latter level codes. Presumably, that is a bug, I see no rationale for it.

It is not immediately clear to me what is happening from the code around src/main/models.c:1710ff, but the effect is like if you call contr.treatment(levels(recipe)) in one case and contr.treatment(nlevels(recipe)) in the other.
Comment 1 Benjamin Tyner 2019-09-28 21:14:17 UTC
Note: cake is from lme4
Comment 2 Peter Dalgaard 2019-09-28 21:41:29 UTC
Aw, drats... I was just looking for a standard data set with a factor and a continuous variable in it, and I apparently had lme4 loaded. The data set is immaterial; this'll do just as well

lm(uptake~C(Treatment, "contr.treatment"), CO2)
lm(uptake~C(Treatment, contr.treatment), CO2)
Comment 3 Long Qu 2022-05-28 18:37:29 UTC

Created attachment 3029 [details]
test case

Comment 4 Long Qu 2022-05-28 18:40:40 UTC
Comment on attachment 3029 [details]
test case

I can confirm this problem. I believe the problem is in `model.matrix`: 

This is my custom contrast function: 
```
> my.contrast=function(n, contrasts=TRUE)
{
  a=diag(1,n,n)
  if(!contrasts) return(a)
  a[lower.tri(a)]=1
  a[,-1,drop=FALSE]
}
> my.contrast(4)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    1    1    0
[4,]    1    1    1
```

An example of (ordered) factor of my treatment levels: 
```
> (of=ordered(rep(2:4, 2:4)))
[1] 2 2 3 3 3 4 4 4 4
Levels: 2 < 3 < 4
```

When passing my contrast as a matrix, a function or a **backquoted** string to `model.matrix`, I got correct results:
```
> model.matrix(~of, contrasts.arg=list(of=`my.contrast`))
  (Intercept) of1 of2
1           1   0   0
2           1   0   0
3           1   1   0
4           1   1   0
5           1   1   0
6           1   1   1
7           1   1   1
8           1   1   1
9           1   1   1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$of
  [,1] [,2]
2    0    0
3    1    0
4    1    1
```

But when I passed my contrast as a **usual character string**, the results were incorrect: 
```
> model.matrix(~of, contrasts.arg=list(of="my.contrast"))
  (Intercept)           of1
1           1  0.000000e+00
2           1  0.000000e+00
3           1  1.000000e+00
4           1  1.000000e+00
5           1  1.000000e+00
6           1 2.793409e-314
7           1 2.793409e-314
8           1 2.793409e-314
9           1 2.793409e-314
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$of
[1] "my.contrast"
```

However, the documentation of `model.matrix` did not say that the contrast function name should not be given as a usual character string, 
```
contrasts.arg: a list, whose entries are values (numeric matrices,
          ‘function’s or character strings naming functions) to be used
          as replacement values for the ‘contrasts’ replacement
          function and whose names are the names of columns of ‘data’
          containing ‘factor’s.
```
and it gave an example illustrating the use of a usual character string:
```
 model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum"))
```

The documentation of `contrasts(x, how.many = NULL) <- value` says
```
   value: either a numeric matrix (or a sparse or dense matrix of a
          class extending ‘dMatrix’ from package ‘Matrix’) whose
          columns give coefficients for contrasts in the levels of ‘x’,
          or the (quoted) name of a function which computes such
          matrices.
```
It didn't say how the name of the function should be quoted. However, the same problem can be reproduce when manually setting `contrasts` for my factor `of`:
```
> contrasts(of)="my.contrast"
> model.matrix(~of)
  (Intercept)          of1
1           1  0.00000e+00
2           1  0.00000e+00
3           1  1.00000e+00
4           1  1.00000e+00
5           1  1.00000e+00
6           1 5.30499e-315
7           1 5.30499e-315
8           1 5.30499e-315
9           1 5.30499e-315
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$of
[1] "my.contrast"

> model.matrix(~C(of, "my.contrast"))
  (Intercept) C(of, "my.contrast")1
1           1           0.00000e+00
2           1           0.00000e+00
3           1           1.00000e+00
4           1           1.00000e+00
5           1           1.00000e+00
6           1          5.30499e-315
7           1          5.30499e-315
8           1          5.30499e-315
9           1          5.30499e-315
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$`C(of, "my.contrast")`
[1] "my.contrast"
```
Comment 5 Long Qu 2022-05-28 19:01:23 UTC

or maybe the contrasts<- function can be updated by replacing the block

 else if (is.character(value)) 
        cm <- value

with

 else if (is.character(value)) 
        cm <- value(nlevels(x))

Then model.matrix probably won't need any change.

Comment 6 Martyn Plummer 2022-06-27 08:31:17 UTC
We have been discussing this in the bug BBQ satellite event at useR! 2022. See https://github.com/r-devel/bug-bbq/issues/1 where Heather Turner identifies the problem and fixes it.

To Long Qu. The problems you show with model.matrix are because your custom contrast function `my.contrast` does work correctly when given a vector of names for the factor levels. This requirement is documented in ?contrasts and for the standard contrast functions in ?contr.treatment. When this is fixed, the problems you observe with model.matrix go away.

The bug is in the replacement function `contrasts<-`. When given a function name it sets the "contrasts" attribute to the function name:

> contrasts(CO2$Treatment) <- "contr.treatment"
> attr(CO2$Treatment, "contrasts")
[1] "contr.treatment"

When given a function object, it sets the "contrasts" attribute to a contrast matrix:

> contrasts(CO2$Treatment) <- contr.treatment
> attr(CO2$Treatment, "contrasts")
           2
nonchilled 0
chilled    1

But when the contrast matrix is created, the level names are not used. The fix is straightforward: change nlevels(x) to levels(x). The same issue, with the same fix, appears in C().

Fixed in r82526.