Reference equality of Integers in Java

Recently, a colleague of mine showed me a piece of Java code that, at first sight, behaves strangely.


The following code...

Map<Integer, Object> identity = new IdentityHashMap<Integer, Object>(); 
String fu = new String("fu"); 
String bar = new String("bar"); 
identity.put(128, fu); 
identity.put(127, bar); 
System.out.println("127: " + identity.get(127)); 
System.out.println("128: " + identity.get(128));

...leads to the following output:

127: bar 
128: null

Let's first take a look at the data structure used in this example, the IdentityHashMap. Its Javadoc states:

This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). (In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).)

This class is not a general-purpose Map implementation! While this class implements the Map interface, it intentionally violates Map's general contract, which mandates the use of the equals method when comparing objects. This class is designed for use only in the rare cases wherein reference-equality semantics are required.

So, in contrast to other Map implementations, keys are only considered equal if they are reference-equal, thus, if the very same key object is used for putting and getting.
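To make the difference concrete, here is a minimal sketch that looks up a value with a key that is equal to, but not the same object as, the key used for put(). The class and method names are made up for illustration, and the diamond operator in the method arguments assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityVsEqualsDemo {
    // Stores a value under one key object, then looks it up with a second
    // key object that is equal to the first but not the same reference.
    static Object lookupWithDistinctKey(Map<String, Object> map) {
        String putKey = new String("key"); // new String(...) forces a fresh object
        String getKey = new String("key"); // getKey.equals(putKey), but getKey != putKey
        map.put(putKey, "value");
        return map.get(getKey);
    }

    public static void main(String[] args) {
        // HashMap compares keys with equals(), so the lookup succeeds.
        System.out.println("HashMap:         " + lookupWithDistinctKey(new HashMap<>()));
        // IdentityHashMap compares keys with ==, so the lookup finds nothing.
        System.out.println("IdentityHashMap: " + lookupWithDistinctKey(new IdentityHashMap<>()));
    }
}
```

The HashMap lookup returns "value", while the IdentityHashMap lookup returns null, because the two key objects are distinct references.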

In the code example above, two Strings are put into the Map with Integer keys. The Integer objects for both put() and get() are automatically generated through autoboxing from ints. 
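As a small sketch of what "the very same key object" means for boxed keys: if the int is boxed only once and that single Integer reference is reused, the IdentityHashMap lookup succeeds even for 128. The class and method names are hypothetical, and the remark about Integer.valueOf describes what javac typically emits rather than something the language specification demands.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class SameKeyObjectDemo {
    // Boxes 128 exactly once and uses that one Integer object for put() and get().
    static Object lookupWithSameKeyObject() {
        Map<Integer, Object> identity = new IdentityHashMap<>();
        Integer key = 128;         // one boxing conversion; javac typically emits Integer.valueOf(128)
        identity.put(key, "fu");   // no further boxing: key is already an Integer
        return identity.get(key);  // same reference as in put(), so the lookup succeeds
    }

    public static void main(String[] args) {
        System.out.println("128: " + lookupWithSameKeyObject()); // prints "128: fu"
    }
}
```

In the original example, by contrast, every int literal passed to put() or get() goes through its own boxing conversion.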

One could assume that when invoking get() on the Map, two new Integer objects are generated through autoboxing. These two objects should not be reference-equal to the two Integer objects (also generated through autoboxing) that are used as keys in the map, so each get() should return null. However, this is only true for get(128); get(127) returns the String value that was added with key 127. Why is that the case?

To answer this question, we need to take a look at the chapter about Boxing Conversion in the Java Language Specification:

If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.
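The rule quoted above can be observed directly by boxing the same int twice and comparing the results with ==. This is only a sketch: the class and method names are invented, and the result for values outside -128 to 127 is deliberately left open, because it depends on the JVM and its configuration.

```java
class BoxingCacheDemo {
    // Boxes the same int value twice and reports whether both boxing
    // conversions produced the very same Integer object.
    static boolean sameReference(int value) {
        Integer r1 = value; // first boxing conversion
        Integer r2 = value; // second boxing conversion
        return r1 == r2;    // reference comparison, not equals()
    }

    public static void main(String[] args) {
        // Values from -128 to 127 must be boxed to identical references.
        System.out.println("-128: " + sameReference(-128));
        System.out.println(" 127: " + sameReference(127));
        // Outside that range the result is implementation-dependent.
        System.out.println(" 128: " + sameReference(128));
        // equals() compares the numeric value and is unaffected by caching.
        System.out.println("equals: " + Integer.valueOf(128).equals(128));
    }
}
```

Comparing boxed values with equals() instead of == sidesteps the question of identity altogether.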

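So, this explains why the behaviour differs for 127 and 128: 127 lies within the range that is guaranteed to be boxed to the same object, so the key used in get(127) is reference-equal to the key used in put(127). 128 lies outside that range, so the two boxing conversions are free to produce two distinct Integer objects, and here they did, which is why get(128) returns null.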