Both must be powers of two, for optimized access. In some cases, this:
texel_offset = (texel_v << o) + texel_u
is fast, but in others
texel_offset = (texel_u << o) + texel_v
is faster. It depends on the driver, implementation, etc. Note that the above are optimizations of:
texel_offset = texel_v*texture_width + texel_u
Here o is the base-2 logarithm of the texture width, such that "1 << o == texture_width".
To make these optimizations possible, some restrictions have to be placed on the texture dimensions.
Newer cards do support rectangular textures, which can be of any size, but they're considerably slower.
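
For what it's worth, here is a minimal C sketch of the shift trick next to the plain multiply. The function names (is_power_of_two, log2_pow2, texel_offset_mul, texel_offset_shift) are made up for illustration and do not come from any driver or API. It assumes row-major storage and a power-of-two width, with o = log2(texture_width), so that 1 << o == texture_width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if n is a nonzero power of two. */
static int is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: returns o such that (1u << o) == n.
 * n must be a power of two, which is exactly the restriction
 * the shift-based addressing relies on. */
static unsigned log2_pow2(unsigned n)
{
    unsigned o = 0;
    assert(is_power_of_two(n));
    while ((1u << o) < n)
        ++o;
    return o;
}

/* General form: works for any width, costs a multiply. */
static size_t texel_offset_mul(unsigned texel_u, unsigned texel_v,
                               unsigned texture_width)
{
    return (size_t)texel_v * texture_width + texel_u;
}

/* Shift form: only valid when texture_width == (1u << o). */
static size_t texel_offset_shift(unsigned texel_u, unsigned texel_v,
                                 unsigned o)
{
    return ((size_t)texel_v << o) + texel_u;
}
```

For a 256-texel-wide texture, log2_pow2(256) gives o = 8, and texel_offset_shift(3, 2, 8) returns 515, the same value as texel_offset_mul(3, 2, 256). With a width like 100 there is no such o, which is why the shift form cannot be used for arbitrary sizes.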