## Float representation in C

The three floating point types differ in how much space they use (32, 64, or 80 bits on x86 CPUs; possibly different amounts on other machines), and thus how much precision they provide. A float is a 32-bit IEEE 754 single precision floating point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the value. It has 6 decimal digits of precision.

The value is stored in normalized form. So: 1.0 is simply 1.0 * 2^0, 2.0 is 1.0 * 2^1, and so on. Shift your binary point to just after the first 1, then don't bother to store that 1, since it is always there. Due to the shift of 127 in the exponent, the smallest normalized number we can get is clearly 2^-126; to get lower values we use extra-small (denormalized) numbers, which reach down to 2 raised to the smallest exponent minus the number of mantissa bits. However, when using these extra-small numbers you give up precision.

Precision is measured in significant digits, not in magnitude. Take a hard look at all your subtractions any time you start getting suspect results.

To use the math functions, the second step is to link to the math library when you compile. For printf, there is an elaborate variety of floating-point format codes; the easiest way to find out what these do is experiment with them.

Following the Bit-Level Floating-Point Coding Rules, implement the function with the following prototype: /* Compute (float)i */ float_bits float_i2f(int i); For argument i, this function computes the bit-level representation of (float) i.
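To make the 1/8/23 bit layout concrete, here is a minimal sketch that splits a float into its sign, exponent, and fraction fields. It assumes float is a 32-bit IEEE 754 single precision value on your machine; the helper names (float_to_bits, sign_of, exponent_of, fraction_of, print_float_fields) are made up for illustration and are not part of any library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the raw bits of a float into a 32-bit unsigned
   integer. Assumes float is a 32-bit IEEE 754 single precision value.
   memcpy avoids the undefined behaviour of casting float* to uint32_t*. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Top bit: the sign (1 means negative). */
static unsigned sign_of(uint32_t bits)
{
    return (unsigned)(bits >> 31);
}

/* Next 8 bits: the exponent field, stored with a shift (bias) of 127. */
static unsigned exponent_of(uint32_t bits)
{
    return (unsigned)((bits >> 23) & 0xFFu);
}

/* Low 23 bits: the fraction; the leading 1 of a normalized value is not stored. */
static uint32_t fraction_of(uint32_t bits)
{
    return bits & 0x7FFFFFu;
}

/* Print the three fields of f, plus the exponent with the shift removed. */
static void print_float_fields(float f)
{
    uint32_t bits = float_to_bits(f);
    printf("%g: sign=%u exponent=%u (unshifted %d) fraction=0x%06lx\n",
           (double)f, sign_of(bits), exponent_of(bits),
           (int)exponent_of(bits) - 127,
           (unsigned long)fraction_of(bits));
}
```

For example, float_to_bits(1.0f) is 0x3F800000: sign 0, exponent field 127 (that is, 2^0), and an all-zero fraction. Calling print_float_fields on a few values of your own is a quick way to see how the exponent field moves as the magnitude changes.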
The double type covers a range from ±4.94065645841246544e-324 to ±1.79769313486231570e+308 with 14 or 15 …

In memory, only the mantissa and exponent are stored, not the *, the base, or the ^. Because 0 cannot be represented in the standard form (there is no 1 before the binary point), it is given the special representation 0 00000000 00000000000000000000000. Numbers with an exponent field of 11111111 = 255 (nominally 2^128) represent non-numeric quantities such as "not a number" (NaN), returned by operations like (0.0/0.0), and positive or negative infinity. We'll reproduce the floating-point bit representation using the unsigned data type.

The precision of a float is not determined by magnitude; magnitude is determined only by bit positions. At the very bottom of the range you sacrifice precision, down to an appalling mere single bit of precision! When two close quantities are subtracted, the leading ones cancel, along with whatever mantissa digits matched.

One consequence of round-off error is that it is very difficult to test floating-point numbers for equality, unless you are sure you have an exact value. If the numbers were 1.2500000e-20 and 1.2500001e-20, then we might intend to call them equal. Instead of testing x == y, check that the two values are close, e.g. by testing fabs(x-y) <= fabs(EPSILON * y), where EPSILON is usually some application-dependent tolerance. (Even more hilarity ensues if you write for(f = 0.0; f != 0.3; f += 0.1), which after not quite hitting 0.3 exactly keeps looping for much longer than I am willing to wait to see it stop, but which I suspect will eventually converge to some constant value of f large enough that adding 0.1 to it has no effect.)

The %f format specifier prints fractional values.
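The closeness test fabs(x-y) <= fabs(EPSILON * y) can be sketched as a small helper. The value 1e-6 for EPSILON is only an assumed placeholder (the right tolerance depends on the application), and nearly_equal and count_steps are hypothetical names chosen for this example.

```c
#include <math.h>
#include <stdbool.h>

/* Assumed tolerance for this sketch only; pick one that suits your data. */
#define EPSILON 1e-6

/* Relative closeness test: true when x is within EPSILON * |y| of y.
   Note that with y == 0 this only accepts x == 0 exactly. */
static bool nearly_equal(double x, double y)
{
    return fabs(x - y) <= fabs(EPSILON * y);
}

/* The loop for(f = 0.0; f != 0.3; f += 0.1) never sees f == 0.3 exactly.
   Replacing the != test with the closeness test lets it stop. The step
   limit is a safety net added for this sketch. */
static int count_steps(void)
{
    int steps = 0;
    for (double f = 0.0; !nearly_equal(f, 0.3); f += 0.1) {
        steps++;
        if (steps > 10) {
            break; /* should not happen; guards against a runaway loop */
        }
    }
    return steps;
}
```

With this tolerance the loop stops after three additions of 0.1. Where you can, counting iterations with an integer and computing f from the counter avoids the accumulated round-off altogether.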
The EPSILON above is a tolerance. With some machines and compilers you may be able to use the macros INFINITY and NAN from
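As a hedged illustration of the non-numeric values: in C99 and later, the INFINITY and NAN macros, along with the isnan and isinf classification macros, are provided by <math.h>. The sketch below assumes such a compiler and IEEE 754 arithmetic; describe and zero_over_zero are hypothetical helpers written for this example.

```c
#include <math.h>

/* Hypothetical helper: name the category a double falls into. */
static const char *describe(double x)
{
    if (isnan(x)) {
        return "nan";
    }
    if (isinf(x)) {
        return x > 0 ? "+inf" : "-inf";
    }
    return "finite";
}

/* 0.0/0.0 computed at run time; volatile keeps the compiler from folding
   the division away. Under IEEE 754 arithmetic the result is a NaN. */
static double zero_over_zero(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}
```

Under those assumptions, describe(NAN) and describe(zero_over_zero()) both return "nan", while describe(INFINITY) returns "+inf". A NaN also compares unequal to everything, including itself, so x != x is true only when x is a NaN.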

