
Using integer ALU for data compression and storage

Previously I proposed using floating point numbers as data storage. But any super-accurate number system will do, if it can be represented in binary form. Also, no floating point unit is needed if the information is processed in an integer ALU. Different number systems are described in the book "Dynamics of Number Systems: Computation with Arbitrary Precision" (Petr Kurka, 2016) and in the texts by Elmasry, Jensen and Katajainen on the magical skew number system and canonical skew binary. There is the hierarchical residue number system (Tomczak), the index-calculus number system, and "Numerical representations as pure data structures" (Ivanovic 2002) with its 4-digit + invariant system. Most important are Paul Tarau and his texts on giant numbers, hereditary number systems and tree-based representations. There is the "multiple-base composite integer" in the list of alternative number systems on the MROB.com web page, and the Munafo PT number system as well. Zero Displacement Ternary (ZDTNS) claims to be the best for integers. Other ways to represent numbers are "quote notation", the "metallic number systems" on the neuraloutlet.com web page, different complex-base number systems (for example the researchgate.net/post/alternative_number_system_bases post by Rob Craigen, 21.12.2012, on base 2+i instead of 1+i), "Bounded Integer Sequence Encoding" (BISE), Finite State Entropy (FSE) and its asymmetric numeral system, and logarithmic number systems. Others: "ZOT-binary", "A new number system for faster multiplication" (Hashemian 1996, signed-digit integers, canonical signed digit), the "Adapted Modular Number System" (AMNS), the "modular serial number system", the "partition number system", the "Pree X code", the "Pisot number system", Padé approximation, the unary representation of natural numbers, Peano numbers, "Binary number multiplication in 2's complement representation", hyperreal numbers (the Isabelle framework), "The shifted number system for fast linear algebra on integer matrices", "Weighted bit-set encodings for redundant digit sets: a theory" (2001), "New forms for computing real numbers" (Hormigo 2015), "Fast modular exponentiation of large integers", "Dynamical directions in numeration" (Barat et al.), "Numbers as streams of digits" (Frougny 2012), "Optimal left-to-right binary signed-digit recoding" (Joye 2009), "Encoding permutations as integers via the Lehmer code (JavaScript)", and the stackoverflow.com question "Algorithms based on number systems [closed]" (18.3.2011), including the "meta-base N-tuple enumeration framework".

These highly accurate number systems, when used in binary form, can be used on the "numbers piled on top of each other" principle if they have much higher accuracy than usual binary numbers (80 bits in binary is 80 bits of accuracy and 80 bits of range). If 56 bits in binary are used to represent some super-accurate number system, and that system has for example 100 orders of magnitude higher accuracy in those 56 bits, the accuracy is now about 5000 bits using only 56 binary bits. That accuracy can be used for "piling numbers on top of each other": the first 560 bits of accuracy are used to represent 10 × 56-bit other similar super-accurate numbers, these have inside them 10 × 56-bit further numbers, and so on; this layer upon layer is a "mathematical perpetuum mobile" that ends only when the accuracy potential has worn off. All those numbers are individual and different and do not interfere with each other. If the problem of number compactification is solved, making "true" compact numbers, the number-storing capacity expands even more.
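As a rough point of comparison for this piling idea, here is a minimal Python sketch (my own illustration, not taken from any of the texts above) of the ordinary, flat way of packing several fixed-width values into one wide binary integer at different bit offsets and reading them back out. Any "layer upon layer" scheme based on a super-accurate number system would have to store more than this plain packing does in the same number of bits.

```python
# Minimal sketch: pack fixed-width integers into one wide integer at
# different bit offsets, and unpack them again.  Python integers have
# arbitrary precision, so the "wide" number can be 3528 bits or more.

def pack(values, width=56):
    """Concatenate fixed-width values into one big integer (first value in the lowest bits)."""
    big = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError("value does not fit in the given width")
        big |= v << (i * width)
    return big

def unpack(big, count, width=56):
    """Extract `count` fixed-width values back out of the big integer."""
    mask = (1 << width) - 1
    return [(big >> (i * width)) & mask for i in range(count)]

values = [123456789, 987654321, 42]
big = pack(values)                 # one integer holding all three values
assert unpack(big, len(values)) == values
```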
An ordinary integer ALU has a 56-bit width, but floating point units are larger: SPARC, for example, earlier had a "double-double" 128-bit FPU, and Intel processors nowadays have a 512-bit FPU (actually just 8 × 64 bits, not true 512 bits). So it is possible to make a 112-bit "double" integer ALU or an 8 × 56 = 448-bit "octuple" ALU. These super-accurate number systems seem to become more accurate the more bits are used with them; the accuracy expands exponentially as the bit width increases, at least in ZDTNS, and integer values can be chained into 56 bit + 56 bit + 56 bit chains to make super-large numbers that are calculated on a 56-bit integer ALU. So making a 63 × 56 bit = 3528-bit super-wide, super-accurate number is possible, and that number can use a standard 56-bit integer ALU. If accuracy expands exponentially the more bits are in use, that 3528-bit super-accurate number should have enormous accuracy potential, and also enormous potential for layer-upon-layer expansion.

Also possible are Trachtenberg speed system style algorithms that do computations with simple number-shifting tricks (but the Trachtenberg system is only for decimal numbers, so binary coded decimal or something similar must be used with it, like the quadibloc.com web page "The proposed decimal floating point standard"), or James H. Wilkinson's "floating vectors" integer computations that were used in old vacuum-tube computers when floating point computation was unavailable. Wilkinson also invented "iterative refinement", but whether those two things are the same I do not know. If these algorithms are packed into the numbers themselves, like the symbols that the APL programming language uses (one symbol in APL is a calculus or equation or index or other mathematical entity, not just a number or letter), then an "index-calculus" number system can be used.

The main point is that data can be packed "layer upon layer" when there is more accuracy in use than in the usual binary system, where one bit is one bit of accuracy and one bit of range: 1 bit is 1 bit of accuracy, and 1000 bits is 1000 bits of accuracy. Super-accurate number systems, by contrast, have in 1000 bits (or in just 56 bits) much higher accuracy than plain 1000-bit (or 56-bit) binary, although they too use binary form to represent themselves. No data compression is used, because once-compressed data cannot be compressed a second time; this is just a binary representation of some accurate number system, and the data packing is lossless and can be expanded using the "layer upon layer" principle, and perhaps using "true" compact numbers also. If 1000 bits can have 100 orders of magnitude more accuracy using an efficient number system, those 1000 bits now have 100 000-bit accuracy. That accuracy is then used to pack, inside one number with 100 000-bit accuracy, other accurate 1000-bit numbers (that themselves have 100 000-bit accuracy); each of them consumes 1000 bits of accuracy of the base number, but those new numbers now have 99 000-bit accuracy (the first one), 98 000-bit accuracy (the second one) and so on, creating an exponential increase in data capacity when those numbers are in turn used to store other numbers layer upon layer. Inside these roughly 100 000-bit-accuracy numbers the data can be picture, video, text or audio, composed of a long line of integers chained into one roughly 100 000-bit-long chain. If the information is in 16-bit sections, and the accuracy is in binary form (like all computer arithmetic), the first 16 bits occupy the first 16 bits of accuracy (2^16 values in two's complement), the second section goes from 2^17 to 2^32, and so on.
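To make the 56-bit chaining above concrete, here is a small Python sketch (an assumption about how such an ALU would be used, not a description of any real processor) of adding two 3528-bit integers as 63 limbs of 56 bits each, which is the standard multi-precision way a narrow ALU handles wide integers.

```python
# Sketch of 3528-bit addition done in 56-bit pieces: 63 limbs of 56 bits
# each, with a carry propagated from limb to limb (standard
# multi-precision addition, the way a narrow ALU handles wide integers).

LIMB_BITS = 56
LIMBS = 63                          # 63 * 56 = 3528 bits
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(x):
    return [(x >> (i * LIMB_BITS)) & LIMB_MASK for i in range(LIMBS)]

def from_limbs(limbs):
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))

def add_3528(a_limbs, b_limbs):
    out, carry = [], 0
    for a, b in zip(a_limbs, b_limbs):
        s = a + b + carry           # may exceed 56 bits; hardware would use a carry flag
        out.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS
    return out                      # a carry out of the top limb wraps around (mod 2**3528)

a, b = 12345678901234567890, 98765432109876543210
assert from_limbs(add_3528(to_limbs(a), to_limbs(b))) == a + b
```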
Numbers are individual and different and do not interfere with each other. These are called "compact numbers": several numbers chained as one. It is like computing 5 + 4 + 9 + 5 = 24, but the compact-number version of it is 5495, not 24, because the numbers 5, 4, 9 and 5 can easily be extracted from the "compact number" 5495. A "true" compact number, however, should be 24. Extracting the real principal components out of 24 is a problem, but if some arithmetic code (like error correction codes or the blockchain of cryptocurrencies) is used and additional information is encoded in the number chain or in an additional header, perhaps at some accuracy "true compact numbers" could also be made and put into a long integer chain, the long integer chain into a super-accurate number system, and that super-accurate number system represented as a binary number of 1000 bits (in my example), and finally those 1000 bits handled in 56-bit sections so that an integer ALU with 56 bits can process them 56 bits at a time; this is a 1000-bit-long binary integer (that has an accuracy of 100 000 bits). A simpler solution is to use only one 56-bit integer as the base for the super-accurate number system, so 56 bits is the maximum length of that super-accurate number. My choice of 63 × 56 bits = 3528 bits is because it is divisible by 8 and 63 is close to 64, another number divisible by 8, and 8 is the number that computer arithmetic uses. Making a "true compact number" out of 2^32 would include about 65 000 different 16-bit numbers, not just two, because 2^32 is about 4 billion values and 2^16 is about 65 000 values, so 65 000 × 65 000 ≈ 4 billion values = 2^32. But "ordinary" compact numbers (where 2^32 is just two 16-bit numbers, 2^16 + 2^16) will do if no method for "true" compact numbers is found.

Gal's accuracy tables and the French "Gal's accuracy tables method revisited" work in a logarithmic scale also (at xlnsresearch.com logarithmic number system research is listed), not just in floating point, and perhaps integer counting algorithms like the Trachtenberg speed system simplify computations (but the Trachtenberg system works only with decimal numbers; at speleotrove.com decimal-to-binary conversion systems are listed), and a large library can be created for the integer processor that helps to compute, for example, the super-large 3528-bit integer, and that integer library has for example a terabyte or gigabytes of memory requirement. The extreme case of accuracy is the proposed "infinity computer" concept, where numerical accuracy is boosted towards infinity; (almost) infinite accuracy is in use in the numerical representation, so this layer-upon-layer technique can also be applied almost to infinity, making almost endless data storage using only very few bits. Infinite computer concepts include Wolfgang Matthes' REAL computer architecture, J. A. D. W. Anderson's "Perspex machine", Ya. D. Sergeyev's "Infinity computer", Oswaldo Cadenas etc. The information that is encoded in those super-accurate numbers can be any text, number, image pixel or hertz in sound, using 8, 16 to 80 bits per integer. These smaller values are then chained into some larger value as a compact number, either an "ordinary" or a "true" compact number. When the layer-upon-layer principle is used, the actual information is in the last layer; the other layers are used for information packing. If a number that is used for information packing has additional accuracy left over, that accuracy can be used as an additional information store for similar information to that in the last layer.
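Here is a toy Python sketch of the "ordinary" compact number from the 5 + 4 + 9 + 5 example above: the parts are concatenated positionally (base 10 here, but base 2^16 works the same way for 16-bit sections) so they can be read back out, whereas the "true" compact number 24 loses that information.

```python
# Toy sketch of an "ordinary" compact number: 5, 4, 9, 5 stored as the
# single number 5495 instead of the sum 24, so the parts can be
# extracted again.  Any base works, e.g. base 2**16 for 16-bit sections.

def compact(parts, base=10):
    n = 0
    for p in parts:                 # most significant part first
        if not 0 <= p < base:
            raise ValueError("part does not fit in the base")
        n = n * base + p
    return n

def expand(n, count, base=10):
    parts = []
    for _ in range(count):
        n, p = divmod(n, base)
        parts.append(p)
    return parts[::-1]

assert compact([5, 4, 9, 5]) == 5495
assert expand(5495, 4) == [5, 4, 9, 5]
assert sum(expand(5495, 4)) == 24   # the "true" compact number 24 cannot be split back up
```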
Texts: "Tournament coding of integer sequences" (Teuhola 2009), "Greedy and lazy representations of numbers in the negative base system" (Tom Hejda); there are other number systems too, "lazy number systems", "double-base number systems", "index-calculus number systems", the "polynomial iterative system", the "minimal polynomial number system", etc. Also "Representing numbers using Fibonacci variants" (Lucas 2014), "The Nashville number system" (the last one is a music notation, not a number system, but anyway), "The shifted number system for faster linear algebra on integer matrices", "A field theory motivated approach to symbolic algebra" (Peeters), the neuraloutlet.com "metallic number systems" page, "Canonical Booth encoding", "Tournament coding of integers" (Teuhola 2007), "Performing arithmetic operations on round-to-nearest representations" (Kornerup 2009), "segmented number systems", "combinatorial number systems", "mixed-radix number systems", and "hyperreal / surreal number systems". The most logical way is to use a "bijective number system" like "Gödel numbers" with "Gödel numbering for sequences" (Wikipedia). Wikipedia is full of articles about "Gödel numbers", "Computable function", "Effective results in number theory", "Effective method", "Chinese remainder theorem", "Recursive function theory", "Recursive set", "Total recursive function", "μ-recursive function", "Lambda calculus", "Church-Turing thesis", "Square-free integer", "Helly family" etc.

Best would be a number system that is at the same time easily partitioned and super-accurate. But a super-accurate number system is not even needed if Gödel numbering of sequences or some other way of making "true" compact numbers is used: a 3528-bit integer means 2^3528 values in two's complement, and if there is a way to make "true" compact numbers, 2^3528 is an enormous number space. Suppose 2^32 holds not just two 16-bit numbers but about 65 000 of them, because 16 bits gives about 65 000 values and 65 000 × 65 000 = 4 billion values; then, using "true" compact numbers with Gödel numbering or some other method that makes it possible to partition one very large number into its principal components, a 3528-bit integer, divided into 63 × 56-bit sections so that the ALU can handle it, has 56-bit sections composed of billions of 16-bit values (up to about 65 000 each in decimal), billions of them in one 56-bit section, and that section is just one part of the 63 × 56-bit super-large integer. If 2^3528 is the available number space, its storing capacity with "true" compact numbers is astronomical, so no super-accurate number system is needed. But if true compact numbers are not used, then some super-accurate number system can be used instead; Gödel numbers or similar methods are not needed then, but the super-accurate numbers are used to increase storing capacity (compared to a normal 3528-bit integer divided into ordinary 16-bit sections for compact numbers, of which only about 220 fit in 3528 bits in the ordinary way). Either way, the information-storing capacity of 2^3528, a 3528-bit integer, is huge. If those two methods are combined, the storing capacity is even larger. Perhaps a "Gödel numbering library" that is like a floating point library is needed on the PC, perhaps using a terabyte or gigabytes of memory, or a super-accurate number system library; those would work like a floating point library but use those number systems and help the CPU (or a CPU & GPU combination) make calculations faster, because computing with a 3528-bit integer is difficult if 56 bits is the maximum ALU bit width.
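For completeness, here is a small Python sketch of the prime-power Gödel numbering for sequences mentioned above (following the standard construction described on Wikipedia; it assumes the SymPy library for the n-th prime). It shows that one integer really can hold a whole sequence and be split back into its components, but also how fast the encoded number grows, which is why this is mostly of theoretical interest.

```python
# Prime-power Goedel numbering of a sequence: (x1, ..., xk) becomes
# 2**(x1+1) * 3**(x2+1) * 5**(x3+1) * ...  The +1 keeps zeros
# representable; decoding divides each prime back out.

from sympy import prime             # n-th prime (assumes SymPy is installed)

def goedel_encode(seq):
    n = 1
    for i, x in enumerate(seq):
        n *= prime(i + 1) ** (x + 1)
    return n

def goedel_decode(n, length):
    seq = []
    for i in range(length):
        p, e = prime(i + 1), 0
        while n % p == 0:
            n //= p
            e += 1
        seq.append(e - 1)
    return seq

s = [5, 0, 9, 5]
g = goedel_encode(s)                # 2**6 * 3**1 * 5**10 * 7**6, already a 15-digit number
assert goedel_decode(g, len(s)) == s
```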
The sections that are coded using either a super-accurate number system or "true" compact numbers can be 8, 16, 24, 32, 64, 80 or any other bit width. And if the accuracy of the super-accurate numbers is much higher than the usual one bit of accuracy per bit, even at the 3528-bit level, those numbers can be layered using the layer-upon-layer principle to make the information store even larger. But can any numeration, Gödel system or otherwise, code zillions of smaller values into one very large number so that the one large number can be divided back down into its principal components (zillions of them, or zillions × zillions, etc.) using a header, blockchain coding, error correction codes, reversible computing, Gödel numeration etc., at least at small accuracy? Accuracy is not so important, even an erratic method will do; more important is the capacity to encode zillions of values into one "true" compact number.

Other texts: the "Scheduled Relaxation Jacobi (SRJ) method", "A binary access control using prime factorization" (Chang 1997). Also the Wikipedia article "Constant-recursive sequence", which covers Lucas sequences etc. There is Fürer's theorem and its improvement for integers, and the "partition number system", "marginal matrices / vectors", "Fourier analysis on number fields" (2013), "On cyclic numbers of one-dimensional compact sets" (Wilson 1932). On the web page iteror.org/big/Retro/math/number_horizon is "Novaloka maths on the number horizon - beyond the abacus", an interesting article about "BigO" numbers, factoradic, binary double-edged / ternary triple-edged numbers, "binary system B", "nested binary biological number system B notation" and "Alpha Mann numbers". There is an odd book, "Encounters with infinity: a meta-mathematical dissertation" by Michael van Laanen (1994/2002), but I do not know whether it is nonsense or not. On the askamathematician.com web page, "Is there a number set that is above complex numbers?" has an answer. The most interesting web pages for number systems are MROB.com (both alternative bases and floating point), neuraloutlet.com (metallic number systems), that "On the number horizon" page above, and Paul Tarau's many writings about number systems (at Paul Tarau's home page). There are however many more number systems (Ackermann, Peano, Pisot, Gödel and Lucas numbers, Padé approximation, skew, modular serial, minimal polynomial, polynomial iterative and partition number systems, the "finite state transducer", "Asymmetric high radix signed-digit number systems for carry-free addition" (2003), "Radix-2^r arithmetic for multiplication by constants", "Fractions in the canonical signed digit number system", "Fast modular exponentiation of large numbers", "Canonical Booth encoding"). And Wikipedia has a "Residual sum of squares" article, which leads to the stats.stackexchange.com question "Beta regression accounting for residual spatial auto-correlation in R" and the stackoverflow.com question "Calculations of residuals for extreme value distribution in R"; on the warwick.ac.uk web pages are "The simple linear regression model" and "Using extreme value theory and copulas to evaluate market risk", the latter of which has useful math although it is about the stock exchange. There is a new data quantization method, additive quantization ("Revisiting additive quantization" by Martinez, "Solving multi-codebook quantization in the GPU", "Additive quantization for extreme vector compression"). If those methods are based on a recursively indexed quantizer like the earlier "stacked quantizers" (Martinez), perhaps recursive number systems or index-calculus number systems can be used with additive quantization.
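On that last point I am not reproducing Martinez's algorithms here, but a hedged Python/NumPy sketch of the general stacked (residual) quantizer idea shows the "recursively indexed" flavour: each stage quantizes what the previous stages left over, and the reconstruction is the sum of the chosen codewords. The random codebooks are placeholders; real systems learn them from data.

```python
# Hedged sketch of a stacked / residual quantizer (the general idea only,
# not the exact algorithm from the cited papers): each stage picks the
# codeword closest to the remaining residual; decoding sums the codewords.

import numpy as np

def sq_encode(x, codebooks):
    """Return one code index per stage by greedily quantizing the residual."""
    codes, residual = [], x.copy()
    for C in codebooks:                       # C has shape (codebook_size, dim)
        idx = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
        codes.append(idx)
        residual = residual - C[idx]
    return codes

def sq_decode(codes, codebooks):
    return sum(C[i] for C, i in zip(codebooks, codes))

rng = np.random.default_rng(0)
dim, stages, size = 8, 4, 256
codebooks = [rng.normal(size=(size, dim)) for _ in range(stages)]   # placeholder codebooks
x = rng.normal(size=dim)
codes = sq_encode(x, codebooks)               # 4 one-byte codes instead of 8 floats
x_hat = sq_decode(codes, codebooks)
print("quantization error:", np.linalg.norm(x - x_hat))
```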
There is an article “A new approach to the classification of positional number systems” (Borisenko, Kalashnikov 2014).