README.md +42 −27

 **zlib-deflate-nostdlib** provides a zlib decompressor (RFC 1950) and deflate
-reader (RFC 1951) suitable for 8- and 16-bit microcontrollers. It works
-fine on MCUs as small as ATMega328P (used, for example, in the Arduino Nano)
-and MSP430FR5994. It is compatible with both C (from c99 on) and C++. Apart
-from type definitions for (u)int8\_t, (u)int16\_t, and (u)int32\_t, which are
-typically provided by stdint.h, it has no external dependencies.
+reader (RFC 1951) suitable for 8- and 16-bit microcontrollers. It works fine on
+MCUs as small as ATMega328P (used, for example, in the Arduino Nano) and
+MSP430FR5994. It is compatible with both C (from c99 on) and C++. Apart from
+type definitions for (u)int8\_t, (u)int16\_t, and (u)int32\_t, which you can
+provide yourself if stdint.h is not available, it has no external dependencies.
 
-zlib-deflate-nostdlib is focused on a low memory footprint. It is not optimized
-for speed and uses a pretty naive implementation right now.
+zlib-deflate-nostdlib is focused on a low memory footprint and not on speed.
+Depending on architecture and compilation settings, it requires **1.6 to 2.6 kB
+of ROM** and **0.5 to 1.2 kB of RAM**. Decompression speed ranges from **1 to 5
+kB/s per MHz**. See below for details and tunables.
 
 Note: This library *inflates* (i.e., decompresses) data. The source files and
 API are named as such, as is the corresponding function in the original zlib

@@ -105,42 +107,55 @@ is designed for. In that case, you are probably better off with
 ## Memory Requirements
 
-Excluding the decompressed data buffer, zlib-deflate-nostdlib needs about
-2.5 kB of ROM and 500 Bytes of RAM. Actual values depend on the architecture,
-see the tables below. ROM/RAM values are rounded up to the next multiple of 16B.
+Compilation with `-Os`. ROM/RAM values are rounded up to the next multiple of
+16B and do not include the buffer for decompressede data.
 
-### default (no checksum verification)
+### baseline (no checksum verification)
 
 | Architecture | ROM | RAM |
 | :--- | ---: | ---: |
-| 8-bit ATMega328P | 1824 B | 640 B |
-| 16-bit MSP430FR5994 | 2272 B | 448 B |
-| 20-bit MSP430FR5994 | 2576 B | 464 B |
+| 8-bit ATMega328P | 1808 B | 640 B |
+| 16-bit MSP430FR5994 | 2256 B | 448 B |
+| 20-bit MSP430FR5994 | 2560 B | 464 B |
 | 32-bit ESP8266 | 1888 B | 656 B |
-| 32-bit STM32F446RE (ARM Cortex M3) | 1600 B | 464 B |
+| 32-bit STM32F446RE (ARM Cortex M3) | 1616 B | 464 B |
 
 ### compliant mode (-DDEFLATE\_CHECKSUM)
 
+ROM = baseline + 150 to 300 B, RAM = baseline.
+
+### faster mode (-DDEFLATE\_WITH\_LUT)
+
 | Architecture | ROM | RAM |
 | :--- | ---: | ---: |
-| 8-bit ATMega328P | 2032 B | 640 B |
-| 16-bit MSP430FR5994 | 2560 B | 448 B |
-| 20-bit MSP430FR5994 | 2896 B | 464 B |
-| 32-bit ESP8266 | 2048 B | 656 B |
-| 32-bit STM32F446RE (ARM Cortex M3) | 1782 B | 464 B |
+| 8-bit ATMega328P | — | — |
+| 16-bit MSP430FR5994 | 2896 B | 1088 B |
+| 20-bit MSP430FR5994 | 3248 B | 1088 B |
+| 32-bit ESP8266 | 1856 B | 1296 B |
+| 32-bit STM32F446RE (ARM Cortex M3) | 1664 B | 1104 B |
 
 ## Performance
 
-Due to its focus on low RAM usage, zlib-deflate-nostdlib is very slow. Expect
-about 1kB/s per MHz on 16-bit and 2kB/s per MHz on 32-bit architectures. Tested
-with text files of various sizes, minimum file size 500 bytes, maximum file size
-determined by the amount of available RAM.
+Tested with text files of various sizes, minimum file size 500 bytes, maximum
+file size determined by the amount of available RAM.
+
+### baseline (no checksum verification)
 
 | Architecture | Speed @ 1 MHz | Speed | CPU Clock |
 | :--- | ---: | ---: | ---: |
 | 8-bit ATMega328P | 1 kB/s | 10 .. 22 kB/s | 16 MHz |
-| 16-bit MSP430FR5994 | 1 kB/s | 8..15 kB/s | 16 MHz |
-| 20-bit MSP430FR5994 | 1 kB/s | 8..17 kB/s | 16 MHz |
+| 16-bit MSP430FR5994 | 1 kB/s | 8..16 kB/s | 16 MHz |
+| 20-bit MSP430FR5994 | 1 kB/s | 8..16 kB/s | 16 MHz |
 | 32-bit ESP8266 | 1 .. 3 kB/s | 79..246 kB/s | 80 MHz |
 | 32-bit STM32F446RE (ARM Cortex M3) | 1 .. 5 kB/s | 282..875 kB/s | 168 MHz |
+
+### faster mode (-DDEFLATE\_WITH\_LUT)
+
+| Architecture | Speed @ 1 MHz | Speed | CPU Clock |
+| :--- | ---: | ---: | ---: |
+| 8-bit ATMega328P | — | — | 16 MHz |
+| 16-bit MSP430FR5994 | 2 kB/s | 22..37 kB/s | 16 MHz |
+| 20-bit MSP430FR5994 | 2 kB/s | 20..34 kB/s | 16 MHz |
+| 32-bit ESP8266 | 3 .. 8 kB/s | 234..671 kB/s | 80 MHz |
+| 32-bit STM32F446RE (ARM Cortex M3) | 6 .. 17 kB/s | 986..2815 kB/s | 168 MHz |
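The RAM increase in the faster-mode table can be cross-checked against the two lookup tables that `-DDEFLATE_WITH_LUT` adds in `src/inflate.c` (`deflate_ll_codes[288]` and `deflate_d_codes[30]`, both `uint16_t`). The sketch below is only a back-of-the-envelope check and not part of the library; the helper names `lut_ram_bytes` and `round_up_16` are made up for illustration, and it assumes `sizeof(uint16_t) == 2`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array sizes as declared under DEFLATE_WITH_LUT in src/inflate.c. */
enum { LL_CODES = 288, D_CODES = 30 };

/* Round up to the next multiple of 16, the granularity the README uses. */
static size_t round_up_16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Bytes occupied by both lookup tables (hypothetical helper). */
static size_t lut_ram_bytes(void)
{
    /* Assumption: 8-bit bytes, so a uint16_t takes two of them. */
    assert(sizeof(uint16_t) == 2);
    return (LL_CODES + D_CODES) * sizeof(uint16_t);
}
```

This gives 636 B, or 640 B after rounding, which is the RAM difference between the baseline and faster-mode tables on most of the listed targets (for example 448 B versus 1088 B on the 16-bit MSP430FR5994).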
src/inflate.c +89 −3

@@ -92,6 +92,11 @@ uint8_t deflate_hc_lengths[19];
  */
 uint8_t deflate_lld_lengths[318];
 
+#ifdef DEFLATE_WITH_LUT
+uint16_t deflate_ll_codes[288];
+uint16_t deflate_d_codes[30];
+#endif
+
 /*
  * Bit length counts and next code entries for Literal/Length alphabet.
  * Combined with the code lengths in deflate_lld_lengths, these make up the

@@ -159,8 +164,14 @@ static uint16_t deflate_get_bits(uint8_t num_bits)
 	return ret & deflate_bitmask(num_bits);
 }
 
+#ifdef DEFLATE_WITH_LUT
+static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
+				   uint8_t * bl_count, uint16_t * next_code,
+				   uint16_t * codes)
+#else
 static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
 				   uint8_t * bl_count, uint16_t * next_code)
+#endif
 {
 	uint16_t i;
 	uint16_t code = 0;

@@ -178,12 +189,28 @@ static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
 		}
 	}
 
-	for (i = 1; i < max_len + 1; i++) {
+	for (i = 1; i <= max_len; i++) {
 		code = (code + bl_count[i - 1]) << 1;
 		next_code[i] = code;
 	}
+
+#ifdef DEFLATE_WITH_LUT
+	uint8_t j = 0;
+	code = 0;
+	for (j = 1; j <= max_len; j++) {
+		for (i = 0; i < size; i++) {
+			if (lengths[i] == j) {
+				codes[code++] = i;
+			}
+		}
+	}
+#endif
 }
 
+#ifdef DEFLATE_WITH_LUT
+static uint16_t deflate_huff(uint16_t * codes, uint8_t * bl_count,
+			     uint16_t * next_code)
+#else
 /*
  * This function trades speed for low memory requirements. Instead of building
  * an actual huffman tree (at a cost of about 650 Bytes of RAM), we iterate

@@ -192,8 +219,12 @@ static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
  */
 static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
 			     uint8_t * bl_count, uint16_t * next_code)
+#endif
 {
 	uint16_t next_word = deflate_get_word();
+#ifdef DEFLATE_WITH_LUT
+	uint16_t code = 0;
+#endif
 	for (uint8_t num_bits = 1; num_bits < 16; num_bits++) {
 		uint16_t next_bits = deflate_rev_word(next_word, num_bits);
 		if (bl_count[num_bits] && next_bits >= next_code[num_bits]

@@ -203,9 +234,11 @@ static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
 				deflate_input_now++;
 				deflate_bit_offset -= 8;
 			}
+#ifdef DEFLATE_WITH_LUT
+			return codes[code + (next_bits - next_code[num_bits])];
+#else
 			uint8_t len_pos = next_bits;
 			uint8_t cur_pos = next_code[num_bits];
-			// This is slow, but memory-efficient
 			for (uint16_t i = 0; i < size; i++) {
 				if (lengths[i] == num_bits) {
 					if (cur_pos == len_pos) {

@@ -214,20 +247,35 @@ static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
 					cur_pos++;
 				}
 			}
+#endif
+		} else {
+#ifdef DEFLATE_WITH_LUT
+			code += bl_count[num_bits];
+#endif
 		}
 	}
+
 	return 65535;
 }
 
+#ifdef DEFLATE_WITH_LUT
+static int8_t deflate_huffman(uint16_t * ll_codes, uint16_t * d_codes)
+#else
 static int8_t deflate_huffman(uint8_t * ll_lengths, uint16_t ll_size,
 			      uint8_t * d_lengths, uint8_t d_size)
+#endif
 {
 	uint16_t code;
 	uint16_t dcode;
 	while (1) {
+#ifdef DEFLATE_WITH_LUT
+		code = deflate_huff(ll_codes, deflate_bl_count_ll,
+				    deflate_next_code_ll);
+#else
 		code =
 		    deflate_huff(ll_lengths, ll_size, deflate_bl_count_ll,
 				 deflate_next_code_ll);
+#endif
 		if (code < 256) {
 			if (deflate_output_now == deflate_output_end) {
 				return DEFLATE_ERR_OUTPUT_LENGTH;

@@ -244,10 +292,17 @@ static int8_t deflate_huffman(uint8_t * ll_lengths, uint16_t ll_size,
 			if (extra_bits) {
 				len_val += deflate_get_bits(extra_bits);
 			}
+
+#ifdef DEFLATE_WITH_LUT
+			dcode = deflate_huff(d_codes, deflate_bl_count_d,
+					     deflate_next_code_d);
+#else
 			dcode =
 			    deflate_huff(d_lengths, d_size,
 					 deflate_bl_count_d,
 					 deflate_next_code_d);
+#endif
+
 			uint16_t dist_val = deflate_distance_offsets[dcode];
 			extra_bits = deflate_distance_bits[dcode];
 			if (extra_bits) {

@@ -313,12 +368,21 @@ static int8_t deflate_static_huffman()
 		deflate_lld_lengths[i] = 5;
 	}
 
+#ifdef DEFLATE_WITH_LUT
+	deflate_build_alphabet(deflate_lld_lengths, 288, deflate_bl_count_ll,
+			       deflate_next_code_ll, deflate_ll_codes);
+	deflate_build_alphabet(deflate_lld_lengths + 288, 29,
+			       deflate_bl_count_d, deflate_next_code_d,
+			       deflate_d_codes);
+	return deflate_huffman(deflate_ll_codes, deflate_d_codes);
+#else
 	deflate_build_alphabet(deflate_lld_lengths, 288, deflate_bl_count_ll,
 			       deflate_next_code_ll);
 	deflate_build_alphabet(deflate_lld_lengths + 288, 29,
 			       deflate_bl_count_d, deflate_next_code_d);
 	return deflate_huffman(deflate_lld_lengths, 288,
 			       deflate_lld_lengths + 288, 29);
+#endif
 }
 
 static int8_t deflate_dynamic_huffman()

@@ -336,16 +400,29 @@ static int8_t deflate_dynamic_huffman()
 		deflate_hc_lengths[deflate_hclen_index[i]] = 0;
 	}
 
+#ifdef DEFLATE_WITH_LUT
+	deflate_build_alphabet(deflate_hc_lengths,
+			       sizeof(deflate_hc_lengths),
+			       deflate_bl_count_ll, deflate_next_code_ll,
+			       deflate_ll_codes);
+#else
 	deflate_build_alphabet(deflate_hc_lengths,
 			       sizeof(deflate_hc_lengths),
 			       deflate_bl_count_ll, deflate_next_code_ll);
+#endif
 
 	uint16_t items_processed = 0;
 	while (items_processed < hlit + hdist) {
+#ifdef DEFLATE_WITH_LUT
+		uint8_t code = deflate_huff(deflate_ll_codes,
+					    deflate_bl_count_ll,
+					    deflate_next_code_ll);
+#else
 		uint8_t code =
 		    deflate_huff(deflate_hc_lengths, sizeof(deflate_hc_lengths),
 				 deflate_bl_count_ll,
 				 deflate_next_code_ll);
+#endif
 		if (code == 16) {
 			uint8_t copy_count = 3 + deflate_get_bits(2);
 			for (uint8_t i = 0; i < copy_count; i++) {

@@ -371,13 +448,22 @@ static int8_t deflate_dynamic_huffman()
 		}
 	}
 
+#ifdef DEFLATE_WITH_LUT
+	deflate_build_alphabet(deflate_lld_lengths, hlit, deflate_bl_count_ll,
+			       deflate_next_code_ll, deflate_ll_codes);
+	deflate_build_alphabet(deflate_lld_lengths + hlit, hdist,
+			       deflate_bl_count_d, deflate_next_code_d,
+			       deflate_d_codes);
+	return deflate_huffman(deflate_ll_codes, deflate_d_codes);
+#else
 	deflate_build_alphabet(deflate_lld_lengths, hlit,
 			       deflate_bl_count_ll, deflate_next_code_ll);
 	deflate_build_alphabet(deflate_lld_lengths + hlit, hdist,
 			       deflate_bl_count_d, deflate_next_code_d);
 	return deflate_huffman(deflate_lld_lengths, hlit,
 			       deflate_lld_lengths + hlit, hdist);
+#endif
 }
 
 int16_t inflate(unsigned char *input_buf, uint16_t input_len,
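The `DEFLATE_WITH_LUT` path in the diff above replaces the per-symbol scan over the code lengths with an array of symbols sorted by code length, so that a matched code can be turned into a symbol with a single index computation. A minimal standalone sketch of that idea follows. It is not the library's code: `struct alphabet`, `struct bit_reader`, `build_alphabet` and `decode_symbol` are hypothetical names, and the sketch reads one bit at a time instead of using `deflate_get_word` and `deflate_rev_word`.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BITS 15   /* longest Huffman code allowed by deflate */
#define MAX_SYMS 288  /* size of the literal/length alphabet */

/* Hypothetical container for one canonical Huffman alphabet. */
struct alphabet {
    uint16_t bl_count[MAX_BITS + 1];   /* number of codes per bit length */
    uint16_t next_code[MAX_BITS + 1];  /* first code of each bit length */
    uint16_t symbols[MAX_SYMS];        /* symbols sorted by (length, value) */
};

/* Hypothetical bit reader: bits are consumed LSB-first within each byte. */
struct bit_reader {
    const uint8_t *data;
    uint32_t pos;  /* position in bits */
};

static uint8_t next_bit(struct bit_reader *r)
{
    uint8_t bit = (r->data[r->pos >> 3] >> (r->pos & 7)) & 1;
    r->pos++;
    return bit;
}

static void build_alphabet(const uint8_t *lengths, uint16_t size,
                           struct alphabet *a)
{
    uint16_t code = 0;
    uint16_t n = 0;

    assert(size <= MAX_SYMS);
    for (uint8_t len = 0; len <= MAX_BITS; len++) {
        a->bl_count[len] = 0;
        a->next_code[len] = 0;
    }
    /* Count codes per length; length 0 means "symbol unused". */
    for (uint16_t i = 0; i < size; i++) {
        assert(lengths[i] <= MAX_BITS);
        if (lengths[i]) {
            a->bl_count[lengths[i]]++;
        }
    }
    /* First canonical code of each length (RFC 1951, section 3.2.2). */
    for (uint8_t len = 1; len <= MAX_BITS; len++) {
        code = (uint16_t)((code + a->bl_count[len - 1]) << 1);
        a->next_code[len] = code;
    }
    /* Lookup table: symbols ordered by code length, then by value. */
    for (uint8_t len = 1; len <= MAX_BITS; len++) {
        for (uint16_t i = 0; i < size; i++) {
            if (lengths[i] == len) {
                a->symbols[n++] = i;
            }
        }
    }
}

/* Returns the decoded symbol, or 0xFFFF if no code matches. */
static uint16_t decode_symbol(const struct alphabet *a, struct bit_reader *r)
{
    uint16_t code = 0;    /* bits read so far, first bit most significant */
    uint16_t offset = 0;  /* table index of the first symbol of this length */

    for (uint8_t len = 1; len <= MAX_BITS; len++) {
        uint16_t count = a->bl_count[len];
        code = (uint16_t)((code << 1) | next_bit(r));
        if (count && code >= a->next_code[len]
            && code < a->next_code[len] + count) {
            return a->symbols[offset + (code - a->next_code[len])];
        }
        offset += count;
    }
    return 0xFFFF;
}
```

With the code lengths (3, 3, 3, 3, 3, 2, 4, 4) from the example in RFC 1951, section 3.2.2, `build_alphabet` yields `next_code[3] == 2` and `next_code[4] == 14`, and decoding the bytes `{0xE8, 0x01, 0x00}` returns the symbols 5, 0 and 7 in that order. The table costs two bytes per symbol, which is the RAM-for-speed trade that the faster mode makes.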
test/compile-c++11.sh +1 −1

 #!/bin/sh
-exec g++ -std=c++11 -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c
+exec g++ -std=c++11 -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c

test/compile-c++20.sh +1 −1

 #!/bin/sh
 # g++ as provided by Debian Buster (used for CI tests) does not support c++20
-exec g++ -std=c++2a -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c
+exec g++ -std=c++2a -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c

test/compile-c11.sh +1 −1

 #!/bin/sh
-exec gcc -std=c11 -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c
+exec gcc -std=c11 -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c