Add faster mode with huffman -> code look-up table (8198be8f) · Commits · derf / zlib-deflate-nostdlib

README.md

+42 −27

Original line number	Original line	Diff line number	Diff line
	zlib-deflate-nostdlib provides a zlib decompressor (RFC 1950) and deflate		zlib-deflate-nostdlib provides a zlib decompressor (RFC 1950) and deflate
	reader (RFC 1951) suitable for 8- and 16-bit microcontrollers. It works		reader (RFC 1951) suitable for 8- and 16-bit microcontrollers. It works fine on
	fine on MCUs as small as ATMega328P (used, for example, in the Arduino Nano)		MCUs as small as ATMega328P (used, for example, in the Arduino Nano) and
	and MSP430FR5994. It is compatible with both C (from c99 on) and C++. Apart		MSP430FR5994. It is compatible with both C (from c99 on) and C++. Apart from
	from type definitions for (u)int8\_t, (u)int16\_t, and (u)int32\_t, which are		type definitions for (u)int8\_t, (u)int16\_t, and (u)int32\_t, which you can
	typically provided by stdint.h, it has no external dependencies.		provide yourself if stdint.h is not available, it has no external dependencies.

	zlib-deflate-nostdlib is focused on a low memory footprint. It is not optimized		zlib-deflate-nostdlib is focused on a low memory footprint and not on speed.
	for speed and uses a pretty naive implementation right now.		Depending on architecture and compilation settings, it requires **1.6 to 2.6 kB
			of ROM and 0.5 to 1.2 kB of RAM. Decompression speed ranges from 1 to 5
			kB/s per MHz**. See below for details and tunables.

	Note: This library inflates (i.e., decompresses) data. The source files and		Note: This library inflates (i.e., decompresses) data. The source files and
	API are named as such, as is the corresponding function in the original zlib		API are named as such, as is the corresponding function in the original zlib
	@@ -105,42 +107,55 @@ is designed for. In that case, you are probably better off with

	## Memory Requirements		## Memory Requirements

	Excluding the decompressed data buffer, zlib-deflate-nostdlib needs about		Compilation with `-Os`. ROM/RAM values are rounded up to the next multiple of
	2.5 kB of ROM and 500 Bytes of RAM. Actual values depend on the architecture,		16B and do not include the buffer for decompressed data.
	see the tables below. ROM/RAM values are rounded up to the next multiple of
	16B.

	### default (no checksum verification)		### baseline (no checksum verification)

	\| Architecture \| ROM \| RAM \|		\| Architecture \| ROM \| RAM \|
	\| :--- \| ---: \| ---: \|		\| :--- \| ---: \| ---: \|
	\| 8-bit ATMega328P \| 1824 B \| 640 B \|		\| 8-bit ATMega328P \| 1808 B \| 640 B \|
	\| 16-bit MSP430FR5994 \| 2272 B \| 448 B \|		\| 16-bit MSP430FR5994 \| 2256 B \| 448 B \|
	\| 20-bit MSP430FR5994 \| 2576 B \| 464 B \|		\| 20-bit MSP430FR5994 \| 2560 B \| 464 B \|
	\| 32-bit ESP8266 \| 1888 B \| 656 B \|		\| 32-bit ESP8266 \| 1888 B \| 656 B \|
	\| 32-bit STM32F446RE (ARM Cortex M3) \| 1600 B \| 464 B \|		\| 32-bit STM32F446RE (ARM Cortex M3) \| 1616 B \| 464 B \|

	### compliant mode (-DDEFLATE\_CHECKSUM)		### compliant mode (-DDEFLATE\_CHECKSUM)

			ROM = baseline + 150 to 300 B, RAM = baseline.

			### faster mode (-DDEFLATE\_WITH\_LUT)

	\| Architecture \| ROM \| RAM \|		\| Architecture \| ROM \| RAM \|
	\| :--- \| ---: \| ---: \|		\| :--- \| ---: \| ---: \|
	\| 8-bit ATMega328P \| 2032 B \| 640 B \|		\| 8-bit ATMega328P \| — \| — \|
	\| 16-bit MSP430FR5994 \| 2560 B \| 448 B \|		\| 16-bit MSP430FR5994 \| 2896 B \| 1088 B \|
	\| 20-bit MSP430FR5994 \| 2896 B \| 464 B \|		\| 20-bit MSP430FR5994 \| 3248 B \| 1088 B \|
	\| 32-bit ESP8266 \| 2048 B \| 656 B \|		\| 32-bit ESP8266 \| 1856 B \| 1296 B \|
	\| 32-bit STM32F446RE (ARM Cortex M3) \| 1782 B \| 464 B \|		\| 32-bit STM32F446RE (ARM Cortex M3) \| 1664 B \| 1104 B \|


	## Performance		## Performance

	Due to its focus on low RAM usage, zlib-deflate-nostdlib is very slow. Expect		Tested with text files of various sizes, minimum file size 500 bytes, maximum
	about 1kB/s per MHz on 16-bit and 2kB/s per MHz on 32-bit architectures. Tested		file size determined by the amount of available RAM.
	with text files of various sizes, minimum file size 500 bytes, maximum file
	size determined by the amount of available RAM.		### baseline (no checksum verification)

	\| Architecture \| Speed @ 1 MHz \| Speed \| CPU Clock \|		\| Architecture \| Speed @ 1 MHz \| Speed \| CPU Clock \|
	\| :--- \| ---: \| ---: \| ---: \|		\| :--- \| ---: \| ---: \| ---: \|
	\| 8-bit ATMega328P \| 1 kB/s \| 10 .. 22 kB/s \| 16 MHz \|		\| 8-bit ATMega328P \| 1 kB/s \| 10 .. 22 kB/s \| 16 MHz \|
	\| 16-bit MSP430FR5994 \| 1 kB/s \| 8..15 kB/s \| 16 MHz \|		\| 16-bit MSP430FR5994 \| 1 kB/s \| 8..16 kB/s \| 16 MHz \|
	\| 20-bit MSP430FR5994 \| 1 kB/s \| 8..17 kB/s \| 16 MHz \|		\| 20-bit MSP430FR5994 \| 1 kB/s \| 8..16 kB/s \| 16 MHz \|
	\| 32-bit ESP8266 \| 1 .. 3 kB/s \| 79..246 kB/s \| 80 MHz \|		\| 32-bit ESP8266 \| 1 .. 3 kB/s \| 79..246 kB/s \| 80 MHz \|
	\| 32-bit STM32F446RE (ARM Cortex M3) \| 1 .. 5 kB/s \| 282..875 kB/s \| 168 MHz \|		\| 32-bit STM32F446RE (ARM Cortex M3) \| 1 .. 5 kB/s \| 282..875 kB/s \| 168 MHz \|

			### faster mode (-DDEFLATE\_WITH\_LUT)

			\| Architecture \| Speed @ 1 MHz \| Speed \| CPU Clock \|
			\| :--- \| ---: \| ---: \| ---: \|
			\| 8-bit ATMega328P \| — \| — \| 16 MHz \|
			\| 16-bit MSP430FR5994 \| 2 kB/s \| 22..37 kB/s \| 16 MHz \|
			\| 20-bit MSP430FR5994 \| 2 kB/s \| 20..34 kB/s \| 16 MHz \|
			\| 32-bit ESP8266 \| 3 .. 8 kB/s \| 234..671 kB/s \| 80 MHz \|
			\| 32-bit STM32F446RE (ARM Cortex M3) \| 6 .. 17 kB/s \| 986..2815 kB/s \| 168 MHz \|

src/inflate.c

+89 −3

	@@ -92,6 +92,11 @@ uint8_t deflate_hc_lengths[19];
	*/		*/
	uint8_t deflate_lld_lengths[318];		uint8_t deflate_lld_lengths[318];

			#ifdef DEFLATE_WITH_LUT
			uint16_t deflate_ll_codes[288];
			uint16_t deflate_d_codes[30];
			#endif

	/*		/*
	* Bit length counts and next code entries for Literal/Length alphabet.		* Bit length counts and next code entries for Literal/Length alphabet.
	* Combined with the code lengths in deflate_lld_lengths, these make up the		* Combined with the code lengths in deflate_lld_lengths, these make up the
	@@ -159,8 +164,14 @@ static uint16_t deflate_get_bits(uint8_t num_bits)
	return ret & deflate_bitmask(num_bits);		return ret & deflate_bitmask(num_bits);
	}		}

			#ifdef DEFLATE_WITH_LUT
			static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
			uint8_t * bl_count, uint16_t * next_code,
			uint16_t * codes)
			#else
	static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,		static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
	uint8_t * bl_count, uint16_t * next_code)		uint8_t * bl_count, uint16_t * next_code)
			#endif
	{		{
	uint16_t i;		uint16_t i;
	uint16_t code = 0;		uint16_t code = 0;
	@@ -178,12 +189,28 @@ static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
	}		}
	}		}

	for (i = 1; i < max_len + 1; i++) {		for (i = 1; i <= max_len; i++) {
	code = (code + bl_count[i - 1]) << 1;		code = (code + bl_count[i - 1]) << 1;
	next_code[i] = code;		next_code[i] = code;
	}		}

			#ifdef DEFLATE_WITH_LUT
			uint8_t j = 0;
			code = 0;
			for (j = 1; j <= max_len; j++) {
			for (i = 0; i < size; i++) {
			if (lengths[i] == j) {
			codes[code++] = i;
			}
			}
			}
			#endif
	}		}

			#ifdef DEFLATE_WITH_LUT
			static uint16_t deflate_huff(uint16_t * codes,
			uint8_t * bl_count, uint16_t * next_code)
			#else
	/*		/*
	* This function trades speed for low memory requirements. Instead of building		* This function trades speed for low memory requirements. Instead of building
	* an actual huffman tree (at a cost of about 650 Bytes of RAM), we iterate		* an actual huffman tree (at a cost of about 650 Bytes of RAM), we iterate
	@@ -192,8 +219,12 @@ static void deflate_build_alphabet(uint8_t * lengths, uint16_t size,
	*/		*/
	static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,		static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
	uint8_t * bl_count, uint16_t * next_code)		uint8_t * bl_count, uint16_t * next_code)
			#endif
	{		{
	uint16_t next_word = deflate_get_word();		uint16_t next_word = deflate_get_word();
			#ifdef DEFLATE_WITH_LUT
			uint16_t code = 0;
			#endif
	for (uint8_t num_bits = 1; num_bits < 16; num_bits++) {		for (uint8_t num_bits = 1; num_bits < 16; num_bits++) {
	uint16_t next_bits = deflate_rev_word(next_word, num_bits);		uint16_t next_bits = deflate_rev_word(next_word, num_bits);
	if (bl_count[num_bits] && next_bits >= next_code[num_bits]		if (bl_count[num_bits] && next_bits >= next_code[num_bits]
	@@ -203,9 +234,11 @@ static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
	deflate_input_now++;		deflate_input_now++;
	deflate_bit_offset -= 8;		deflate_bit_offset -= 8;
	}		}
			#ifdef DEFLATE_WITH_LUT
			return codes[code + (next_bits - next_code[num_bits])];
			#else
	uint8_t len_pos = next_bits;		uint8_t len_pos = next_bits;
	uint8_t cur_pos = next_code[num_bits];		uint8_t cur_pos = next_code[num_bits];
	// This is slow, but memory-efficient
	for (uint16_t i = 0; i < size; i++) {		for (uint16_t i = 0; i < size; i++) {
	if (lengths[i] == num_bits) {		if (lengths[i] == num_bits) {
	if (cur_pos == len_pos) {		if (cur_pos == len_pos) {
	@@ -214,20 +247,35 @@ static uint16_t deflate_huff(uint8_t * lengths, uint16_t size,
	cur_pos++;		cur_pos++;
	}		}
	}		}
			#endif
			} else {
			#ifdef DEFLATE_WITH_LUT
			code += bl_count[num_bits];
			#endif
	}		}
	}		}
	return 65535;		return 65535;
	}		}

			#ifdef DEFLATE_WITH_LUT
			static int8_t deflate_huffman(uint16_t * ll_codes, uint16_t * d_codes)
			#else
	static int8_t deflate_huffman(uint8_t * ll_lengths, uint16_t ll_size,		static int8_t deflate_huffman(uint8_t * ll_lengths, uint16_t ll_size,
	uint8_t * d_lengths, uint8_t d_size)		uint8_t * d_lengths, uint8_t d_size)
			#endif
	{		{
	uint16_t code;		uint16_t code;
	uint16_t dcode;		uint16_t dcode;
	while (1) {		while (1) {
			#ifdef DEFLATE_WITH_LUT
			code =
			deflate_huff(ll_codes, deflate_bl_count_ll,
			deflate_next_code_ll);
			#else
	code =		code =
	deflate_huff(ll_lengths, ll_size, deflate_bl_count_ll,		deflate_huff(ll_lengths, ll_size, deflate_bl_count_ll,
	deflate_next_code_ll);		deflate_next_code_ll);
			#endif
	if (code < 256) {		if (code < 256) {
	if (deflate_output_now == deflate_output_end) {		if (deflate_output_now == deflate_output_end) {
	return DEFLATE_ERR_OUTPUT_LENGTH;		return DEFLATE_ERR_OUTPUT_LENGTH;
	@@ -244,10 +292,17 @@ static int8_t deflate_huffman(uint8_t * ll_lengths, uint16_t ll_size,
	if (extra_bits) {		if (extra_bits) {
	len_val += deflate_get_bits(extra_bits);		len_val += deflate_get_bits(extra_bits);
	}		}
			#ifdef DEFLATE_WITH_LUT
			dcode =
			deflate_huff(d_codes,
			deflate_bl_count_d,
			deflate_next_code_d);
			#else
	dcode =		dcode =
	deflate_huff(d_lengths, d_size,		deflate_huff(d_lengths, d_size,
	deflate_bl_count_d,		deflate_bl_count_d,
	deflate_next_code_d);		deflate_next_code_d);
			#endif
	uint16_t dist_val = deflate_distance_offsets[dcode];		uint16_t dist_val = deflate_distance_offsets[dcode];
	extra_bits = deflate_distance_bits[dcode];		extra_bits = deflate_distance_bits[dcode];
	if (extra_bits) {		if (extra_bits) {
	@@ -313,12 +368,21 @@ static int8_t deflate_static_huffman()
	deflate_lld_lengths[i] = 5;		deflate_lld_lengths[i] = 5;
	}		}

			#ifdef DEFLATE_WITH_LUT
			deflate_build_alphabet(deflate_lld_lengths, 288, deflate_bl_count_ll,
			deflate_next_code_ll, deflate_ll_codes);
			deflate_build_alphabet(deflate_lld_lengths + 288, 29,
			deflate_bl_count_d, deflate_next_code_d,
			deflate_d_codes);
			return deflate_huffman(deflate_ll_codes, deflate_d_codes);
			#else
	deflate_build_alphabet(deflate_lld_lengths, 288, deflate_bl_count_ll,		deflate_build_alphabet(deflate_lld_lengths, 288, deflate_bl_count_ll,
	deflate_next_code_ll);		deflate_next_code_ll);
	deflate_build_alphabet(deflate_lld_lengths + 288, 29,		deflate_build_alphabet(deflate_lld_lengths + 288, 29,
	deflate_bl_count_d, deflate_next_code_d);		deflate_bl_count_d, deflate_next_code_d);
	return deflate_huffman(deflate_lld_lengths, 288,		return deflate_huffman(deflate_lld_lengths, 288,
	deflate_lld_lengths + 288, 29);		deflate_lld_lengths + 288, 29);
			#endif
	}		}

	static int8_t deflate_dynamic_huffman()		static int8_t deflate_dynamic_huffman()
	@@ -336,16 +400,29 @@ static int8_t deflate_dynamic_huffman()
	deflate_hc_lengths[deflate_hclen_index[i]] = 0;		deflate_hc_lengths[deflate_hclen_index[i]] = 0;
	}		}

			#ifdef DEFLATE_WITH_LUT
			deflate_build_alphabet(deflate_hc_lengths,
			sizeof(deflate_hc_lengths),
			deflate_bl_count_ll, deflate_next_code_ll,
			deflate_ll_codes);
			#else
	deflate_build_alphabet(deflate_hc_lengths,		deflate_build_alphabet(deflate_hc_lengths,
	sizeof(deflate_hc_lengths),		sizeof(deflate_hc_lengths),
	deflate_bl_count_ll, deflate_next_code_ll);		deflate_bl_count_ll, deflate_next_code_ll);
			#endif

	uint16_t items_processed = 0;		uint16_t items_processed = 0;
	while (items_processed < hlit + hdist) {		while (items_processed < hlit + hdist) {
			#ifdef DEFLATE_WITH_LUT
			uint8_t code = deflate_huff(deflate_ll_codes,
			deflate_bl_count_ll,
			deflate_next_code_ll);
			#else
	uint8_t code =		uint8_t code =
	deflate_huff(deflate_hc_lengths, sizeof(deflate_hc_lengths),		deflate_huff(deflate_hc_lengths, sizeof(deflate_hc_lengths),
	deflate_bl_count_ll,		deflate_bl_count_ll,
	deflate_next_code_ll);		deflate_next_code_ll);
			#endif
	if (code == 16) {		if (code == 16) {
	uint8_t copy_count = 3 + deflate_get_bits(2);		uint8_t copy_count = 3 + deflate_get_bits(2);
	for (uint8_t i = 0; i < copy_count; i++) {		for (uint8_t i = 0; i < copy_count; i++) {
	@@ -371,13 +448,22 @@ static int8_t deflate_dynamic_huffman()
	}		}
	}		}

			#ifdef DEFLATE_WITH_LUT
			deflate_build_alphabet(deflate_lld_lengths, hlit,
			deflate_bl_count_ll, deflate_next_code_ll,
			deflate_ll_codes);
			deflate_build_alphabet(deflate_lld_lengths + hlit, hdist,
			deflate_bl_count_d, deflate_next_code_d,
			deflate_d_codes);
			return deflate_huffman(deflate_ll_codes, deflate_d_codes);
			#else
	deflate_build_alphabet(deflate_lld_lengths, hlit,		deflate_build_alphabet(deflate_lld_lengths, hlit,
	deflate_bl_count_ll, deflate_next_code_ll);		deflate_bl_count_ll, deflate_next_code_ll);
	deflate_build_alphabet(deflate_lld_lengths + hlit, hdist,		deflate_build_alphabet(deflate_lld_lengths + hlit, hdist,
	deflate_bl_count_d, deflate_next_code_d);		deflate_bl_count_d, deflate_next_code_d);

	return deflate_huffman(deflate_lld_lengths, hlit,		return deflate_huffman(deflate_lld_lengths, hlit,
	deflate_lld_lengths + hlit, hdist);		deflate_lld_lengths + hlit, hdist);
			#endif
	}		}

	int16_t inflate(unsigned char *input_buf, uint16_t input_len,		int16_t inflate(unsigned char *input_buf, uint16_t input_len,

test/compile-c++11.sh

+1 −1

	#!/bin/sh		#!/bin/sh

	exec g++ -std=c++11 -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c		exec g++ -std=c++11 -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c

test/compile-c++20.sh

+1 −1

	#!/bin/sh		#!/bin/sh

	# g++ as provided by Debian Buster (used for CI tests) does not support c++20		# g++ as provided by Debian Buster (used for CI tests) does not support c++20
	exec g++ -std=c++2a -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c		exec g++ -std=c++2a -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c

test/compile-c11.sh

+1 −1

	#!/bin/sh		#!/bin/sh

	exec gcc -std=c11 -Wall -Wextra -pedantic -I../src -o inflate inflate-app.c ../src/inflate.c		exec gcc -std=c11 -O2 -Wall -Wextra -pedantic -I../src "$@" -o inflate inflate-app.c ../src/inflate.c
