[Dev] Setting dash as default shell (or getting rid of bash)

Carsten Haitzler c.haitzler at samsung.com
Wed Oct 2 03:32:47 GMT 2013


On 10/02/2013 04:36 AM, Kok, Auke-jan H wrote:
> from my casual understanding of the btrfs code:
>
> The compression is done at the page level, and if the output is larger
> than the input, the compressed data is discarded and left
> uncompressed. This is done at the lzo/zlib level, not some file
> extension logic. The mount option (compress-force) just attempts to
> compress all writes, whereas if you omit it you will need to use xattr
> to enable compression per file/folder. even if you force compression,
> it won't compress things ever to be larger than they were.

the "if compressed data > original" check is standard fare in compression:
why store 1002 bytes when you can just keep the original 980? :) what i'm
wondering is how it AVOIDS the cpu overhead on writes of TRYING to compress
everything and then failing as above. does it guess based on file magic
signatures, filenames (using the extension), or something else? the wiki
page

https://btrfs.wiki.kernel.org/index.php/Compression

does cover it, but only briefly. it says that it "tries to compress the
first part of a file and if that fails (eg isn't smaller) then it aborts
compressing any more of that file at all".

so what is this "first part"? is it 4k? 512 bytes? 128k? and is the test
strictly "compressed size less than original" (eg if it compresses 4096
bytes to 4095, it keeps compressing), or does the saving have to EXCEED some
margin (eg 4000 bytes down from 4096 is not good enough; you have to get to
3500 for compression to kick in)? the page is a bit vague on how it figures
this out.
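
to make the question concrete, here is a rough python sketch of the two
checks as i read them from the wiki text. this is NOT the btrfs code:
PROBE_BYTES, MAX_RATIO and both function names are made up for illustration,
and zlib just stands in for whatever compressor the fs uses.

```python
import os
import zlib
from typing import Tuple

PROBE_BYTES = 4096   # assumed size of the "first part"; the wiki does not say
MAX_RATIO = 0.8      # assumed margin: probe must shrink to <= 80% of its size


def store_block(data: bytes) -> Tuple[bool, bytes]:
    """Compress one block, but keep the original if it does not get smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return True, packed
    return False, data


def worth_compressing(data: bytes, max_ratio: float = MAX_RATIO) -> bool:
    """Probe only the first PROBE_BYTES of the data against a ratio threshold."""
    head = data[:PROBE_BYTES]
    if not head:
        return False
    return len(zlib.compress(head)) <= max_ratio * len(head)


if __name__ == "__main__":
    text = b"hello world " * 400       # repetitive, compresses well
    noise = os.urandom(PROBE_BYTES)    # stands in for already-compressed data
    print(worth_compressing(text), worth_compressing(noise))
```

with max_ratio=1.0 this turns into the plain "anything smaller is good
enough" rule, so the two readings of the wiki text differ only in that one
number.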

and then there comes the issue that... let's assume it's semi-smart and has
some % threshold you have to hit, eg the first part has to get to < 80% of
its original size before it begins to compress the rest. what happens to
formats that have headers that compress well, but tails that just waste cpu
time hand-over-fist? a format that, let's say, has a 1000 byte header that
is quite sparse (for speed) but where everything else is tightly compressed
already will be hit hard by the above algorithm.

possible options: try to compress the first segment, then if it does well,
keep compressing the rest, UNLESS some block/page/section doesn't compress
well enough (close to 100% size or maybe bigger). if it sees more than 2 or
3 of these "hard to compress" segments, it aborts compression entirely for
the whole file, keeping the uncompressed copy of the header.

another option: a "database" of magic numbers in file headers to indicate
"do NOT compress me ever" or "ALWAYS compress me", or a database of file
globs/extensions (faster/simpler than magic numbers)... at least in terms of
automagic compression this may end up best.

alternatively, only compress the files you REALLY know will benefit (chattr
them). you can make a tool that re-writes the file into a new file chattr'd
to be compressed and then just go around picking and choosing your files.

of course this then leads to the idea that maybe higher level libs should
provide some kind of file io wrapper that may automatically write compressed
files given hints from the app or rules based on filename etc... and then
the kernel doesn't need to do it. :)
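
a minimal python sketch of the first two options, just to show the shape of
the idea. everything here is an assumption for illustration (segment size,
the 95% "hard" cutoff, the limit of 3 hard segments, the function names);
the globs are the ones mentioned earlier in this thread, and zlib stands in
for the real compressor.

```python
import fnmatch
import zlib
from typing import List, Tuple

SEGMENT_BYTES = 4096        # assumed segment size
HARD_RATIO = 0.95           # assumed: above 95% of original counts as "hard"
MAX_HARD_SEGMENTS = 3       # assumed: abort once this many hard segments seen
NEVER_COMPRESS = ["*.edj", "*.jpg", "*.png"]   # example glob "database"


def skip_by_name(filename: str) -> bool:
    """Glob rule: never try to compress files matching a known pattern."""
    return any(fnmatch.fnmatch(filename, pat) for pat in NEVER_COMPRESS)


def compress_file(data: bytes) -> List[Tuple[bool, bytes]]:
    """Compress segment by segment; fall back to one raw copy of the whole
    file once too many segments turn out to be hard to compress."""
    segments = []
    hard = 0
    for off in range(0, len(data), SEGMENT_BYTES):
        seg = data[off:off + SEGMENT_BYTES]
        packed = zlib.compress(seg)
        if len(packed) <= HARD_RATIO * len(seg):
            segments.append((True, packed))
            continue
        hard += 1
        if hard >= MAX_HARD_SEGMENTS:
            # abort: store the whole file uncompressed, header included
            return [(False, data)]
        segments.append((False, seg))   # keep this one segment raw
    return segments


def restore(segments: List[Tuple[bool, bytes]]) -> bytes:
    """Reassemble the original bytes from (compressed?, payload) pairs."""
    return b"".join(zlib.decompress(p) if c else p for c, p in segments)
```

a file with a sparse header and an incompressible tail compresses its first
segment, then hits the hard-segment limit and ends up stored raw, so the cpu
cost is capped at a few wasted segments instead of the whole file.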



>
> Auke
>
>
> On Tue, Oct 1, 2013 at 12:13 PM, Barbieri, Gustavo
> <gustavo.barbieri at intel.com> wrote:
>> I was told that BTRFS was smart to not compress formats that are compressed
>> already. Don’t know if this is a rollback if the compression rate is not
>> good or if there is actually some matching in the file header.
>>
>>
>>
>> --
>>
>> Gustavo Sverzut Barbieri
>>
>> Intel Open source Technology Center
>>
>>
>>
>> From: Carsten Haitzler [mailto:c.haitzler at samsung.com]
>> Sent: Tuesday, October 01, 2013 12:14 AM
>> To: myungjoo.ham at samsung.com
>> Cc: Barbieri, Gustavo; dev at lists.tizen.org; 정재훈
>>
>>
>> Subject: Re: [Dev] Setting dash as default shell (or getting rid of bash)
>>
>>
>>
>> i wonder how it figures if its "beneficial" :) hmm "if the first portion is
>> not smaller"... this could be a problem with some file formats.. is this
>> just "smaller" or "significantly smaller"? eg edj (eet) files have a header
>> and directory block - its never compressed for speed of lookups, but each
>> data segment in the file may (or may not be) compressed via any one of
>> several compression methods (zlib, lz4/lz4hc, jpeg for lossy images etc.)...
>> :) it'd be nice to be able to provide globs like "do not compress *.edj,
>> *.jpg, *.png .... etc." :)
>>
>> btw  lz4 or lz4hc might be an awesome addition to the compression algos
>> supported. it seems to be "in progress". :)
>>
>> On 10/01/2013 11:04 AM, MyungJoo Ham wrote:
>>
>> With BTRFS you may "compress everything forcibly" or "compress if seems
>> beneficial".
>>
>> You may use "+c" to force compression on a file, but it seems that you
>> cannot force it not to compress.
>>
>> https://btrfs.wiki.kernel.org/index.php/Compression
>>
>>
>>
>> Jaehoon (jh80.chung at samsung) may give more input; he has been experimenting
>> with BTRFS & F2FS for Tizen 2.2.
>>
>>
>>
>> ------- Original Message -------
>>
>> Sender : Haitzler <c.haitzler at samsung.com> Principal Engineer/Next-Generation Computing Lab (S/W Center)/Samsung Electronics
>>
>> Date : 2013-10-01 10:51 (GMT+09:00)
>>
>> Title : Re: [Dev] Setting dash as default shell (or getting rid of bash)
>>
>>
>>
>> there is? cool. how do you enable compression? is it "compress everything by
>> default" or can you "chmod" or chattr etc. specific files to then indicate to
>> the fs to go compress them... ?
>>
>> On 10/01/2013 06:20 AM, Barbieri, Gustavo wrote:
>>
>> There is transparent decompression for btrfs including different methods
>> such as lzo or zlib.
>>
>> --
>>
>> Gustavo Sverzut Barbieri
>>
>> Intel Open source Technology Center
>>
>> From: dev-bounces at lists.tizen.org [mailto:dev-bounces at lists.tizen.org] On
>> Behalf Of Carsten Haitzler
>> Sent: Sunday, September 29, 2013 11:50 PM
>> To: dev at lists.tizen.org
>> Subject: Re: [Dev] Setting dash as default shell (or getting rid of bash)
>>
>> On 09/28/2013 02:33 AM, Thiago Macieira wrote:
>>
>> On sexta-feira, 27 de setembro de 2013 12:09:17, Leandro Pereira wrote:
>>
>> If you're concerned about binary size, you could use a binary packer
>>
>> such as UPX. A quick test here, compressing my own shell (on a x86-64
>>
>> machine) yields pretty good compression. UPX decompresses really quickly
>>
>> so it's very unlikely it'll become a bottleneck:
>>
>>
>>
>> Binary compression is a trade-off between disk space usage and RAM usage.
>>
>> Depending on the system, it might also be a security risk.
>>
>>
>>
>> That's because the decompressor needs writable pages to decompress the image
>>
>> to, then mark it as executable. Hopefully, the decompressor moves from RW to
>>
>> RX instead of RWX, but you need to check that. And in any case, since those
>>
>> pages aren't backed by actual files on disk, the kernel must use the swap
>>
>> if it needs to discard the pages, even the unused pages.
>>
>>
>> if anything this would be best done with filesystem native compression (ala
>> cramfs) and then the pages are at least backed and shared. :) short-term
>> this only helps if we move to use cramfs, but longer-term it may be a valid
>> "todo list" item to look at adding transparent decompression to
>> btrfs/f2fs/ext4 etc.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>>
>> Dev mailing list
>>
>> Dev at lists.tizen.org
>>
>> https://lists.tizen.org/listinfo/dev
>>
>>
>>
>>
>> --
>>
>> The above message is intended solely for the named addressee and may
>>
>> contain trade secret, industrial technology or privileged and
>>
>> confidential information otherwise protected under applicable law
>>
>> including the Unfair Competition Prevention and Trade Secret Protection
>>
>> Act. Any unauthorized dissemination, distribution, copying or use of the
>>
>> information contained in this communication is strictly prohibited. If
>>
>> you have received this communication in error, please notify the sender
>>
>> by email and delete this communication immediately.
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> MyungJoo Ham (함명주), PHD
>>
>> System S/W Lab, S/W Platform Team, Software Center
>> Samsung Electronics
>> Cell: +82-10-6714-2858
>>
>>
>>
>>
>>
>>
>>
>>




