diff: memory exhausted
Yesterday, I tried to diff two ~800 MB files on a machine with 1 GB of RAM. Unfortunately, it didn’t work:
# diff -u full/2007-09-25-full.sql incr/incr.sql
diff: memory exhausted
After some googling and looking through various blogs and articles, it turned out that GNU diff simply works that way: it wants to load both files into memory. So perhaps adding some swap space would help?
# dd if=/dev/zero of=/swapfile bs=1024 count=5242800
# mkswap /swapfile
# swapon /swapfile
Indeed, it helped: I was able to make a diff of these two ~800 MB files. However, as soon as I tried to diff ~1 GB files, diff again exited with a “memory exhausted” message.
I tried a number of tools, but none of them gave the output I wanted (similar to that of diff -u).
As I needed that diff for backup only, in the end I used the xdelta tool:
# xdelta delta -9 full/2007-09-25-full.sql incr/incr.sql 2007-09-25-sql.delta
It produces a binary delta file, but needs much less memory than diff to do so.
Ooh, and if your files are so big that xdelta is failing for you…
# xdelta delta -9 full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
xdelta: mmap failed: Cannot allocate memory
…you may consider using xdelta3. The deltas it produces are a bit bigger, but it works much better (read: does not fail) with large files. Syntax:
# xdelta3 -9 -f -e -s full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
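A delta is only useful as a backup if the newer file can be rebuilt from it, so here is a minimal sketch of the reverse step, assuming xdelta3 is installed. The function name restore_from_delta and the output file name in the usage comment are hypothetical; -d (decode) is the counterpart of the -e used above, and -s again names the file the delta was made against.

```shell
# restore_from_delta FULL DELTA OUT
# Rebuild the newer file (OUT) from the full dump and the delta.
# Hypothetical helper name; only the xdelta3 flags are real.
restore_from_delta() {
    # -d: decode, -f: overwrite OUT if it exists, -s: source (the full dump)
    xdelta3 -d -f -s "$1" "$2" "$3"
}

# Example (output file name is made up):
# restore_from_delta full/2008-06-01-full.sql 2008-06-02-incr.delta.temp restored-incr.sql
```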
Still, it would be great to have a tool which could produce diff -u-style output for large files.
--speed-large-files?
No, --speed-large-files doesn’t help for big files (still “memory exhausted”).
Try bdiff: http://www.computerhope.com/unix/ubdiff.htm
Do you still need help with this?
If not, what did you decide on?
I too was surprised that --speed-large-files runs out of memory.
In the old days, it just used to run a program called diffh.
That seems to have gone, so I have written my own, which you are welcome to try.
-Ian
Yeah, it would be better to have a human-readable diff, and not a binary delta.
Do you have a link?
I could just email it to you.
-Ian
…but I can’t see your email address anywhere.
If you can see mine, drop me a note and I’ll reply with the source code (C).
-Ian
You can also use ‘split’ to break the two big files into smaller pieces, then diff each pair of pieces…
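A rough sketch of that split-and-diff idea, assuming GNU diff and coreutils split. The function name chunked_diff and the default chunk size are made up for illustration. Note that the pieces are paired up by line number, so a line inserted early in one file shifts every later piece and makes the output noisy; this works best when changes are mostly in place or appended at the end.

```shell
# chunked_diff OLD NEW [LINES_PER_PIECE]
# Split both files into pieces of a fixed number of lines and diff the
# pieces pairwise, so diff only has to hold two small pieces in memory.
# Runs in a subshell (note the parentheses) to keep its variables local.
chunked_diff() (
    old=$1
    new=$2
    lines=${3:-100000}   # arbitrary default, tune to the available RAM

    tmp=$(mktemp -d) || exit 2
    mkdir "$tmp/old" "$tmp/new" || exit 2

    # split names the pieces part.aaaa, part.aaab, ... so that the same
    # line range gets the same name on both sides
    split -l "$lines" -a 4 "$old" "$tmp/old/part." || exit 2
    split -l "$lines" -a 4 "$new" "$tmp/new/part." || exit 2

    # -r compares the two directories piece by piece;
    # -N treats a piece that exists on one side only as empty on the other
    status=0
    diff -u -r -N "$tmp/old" "$tmp/new" || status=$?

    rm -rf "$tmp"
    exit "$status"   # 0 = identical, 1 = differences, 2 = trouble
)

# Example:
# chunked_diff full/2007-09-25-full.sql incr/incr.sql 200000 > sql.diff
```

Line numbers in the hunks are relative to each piece, not to the original file, and the pieces need temporary disk space roughly equal to the size of both inputs.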