How to Create a Hex Dump of any File with Python


22 September 2017



Today we will share a quick Python script for creating a hex dump of any file. This can be useful when we want to drill down into the low-level data of a file, perhaps to analyse the file header of a piece of malware or to understand the details of a specific file format. Of course there are hex editors and Unix tools that do the same job, but a simple Python script can be handy because it is OS-independent.

For our script, we will output not only the hex data but also the offset of each row and the printable ASCII characters, as is the standard layout for a hex dump. That is, the offset into the file on the left, 16 bytes of hex data in the middle and the printable ASCII characters on the right. Any non-printable byte is replaced by a dot in the ASCII column.
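
To make that layout concrete, here is a minimal sketch of how a single row could be put together. The `format_row` helper is a hypothetical name used only for illustration, and the sketch assumes Python 3, where iterating over a bytes object yields integers.

```python
def format_row(offset, chunk):
    """Format up to 16 bytes as: offset, hex bytes, printable ASCII."""
    # Two lowercase hex digits per byte, separated by single spaces.
    hex_part = " ".join(format(b, "02x") for b in chunk)
    # Printable ASCII is the range 32..126; everything else becomes a dot.
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    # Pad the hex column to 16 bytes wide (16 * 2 digits + 15 spaces = 47)
    # so the ASCII column lines up even on a short final row.
    return "{:06X} {:<47} {}".format(offset, hex_part, ascii_part)


row = format_row(0, b"L\x00\x00\x00\x01\x14\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00")
print(row)
```

Padding the hex column is a design choice in this sketch: it keeps the ASCII text aligned when the last row of a file holds fewer than 16 bytes.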

The full script is given below.


#!/usr/bin/env python3
#
# HEX DUMP PYTHON SCRIPT - RUBY DEVICES 2017
#
# Requires Python 3.
#
# Usage:
# hexdump.py <file_to_dump>
#
import sys
import os.path

def check_file_provided():
    ## Ensure a valid file was provided to the invoked script ##
    if len(sys.argv) < 2:
        print()
        print("Error - No file was provided")
        print()
        print("Correct Usage:")
        print("python hexdump.py <file_to_dump>")
        print()
        sys.exit(1)
    if not os.path.isfile(sys.argv[1]):
        print()
        print("Error - The file provided does not exist")
        print()
        sys.exit(1)

def read_bytes(filename, chunksize=8192):
    ## Yield the bytes of the given file one at a time, as integers (0-255) ##
    try:
        with open(filename, "rb") as f:
            while True:
                chunk = f.read(chunksize)
                if not chunk:
                    break
                for b in chunk:
                    yield b
    except IOError:
        print()
        print("Error - The file provided could not be read")
        print()
        sys.exit(1)

def is_character_printable(byte):
    ## Return True if a byte value is a printable ascii character ##
    return 32 <= byte < 127

def print_headers():
    ## Print the headers at the top of our hex dump ##
    print()
    print("#### HEX DUMP PYTHON SCRIPT - RUBY DEVICES 2017 ####")
    print()
    print("Offset 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F")
    print()

def validate_byte_as_printable(byte):
    ## Return the byte as an ascii character, or '.' if it is not printable ##
    if is_character_printable(byte):
        return chr(byte)
    return '.'

## main ##
check_file_provided()
offset = 0
ascii_string = ""
print_headers()

## Loop through the given file while printing the offset, hex and ascii output ##
for byte in read_bytes(sys.argv[1]):
    ascii_string = ascii_string + validate_byte_as_printable(byte)
    if offset % 16 == 0:
        ## Start of a new row: print the offset first ##
        print(format(offset, '06X'), end=" ")
    print(format(byte, '02x'), end=" ")
    if offset % 16 == 15:
        ## End of a full row: print the ascii column ##
        print(ascii_string)
        ascii_string = ""
    offset = offset + 1

## Print the ascii column of a final partial row, padded so it lines up ##
if ascii_string:
    print("   " * (16 - len(ascii_string)) + ascii_string)
    

An example of invoking the script is given below, where we run the dump on a Microsoft LNK file. It's a good idea to pipe the output into "more" so that a long dump can be read one screen at a time.


[user]~$ python hexdump.py GoogleChrome.lnk | more

#### HEX DUMP PYTHON SCRIPT - RUBY DEVICES 2017 ####

Offset 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

000000 4c 00 00 00 01 14 02 00 00 00 00 00 c0 00 00 00 L...............
000010 00 00 00 46 df 40 00 00 20 00 00 00 20 55 e3 8e ...F.@.. ... U..
000020 f6 61 d1 01 28 2a d8 b3 a4 17 d3 01 78 e9 a2 77 .a..(*......x..w
000030 6a 12 d3 01 58 45 11 00 00 00 00 00 01 00 00 00 j...XE..........
000040 00 00 00 00 00 00 00 00 00 00 00 00 0f 02 14 00 ................
000050 1f 50 e0 4f d0 20 ea 3a 69 10 a2 d8 08 00 2b 30 .P.O. .:i.....+0
000060 30 9d 19 00 2f 43 3a 5c 00 00 00 00 00 00 00 00 0.../C:\........
000070 00 00 00 00 00 00 00 00 00 00 00 88 00 31 00 00 .............1..
000080 00 00 00 18 4b 57 5c 11 00 50 52 4f 47 52 41 7e ....KW\..PROGRA~
000090 31 00 00 70 00 08 00 04 00 ef be ee 3a a3 14 18 1..p........:...
0000A0 4b 57 5c 2a 00 00 00 3c 00 00 00 00 00 01 00 00 KW\*...<........
0000B0 00 00 00 00 00 00 00 46 00 00 00 00 00 50 00 72 .......F.....P.r
0000C0 00 6f 00 67 00 72 00 61 00 6d 00 20 00 46 00 69 .o.g.r.a.m. .F.i
0000D0 00 6c 00 65 00 73 00 00 00 40 00 73 00 68 00 65 .l.e.s...@.s.h.e
-- More --
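
As a small illustration of what a dump like this lets us read off, the sketch below decodes the first 20 bytes shown above. It assumes Python 3, and it assumes (based on our understanding of the Shell Link format, not on anything the script checks) that an LNK file begins with a 4-byte little-endian header size followed by a 16-byte class identifier. The byte values are copied from offsets 000000 to 000013 of the output.

```python
import struct
import uuid

# First 20 bytes of the dump above (offsets 0x00 to 0x13).
header = bytes.fromhex(
    "4c000000"                          # offset 0x00: 4-byte little-endian integer
    "0114020000000000c000000000000046"  # offset 0x04: 16-byte class identifier
)

# "<I" means one little-endian unsigned 32-bit integer.
(header_size,) = struct.unpack("<I", header[:4])

# bytes_le treats the first three UUID fields as little-endian,
# which is how Windows stores GUIDs on disk.
clsid = uuid.UUID(bytes_le=header[4:20])

print(hex(header_size))  # 0x4c
print(clsid)             # 00021401-0000-0000-c000-000000000046
```

Note how the bytes 4c 00 00 00 read as 0x4c rather than 0x4c000000: multi-byte values in the dump appear least significant byte first.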

That's all for today! Have fun with it :)

Always,

Ruby Devices





Ruby Devices do not in any way condone the practice of illegal activities in relation to hacking. All teachings with regards to malware and other exploits are discussed for educational purposes only and are not written with the intention of malicious application.