Splunk App for McAfee Web Gateway

Version: 4.0.11

Date: 20 October 2022

About
Where to install this App
Quick Start
Get Data In
Overview of Sourcetypes and Log Formats
Configure a custom log format (mcafee:webgateway:custom) on MWG
Upgrade from 3.0.7
Configuration examples

Local file monitor
Local UDP/TCP input
Syslog UDP/TCP
Syslog TCP+TLS
Syslog to multiple destinations
Configure Universal Forwarder (UF) to run directly on MWG and send logs to indexer
Log pushing from MWG to a log server
Log pulling from MWG
Log pulling from WGCS
Disable rsyslog/journald rate-limiting
Syslog-NG configuration
Host extraction

Onboarding checklist
Detailed description of the mcafee:webgateway:custom Log Format
Other logs

Audit log

Next Steps / Action Plan
FAQ
Dashboard Views

Summary
Log-search
Search
URL Filter
Traffic
Mediatypes
Malware
Protocols
Connections
Applications
User-Agents
Performance
Network
Authentication
Uploads
Risk
DNS
Rules
HTTP
Headers
SSL
Security_posture
Certificates
Anomalies
Errors
Audit
Help

Troubleshooting
Summary of changes
Contributors, Attributions
Copyright
Disclaimer
Contact, Support and Feedback
Additional information

mwg_xml2txt script

About

This Splunk App for McAfee Web Gateway provides rapid insights and operational visibility into McAfee Web Gateway (MWG) and McAfee Web Gateway Cloud Service (WGCS) deployments. It provides field extraction and CIM field mapping for all available types of access logs (default and custom McAfee Web Gateway logs, McAfee Web Gateway Cloud Service logs) and facilitates fast incident response and troubleshooting.

In 2022 McAfee Web Gateway (MWG) was renamed to SkyHigh Secure Web Gateway (SWG).

List of abbreviations used in this document:

Abbreviation	Meaning
MWG	McAfee Web Gateway
WGCS	McAfee Web Gateway Cloud Service
UF	Splunk Universal Forwarder

Product Compatibility:

Product	Version(s)
Splunk	6.6+, 7.x, 8.x, 9.x
MWG	7.6+, 8.x, 9.x, 10.x, 11.x
WGCS	API v5

Currently there are 85 different charts and tables grouped in 22 views:

  Applications
      Applications by Hits
      Applications by Volume
      Top Blocked Applications by Hits
      Top Applications by Volume
      Top Applications by Hits
      Top Application Statistics
  Audit
      Failed Logins
      Activity by Action
      Activity by Source_Type
      Activity by User
      User Activity by Appliance
  Authentication
      Top IP by Failed Auth
      Top User-Agents by Failed Auth
      Top Destination Hosts by Failed Auth
      Top User-Agents + IPs by Failed Auth
      Top User-Agents + DestHost by Failed Auth
      Top IPs + DestHost by Failed Auth
      Top IPs + User-Agent + DestHost by Failed Auth
      Multiple Logins from diff IPs
      Multiple Usernames coming from a single IP
      Authentication Method Statistics
  Connections
      Long running transactions
  DNS
      Timechart DNS resolution time
      Timechart DNS resolution time distribution (including Cached)
      Timechart DNS resolution time distribution (excluding Cached)
      DNS distribution (1ms - 200ms)
      DNS distribution (all)
  Errors
      Error Analysis
  HTTP
      Timechart HTTP Method
      HTTP Method Statistics
      HTTP Request Headers Statistics
      HTTP Response Headers Statistics
  Easy Search
      Status Code Overview
      Web Usage by URL Category
      Web Usage by URL Category Area Graph
      Top User-Agents
      Users + IPs
      IP Addresses by Hits Graph
      Top Hosts by Hits
      Top Blocked Domains by Hits
      Top Rules by Hits
      Events
  Malware
      Malware
      Top Users by blocked Malware
  Media Types
      Media Types
      Top Media Types by Volume
      Top Media Types by Hits
      EXE Uploads/Downloads
      Macro Uploads/Downloads
      EXE and Macro Uploads/Downloads with Magic Bytes Mismatch
      Encrypted Files
  Network
      Top unreachable Servers
  Performance
      Connect to Server Latency
      Total Transaction Duration distribution
      Client-Side Latency
      DNS resolution Latency distribution
      Time in Externals Distribution
  Protocols
      Protocols by Hits
      Protocols by Hits (Percent)
      Protocols by Volume
      Protocols by Volume (Percent)
  Potential Risks
      Top SRC with high Ratio of High Risk Requests
      Unusual Ports
      Requests to IP Addresses
      CONNECT Requests to IP Addresses
      Very long URLs
      Very large request and response Headers
      Non-resolvable Domains, potential DGA (Domain Generation Algorithm)
  Rules
      Top Rules
      Block Rules Overview
      Top Block Rules
      Rule Complexity/Performance
      Slowest Rule Execution
      Time in Rule Engine Distribution
      Time in Rule Engine over Time
  Security Posture
      Content Scan is possible Ratio
  SSL
      SSL Versions by Hits (Server)
      SSL Versions by Hits (Client)
      SSL Ciphers by Hits (Server)
      SSL Ciphers by Hits (Client)
      SSL KeyExchangeBits by Hits (Server)
      SSL KeyExchangeBits by Hits (Client)
      SSL Ciphers (Server)
      SSL Versions (Server)
      Client Certificate Requested
      SSL-related blocks
      Expired Certificate
      Certificate Issuers
  Summary
      Requests / Block Ratio
      Traffic Overview
  Traffic
      Top Inbound Traffic by Source
      Top Inbound Traffic by Destination
      Top Outbound Traffic by Source
      Top Outbound Traffic by Destination
  Uploads
      Uploads
  URL Filter
      URL Categories
      Blocked by URL Filter or by Web Reputation
      Top URL Categories by Volume
      Top URL Categories by Hits
      Geolocation Stats
      High Risk Destinations
      Not categorized Domains - Chart
      Top not categorized Domains - Table
  User-Agents
      User-Agent Statistics

Where to install this App

Instance	App for McAfee Web Gateway	Add-on for McAfee Web Gateway
Standalone (all-in-one) Splunk	+	-
Search Head	+	-
Indexer	-	+
Syslog/Log Server with Universal Forwarder	-	+

Quick Start

Install Splunk directly on MWG and configure it to monitor local log folder:

Configure a custom log format (mcafee:webgateway:custom) on MWG
Install Splunk on the same MWG
Install Splunk App for McAfee Web Gateway on Splunk
CLI: Allow Splunk to read splunk.log: setfacl -m u:splunk:rx /opt/mwg/log/user-defined-logs
Configure a local file monitor

Step-by-step walkthrough: https://youtu.be/96oRco3MTu0

Configure MWG to send logs via TCP to Splunk

Configure a custom log format (mcafee:webgateway:custom) on MWG
Configure MWG to send events via UDP/TCP
Install Splunk App for McAfee Web Gateway on Splunk
Configure Splunk network input to accept logs from MWG

Step-by-step walkthrough: https://youtu.be/vYy6ddpGkNw

Get Data In

MWG can write logs to hard disk and/or send them via Syslog. Splunk can read log files locally, receive them via a network input (Syslog or raw UDP/TCP stream) or get them from a UF installed on a log server or on MWG itself. Combined, these methods offer many possible ways to get MWG logs into Splunk:

Method / Link to configuration example	Description	Real time
Local file monitor	Splunk is installed directly on MWG and monitors the log file folder	Yes, up to 30 sec delay
Local UDP/TCP input	Splunk is installed directly on MWG and receives events sent via Syslog	Yes
Syslog UDP/TCP	MWG sends logs via UDP/TCP to a syslog collector or directly to Splunk	Yes
Syslog TCP+TLS	MWG sends logs via TCP, encrypted with TLS, to a syslog collector or directly to Splunk	Yes
UF	Install a UF on MWG to monitor the log file folder	Yes, up to 30 sec delay
Log pushing from MWG to a log server	Use pushing (FTP/FTPS/SCP/SFTP/HTTP/HTTPS) from MWG to a log server	No
Log pulling from MWG	Pulling logs from MWG via API, scp or rsync	No
Log pulling from WGCS	Pulling logs via the WGCS API	No

Installing a UF directly on MWG and configuring it to forward events to a Splunk indexer is the recommended and most reliable method!

Further considerations:

Local input (monitoring local log files) is the simplest method, suited for testing or small environments.
Syslog over UDP is usually not recommended because of potential packet loss.
If possible, install the syslog collector/server in the same VLAN/network as MWG. Avoid unreliable links (WiFi/WAN) and firewalls (especially with DPI/IDS) between MWG and the syslog collector.
For large environments Splunk doesn't recommend sending syslog directly to Splunk indexers, and suggests using an intermediate syslog server instead.
McAfee doesn't support installing a UF directly on MWG, but it can be the better option in some situations.
For large environments use one of Splunk's validated architecture designs.
For more details like syslog collector location, UDP vs TCP etc. read https://splunk.github.io/splunk-connect-for-syslog/main/architecture/
HOWTO: Configure a McAfee Web Gateway (MWG) syslog to send TLS-secured data to Splunk https://youtu.be/-nSkYdDQA00
HOWTO: Splunk App for McAfee Web Gateway (MWG) - send logs to Splunk - step by step configuration: https://youtu.be/vYy6ddpGkNw

Overview of Sourcetypes and Log Formats

Several log formats can be used. Compare your logs with the examples below to identify the format currently in use.

On-premise Web Gateway

Log Format	Sourcetype	# of MWG fields	# of CIM fields	Average log line length (HTTPS Scanner enabled)	Comment/Example
Default Access Log	mcafee:webgateway:default	14	17	~700 Bytes	Default log format with a fixed structure; provides only a minimal subset of fields. Use it only if no MWG modification is possible. [26/Feb/2021:14:40:23 +0100] "" 192.168.2.n 200 "GET https://example.com/test&adk=1473563476 HTTP/2.0" "Web Ads" "Minimal Risk" "image/gif" 286 538 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0" "" "0" "Google"
Legacy Log for the Splunk App v.3.0.7	MWGaccess3	26	27	~650 Bytes	Customized log format with a fixed structure; provides more fields than the default log, including some timings and transferred bytes. Verbose information such as the User-Agent string is shortened. Consider it obsolete. [26/Feb/2021:14:40:23 +0100]status="200/0" srcip="192.168.2.n" user="" profile="-" dstip="-" dhost="example.com" urlp="443" proto="HTTPS/https" mtd="GET" urlc="Web Ads" rep="0" mt="image/gif" mlwr="-" app="Google" bytes="538/539/289/286" ua="FF86.0-10.0" lat="0/0/59/434" rule="Last Rule" url="https://example.com/test&adk=1473563476"
Custom Log (recommended)	mcafee:webgateway:custom	50-100	50-100	~600-1800 Bytes	New modular custom log format (described in detail below). Log fields can be added or removed as needed. Provides full CIM coverage and deep insights for analytics and rapid troubleshooting. Despite the significantly larger amount of information, the log size increases only slightly: the new format provides up to 3x higher information density than the default log format. 2021-02-26 14:40:23 +0100 204 allowed 192.168.2.n https GET example.com 443 775/58 88/1 up="/test" ua="FF86-10.0" a="Google" c="wa" dip=142.250.185.nn kex=112/112 cntx sccc=1302/1302 sslp=1.3/1.3 sslicn="GTS CA 1O1,GlobalSign" sslcn="example.com" crtdays=-66 ctmt0 rul="L" rn=13/44 srcp=63298 conrt=0 b=744/239 psrcip=192.168.2.n psrcp=20010 piv=2.0/2.0 r=0 t=0/0/86/87/56/56/3/4/28

Web Gateway Cloud Service (WGCS)

The WGCS log format provides a subset of the required fields. There are several API versions:

Log Format	Sourcetype	# of MWG fields	# of CIM fields	Average log line length (HTTPS Scanner enabled)	Comment/Example
WGCS API version 5	mcafee:webgateway:wgcs_v5	28	28	~300-400 Bytes	"user_id","username","source_ip","http_action","server_to_client_bytes","client_to_server_bytes","requested_host","requested_path","result","virus","request_timestamp_epoch","request_timestamp","uri_scheme","category","media_type","application_type","reputation","last_rule","http_status_code","client_ip","location","block_reason","user_agent_product","user_agent_version","user_agent_comment","process_name","destination_ip","destination_port" "-1","142.250.185.nn","142.250.185.nn","GET","206","1040","example.com","/test","OBSERVED","","1626329868","2021-07-15 06:17:48","https","Business, Software/Hardware","application/x-empty","","Minimal Risk","Internal Request handled","200","8.65.16.n","","","Other","","","","78.47.250.n","443"
WGCS API version 6	mcafee:webgateway:wgcs_v6 (not supported yet)	28	28	~300-400 Bytes	No new fields are introduced. All fields from versions 1 – 5 are downloaded. Starting with API version 6, an error message is sent with the response to a download request that has timed out.
WGCS API version 7	mcafee:webgateway:wgcs_v7 (not supported yet)	28	28	~300-450 Bytes	All fields from versions 1 – 6 are downloaded, plus these fields: pop_country_code referer ssl_scanned av_scanned_up av_scanned_down rbi
WGCS API version 8	mcafee:webgateway:wgcs_v8 (not supported yet)	28	28	~300-500 Bytes	All fields from versions 1 – 7 are downloaded, plus these fields: dlp client_system_name filename pop_egress_ip pop_ingress_ip proxy_port

Configure a custom log format (mcafee:webgateway:custom) on MWG

Extract the file Splunk_Log_XXXXXX.xml (where XXXXXX is the version) from the MWG folder of the application package.
Import Splunk_Log_XXXXXX.xml file in MWG into the Default Log Handler: Policies > Rule Sets > Log Handler, right click on "Default" and select Add > Rule Set from Library

In the new window that appears, click on the "Import from file" button, then choose the xml file and click OK.

Click "Auto-Solve Conflicts..." > select "Solve by referring to existing objects" and click OK to import the RuleSet.

If some of the imported RuleSets/Rules are marked in red, some properties such as Header.Request.GetAll (available on MWG 10.x+) are not available in the current MWG version. Delete such rules or upgrade MWG to the latest 10.x+ version. If a TLS RuleSet is shown in red, it needs to be modified as described below in the Troubleshooting section.

The log configuration has a modular structure: you can send just a preconfigured minimal set of fields or select any subset of the available fields. The log ruleset contains several parts (see numbering on the next screenshot):

Required rulesets for CIM-conformant logging.
Web Data Model ruleset, where the log line is built from the previously prepared fields.
Additional rulesets, where other fields are added as needed.
DEBUG ruleset, which helps to verify that the log lines are built correctly.
Write Splunk.log - final log line modifications, performance monitoring of the Splunk ruleset itself and writing the Splunk log to the hard disk.
Send via Syslog.
RuleSet Library - optional templates that can be copied into the appropriate Policy Rule Sets (Opener, Media Type Filter etc.) to obtain information that is usually not available in the logging cycle.

Here are the most important modifications that you can make in the additional Rulesets (block of RuleSets #3 on the previous screenshot).

Ruleset	Possible modifications
Splunk	Domains not to log - some domains can be excluded from logging completely.
Set Timestamp	Choose the right timestamp. The ISO format with a time zone is selected by default. Other options are ToGMT, ISO8601, unix epoch and ToWebReporter formats. If you change the timestamp format on MWG, you have to adjust the TIME_FORMAT setting in local/props.conf on the Splunk indexer.
Client IP	The Connection.IP property is used by default. Deselect it and select Client.IP if you have downstream proxies or a load balancer between the client and MWG.
URL Categories	Add internal domains to the "internal Domains" list to prevent them from being shown as "uncategorized".
Headers	On MWG older than version 10.x some rules will be marked in red if they are not compatible - delete them or upgrade MWG to the newest 10.x version or later.
TLS	Disable this ruleset if HTTPS Scanner is not enabled.
-	To get the correct Rule statistics you must create one last ruleset with a rule named "Last Rule" which is applied to all cycles (Request, Response, Embedded).
RuleSet Library	Opener, Hashes/Body, Malware, Media Type, Uploads - to get some of the required information, additional rules need to be placed in the corresponding Policy Rule Sets. If you skip this step, some tables and graphs will be empty. Watch a YouTube video on the Splunkbase for step by step instructions.
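The "Set Timestamp" row above implies that the timestamp format on MWG and the TIME_FORMAT setting on the Splunk side must agree. Splunk's TIME_FORMAT uses strptime-style directives, so a candidate format string can be checked against a sample log line before it goes into props.conf. The following is a minimal sketch; the two format strings are assumptions derived from the sample lines in the format overview, not the values shipped with the app, and parse_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumed strptime-style formats, derived from the sample log lines
# (not taken from the app's props.conf).
FORMATS = {
    "mcafee:webgateway:custom": "%Y-%m-%d %H:%M:%S %z",
    "mcafee:webgateway:default": "[%d/%b/%Y:%H:%M:%S %z]",
}


def parse_timestamp(sourcetype, text):
    """Parse the leading timestamp of a log line for the given sourcetype."""
    return datetime.strptime(text, FORMATS[sourcetype])


custom = parse_timestamp("mcafee:webgateway:custom", "2021-02-26 14:40:23 +0100")
default = parse_timestamp("mcafee:webgateway:default", "[26/Feb/2021:14:40:23 +0100]")

# Both samples describe the same instant; the offset is kept, not dropped.
print(custom.astimezone(timezone.utc).isoformat())
print(custom == default, custom.utcoffset() == timedelta(hours=1))
```

If the format string does not match, strptime raises a ValueError, which is a quick signal that TIME_FORMAT would also need adjusting.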

Create a "Last Rule Set" with an empty "Last Rule" as the bottommost rule set in the Rule Sets tree:

Copy the rules to the Certificate Verification Rule Set to be able to log information about certificate parameters:

Upgrade from 3.0.7

Create a backup of the MWG config, export the MWG Log Rules and back up your current app.
Check if there are any custom changes in the old MWG Log Rules or in the app.
Check which sourcetype is currently used - MWGaccess3 or "default". MWGaccess3 works with the new version without any changes; "default" is now named "mcafee:webgateway:default".
Upgrade the app via GUI or CLI.
Follow the installation instructions for version 4.x.x.
Modify the "index_and_sourcetype" macro to include an index and a sourcetype (e.g. 'index=proxy AND sourcetype="mcafee:webgateway:custom"')
It is recommended to switch from default or MWGaccess3 to the new mcafee:webgateway:custom log format.

Configuration examples

Local file monitor

MWG UI: Configure a custom log format (mcafee:webgateway:custom)
CLI: Allow Splunk to read splunk.log: setfacl -m u:splunk:rx /opt/mwg/log/user-defined-logs
Splunk UI: Settings > Data inputs > Files & directories > New local File & Directory
Browse to or type in: /opt/mwg/log/user-defined-logs/splunk.log/splunk.log
Press Next
Sourcetype: Select > mcafee:webgateway:custom
App Context: McAfee Web Gateway
Host: Constant value
Index: leave on "Default" or create a new index, for example "proxy"
Press Preview and review options
Press Submit

Local UDP/TCP input

Instead of letting Splunk read local splunk.log, events can be sent to a local Splunk instance via a local network interface or even loopback interface, without writing events to the hard disk (i.e. "Write Splunk Log" Rule Set can be disabled).

MWG UI:

Configure a custom log format (mcafee:webgateway:custom) on MWG, enable "Send via Syslog" RuleSet, optionally disable "Write Splunk Log" RuleSet.
Modify rsyslog.conf: Configuration > File Editor > [Hostname] > rsyslog.conf

Find the following line *.info;mail.none;authpriv.none;cron.none /var/log/messages and modify it to *.info;daemon.!=info;mail.none;authpriv.none;cron.none -/var/log/messages.
If the receiving syslog server or indexer knows how to extract the sending host field, the syslog header can be removed: add $template msg_only,"%msg%\n". On old MWG versions the following line may be required instead: $template msg_only,"%msg:2:$%\n"
Add if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then @@hostname-or-IP-address-of-the-local-MWG:6514;msg_only. Modify the port as needed. Alternatively use if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then @127.0.0.1:6514;msg_only. Note that @@ forwards via TCP and a single @ via UDP; the Splunk input type must match.
Add $MaxMessageSize 100k. Important: for this directive to work correctly, it must be placed at the very top of rsyslog.conf (before any input is defined).
Add $SystemLogRateLimitInterval 0
Add $SystemLogRateLimitBurst 0
Add $imjournalRatelimitInterval 0
Add $imjournalRatelimitBurst 0
More information: https://kcm.trellix.com/corporate/index?page=content&id=KB77988

Verify that the syslog prefix is "mwg" under Configuration > Appliances > Syslog > Log Prefix

$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog # reads kernel messages (the same are read from journald)
$WorkDirectory /var/lib/rsyslog
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
$ActionName messages
*.info;daemon.!=info;mail.none;authpriv.none;cron.none -/var/log/messages
authpriv.*                                              /var/log/secure
mail.*                                                  -/var/log/maillog
cron.*                                                  /var/log/cron
*.emerg                                                 :omusrmsg:*
uucp,news.crit                                          /var/log/spooler
local7.*                                                /var/log/boot.log
$template msg_only,"%msg%\n"
if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then  @127.0.0.1:6514;msg_only

Splunk UI:

Settings > Data inputs > UDP > New Local UDP
Port: a port from a previous step, for example 6514
Press Next
Sourcetype: Select > mcafee:webgateway:custom
App Context: McAfee Web Gateway
Host Method: IP or DNS
Index: leave on "Default" or create a new index, for example "proxy"
Press Preview and review options
Press Submit
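Once the UDP input exists, it can be useful to confirm that it accepts events before changing rsyslog.conf on MWG. The sketch below sends a single log line as one UDP datagram, which is roughly what the rsyslog action @127.0.0.1:6514 with the msg_only template produces. The function name send_test_event is hypothetical, and the host and port defaults are assumptions taken from the example above.

```python
import socket


def send_test_event(line, host="127.0.0.1", port=6514):
    """Send one log line as a single UDP datagram and return the bytes sent."""
    # The msg_only template ends every message with a newline.
    payload = (line.rstrip("\n") + "\n").encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Shortened sample line in the mcafee:webgateway:custom format.
    sample = ('2021-02-26 14:40:23 +0100 204 allowed 192.168.2.1 https GET '
              'example.com 443 775/58 88/1 up="/test"')
    print(send_test_event(sample), "bytes sent")
```

UDP gives no delivery confirmation, so the result has to be checked with a search in the target index.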

Syslog UDP/TCP

MWG UI:

Configure a custom log format (mcafee:webgateway:custom) on MWG, enable "Send via Syslog" RuleSet, optionally disable "Write Splunk Log" RuleSet.
Modify rsyslog.conf: Configuration > File Editor > [Hostname] > rsyslog.conf

Find the following line *.info;mail.none;authpriv.none;cron.none /var/log/messages and modify it to *.info;daemon.!=info;mail.none;authpriv.none;cron.none -/var/log/messages.
If the receiving syslog server or indexer knows how to extract the sending host field, the syslog header can be removed: add $template msg_only,"%msg%\n". On old MWG versions the following line may be required instead: $template msg_only,"%msg:2:$%\n"
Add if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then @@hostname-or-IP-address-of-the-remote-splunk:6514;msg_only. Modify the port as needed.
Add $MaxMessageSize 100k. Important: for this directive to work correctly, it must be placed at the very top of rsyslog.conf (before any input is defined).
Add $SystemLogRateLimitInterval 0
Add $SystemLogRateLimitBurst 0
Add $imjournalRatelimitInterval 0
Add $imjournalRatelimitBurst 0
More information: https://kcm.trellix.com/corporate/index?page=content&id=KB77988
Additionally, the rsyslog author himself recommends switching from the slow imjournal to imuxsock/imklog, as described in https://kcm.trellix.com/corporate/index?page=content&id=KB92256

Verify that the syslog prefix is "mwg" under Configuration > Appliances > Syslog > Log Prefix

$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog # reads kernel messages (the same are read from journald)
$WorkDirectory /var/lib/rsyslog
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
$ActionName messages
*.info;daemon.!=info;mail.none;authpriv.none;cron.none -/var/log/messages
authpriv.*                                              /var/log/secure
mail.*                                                  -/var/log/maillog
cron.*                                                  /var/log/cron
*.emerg                                                 :omusrmsg:*
uucp,news.crit                                          /var/log/spooler
local7.*                                                /var/log/boot.log
$template msg_only,"%msg%\n"
if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then  @@server:6514;msg_only

Splunk UI:

Settings > Data inputs > TCP > New Local TCP
Port: a port from a previous step, for example 6514
Press Next
Sourcetype: Select > mcafee:webgateway:custom
App Context: McAfee Web Gateway
Host Method: IP or DNS (if Splunk can resolve IP address of MWG)
Index: create a new index, for example "proxy"
Press Preview and review options
Press Submit

Syslog TCP+TLS

$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/rsyslog.d/certs/example.com.ca.pem
$DefaultNetstreamDriverCertFile /etc/rsyslog.d/certs/mwg.example.com.pem
$DefaultNetstreamDriverKeyFile /etc/rsyslog.d/certs/mwg.example.com.key

#$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverAuthMode anon
#$ActionSendStreamDriverPermittedPeer splunk.example.com
$ActionSendStreamDriverMode 1

HOWTO: Configure a McAfee Web Gateway (MWG) syslog to send TLS-secured data to Splunk https://youtu.be/-nSkYdDQA00

Syslog to multiple destinations

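In the MWG log rules, send each log line with two different syslog severities, for example 6 (info) and 5 (notice):

Syslog (6, User-Defined.logLine)
Syslog (5, User-Defined.logLine)

Then, in rsyslog.conf, forward each severity to its own destination: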

# exclude both daemon.notice and daemon.info:
*.info;mail.none;daemon.!=info;daemon.!=notice;authpriv.none;cron.none -/var/log/messages

$ActionQueueFileName fwdRule1
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionResumeRetryCount -1
# Use expression-based filters instead of "traditional" facility.severity selectors: a selector like daemon.info matches all messages of the specified priority AND HIGHER, which can lead to duplicated events
if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'info' then @@syslog1

$ActionQueueFileName fwdRule2
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionQueueType LinkedList 
$ActionResumeRetryCount -1
if $programname == 'mwg' and $syslogfacility-text == 'daemon' and $syslogseverity-text == 'notice' then @@syslog2
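The comment in the configuration above explains why expression-based filters are used instead of traditional selectors. The difference can be illustrated with a small model of both matching rules. This is a sketch of the matching semantics only, not of rsyslog itself; the destination names syslog1 and syslog2 come from the example, and all function names are hypothetical.

```python
# Syslog severities ordered from highest priority (0) to lowest (7).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Destination -> severity it is meant to receive, as in the example above.
RULES = {"syslog1": "info", "syslog2": "notice"}


def selector_matches(rule_severity, message_severity):
    """Traditional selector (e.g. daemon.info): the given severity and higher."""
    return SEVERITIES.index(message_severity) <= SEVERITIES.index(rule_severity)


def expression_matches(rule_severity, message_severity):
    """Expression filter ($syslogseverity-text == '...'): exact match only."""
    return message_severity == rule_severity


def destinations(message_severity, matcher):
    """Return all destinations that would receive a message of this severity."""
    return [dest for dest, sev in RULES.items() if matcher(sev, message_severity)]


# A 'notice' message is duplicated with selectors, but not with expressions.
print(destinations("notice", selector_matches))
print(destinations("notice", expression_matches))
```

With selectors, the notice copy intended for syslog2 also reaches syslog1, which is the duplication the comment warns about.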

Configure Universal Forwarder (UF) to run directly on MWG and send logs to indexer

MWG UI: Configure a custom log format (mcafee:webgateway:custom) on MWG, leave the "Send via Syslog" RuleSet disabled
CLI: Allow Splunk to read splunk.log: setfacl -m u:splunk:rx /opt/mwg/log/user-defined-logs
Install UF on MWG
Install Add-on for McAfee Web Gateway (https://splunkbase.splunk.com/app/5452/)

Create a file /opt/splunkforwarder/etc/apps/TA_McAfee_Web_Gateway/local/inputs.conf with the following content (modify as needed):

  [monitor:///opt/mwg/log/user-defined-logs/splunk.log/splunk.log]
  sourcetype = mcafee:webgateway:custom
  # index = proxy

Create an outputs.conf, limits.conf, etc. configuration as needed.

Log pushing from MWG to a log server

MWG UI: Configure a custom log format (mcafee:webgateway:custom) on MWG, leave the "Send via Syslog" RuleSet disabled
MWG UI: Policy > Settings > Engines > File System Logging > Splunk Log > Settings for Rotation, Pushing and Deletion > Enable specific settings for user defined log:
- Configure Auto Rotation as needed
- Configure Auto Deletion as needed
- Configure Auto Pushing: enable auto pushing, set destination server, enable pushing log files directly after rotation
On a receiving log server: use Splunk file monitor or UF

Log pulling from MWG

MWG UI: Configure a custom log format (mcafee:webgateway:custom) on MWG, leave the "Send via Syslog" RuleSet disabled
Use a script, the API or another method to pull logs from MWG to Splunk

Log pulling from WGCS

McAfee Web Gateway Cloud Service (WGCS) provides the log with a reduced set of fields, therefore only a subset of views will work properly.

There are several ways to pull WGCS logs:

Use Logging Client: https://success.myshn.net/MVISION_Cloud_for_Unified_Cloud_Edge/Unified_Cloud_Edge_Logging_Client/Download_and_install_the_Logging_Client
Configure log pulling using a shell or PowerShell script https://communitym.trellix.com/t5/Web-Gateway-Cloud-Service/Sending-WGCS-logs-to-on-premise-Splunk/td-p/622784
Use McAfee Content Security Reporter (CSR) to download WGCS logs, configure post-processing to move processed log files to some directory and use Universal Forwarder to monitor this directory.

When reading WGCS logs, use [monitor:// and not [batch://, because batch seems to delete logs too early. Use a separate Scheduled Task (schtasks /create /tn "Delete old WGCS" /tr "delete_old_logs.bat" /sc HOURLY ) to delete old logs, for example ForFiles /p D:\WGCS_Logs /d -1 /c "cmd /c del /q @file"

An example of inputs.conf:

[monitor://D:\WGCS_Logs]
sourcetype = mcafee:webgateway:wgcs_v5
index = proxy
crcSalt = <SOURCE>

Disable rsyslog/journald rate-limiting

McAfee Web Gateway is based on RedHat/CentOS 7 and inherits some settings that rate-limit syslog. Read https://www.ibm.com/support/pages/how-disable-rsyslog-rate-limiting and https://access.redhat.com/solutions/1417483 to modify or disable rate-limiting in /etc/rsyslog.conf (using the MWG UI) and /etc/systemd/journald.conf.

/etc/rsyslog.conf:

$SystemLogRateLimitInterval 0
$SystemLogRateLimitBurst 0
$imjournalRatelimitInterval 0
$imjournalRatelimitBurst 0

/etc/systemd/journald.conf:

RateLimitInterval=0
RateLimitBurst=0

Syslog-NG configuration

Use the following configuration for syslog-ng (on the receiving side), i.e. a network source with the no-parse flag:

network
flags(no-parse)

Host extraction

Correct extraction of the host field is very important. Unfortunately, the default methods of host extraction have some downsides:

If not explicitly set, "$decideOnStartup" from inputs.conf or the system hostname is used to set the 'host' value
If not set explicitly, the host value for the same machine can end up as an IP address, a short hostname or an FQDN
The host name can appear in upper or lower case
If Splunk or the syslog server is configured to obtain the name of the sending host via reverse DNS resolution of its IP address and no DNS server is available, it falls back to the IP address
If there is a load balancer in between, the host field will contain the wrong value
A syslog header can contain, for example, mwg (short name), MWG (short name in upper case), an IP address or mwg.example.com (FQDN), depending on the configuration

To summarize: it is better to set the host value explicitly than to rely on heuristics that can lead to several host values for the same machine. With a UF, set the host in inputs.conf. With Syslog, either use host_segment/host_regex on the syslog receiver or send the MWG host name within the event itself and extract it during ingestion. The second method allows disabling the syslog header directly on MWG by defining the "msg_only" rsyslog template as described in the Syslog UDP/TCP section:

props.conf:

[mcafee:webgateway:custom]
TRANSFORMS-extract_host_from_event = extract_host_from_event

transforms.conf:

[extract_host_from_event]
REGEX = \shost=(\S+)
FORMAT = host::$1
DEST_KEY = MetaData:Host

The host field must be placed before long fields such as the URL path or URL query, because they can push the host field beyond the first 4096 characters of the event. This limit is set by the LOOKAHEAD property in transforms.conf, which specifies how far into the event Splunk looks for index-time fields.
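The effect of the lookahead limit can be reproduced outside Splunk with the same regular expression. The sketch below only models the behaviour described above (search within the first 4096 characters); it is not Splunk's implementation, and extract_host and the sample host name are hypothetical.

```python
import re

HOST_RE = re.compile(r"\shost=(\S+)")  # same pattern as REGEX in the transform
LOOKAHEAD = 4096                       # limit described in the text above


def extract_host(event, lookahead=LOOKAHEAD):
    """Return the host value if it occurs within the lookahead window."""
    match = HOST_RE.search(event[:lookahead])
    return match.group(1) if match else None


prefix = "2021-02-26 14:40:23 +0100 204 allowed 192.168.2.1 https GET example.com 443"
long_path = 'up="/' + "a" * 5000 + '"'  # a URL path longer than the window

early = f"{prefix} host=mwg1.example.com {long_path}"
late = f"{prefix} {long_path} host=mwg1.example.com"

print(extract_host(early))  # host placed before the long field is found
print(extract_host(late))   # host pushed past the window is missed
```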

Disabling syslog header has several benefits:

usually syslog header contains wrong timestamp that can differ from MWG timestamp
all information in syslog header except "host" is usually useless
no need to add useless syslog header on MWG and then remove it again on Splunk
a short log line without the syslog header is easier to read
you can save between 5 and 15% of the license cost

Onboarding checklist

Check	Expected Result	Conditions/Causes	Comment
Timestamp and Timezone	Timestamp and timezone are correct, there are no "future" events		| eval diff=_indextime - _time
Index	Index is correct		Use a separate index for proxy events
Sourcetype	Sourcetype is correct
Host extraction	Host extraction is correct	Syslog	Don't rely on rDNS, it decreases performance and can fail. Hosts server1, SERVER1, server1.example.com, 10.20.30.40 can be one host, but are different hosts from Splunk's point of view.
Integrity	All events reach Splunk, no events are lost	Syslog, high log rate	useACK, rsyslog: disk queue
Truncation	Long log lines aren't truncated	rsyslog: MaxMessageSize, syslog-ng: log_msg_size, syslog via UDP, Splunk: TRUNCATE	test-link
Logging delay	Low logging delay		| eval diff=_indextime - _time
Log integrity in case of network interruption	Short network interruptions shouldn't lead to loss of events		useACK, rsyslog: disk queue
Secure transfer	Log transferred via TLS, Certificate validation, mTLS
Multiline	There are no multiline proxy events
Duplicates	There are no duplicate events
Parsing	All events parsed correctly, action/src/dest fields are always present
Settings location	All settings are placed inside of MWG App or TA	Settings can be placed in a wrong app if GUI is used	Use btool to verify.
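The timestamp and logging delay checks in the table both rely on the difference between _indextime and _time. The logic behind interpreting that difference can be sketched as follows. The 60-second threshold and the function name classify_event are assumptions for illustration, not values recommended by the app.

```python
def classify_event(event_time, index_time, max_delay=60):
    """Classify an event by diff = _indextime - _time (both in epoch seconds)."""
    diff = index_time - event_time
    if diff < 0:
        # Event time lies after index time, e.g. a misinterpreted time zone.
        return "future"
    if diff > max_delay:
        return "delayed"
    return "ok"


# Indexed 5 seconds after it happened.
print(classify_event(event_time=1000, index_time=1005))
# Event time parsed one hour ahead, e.g. a +0100 offset that was ignored.
print(classify_event(event_time=1000 + 3600, index_time=1005))
```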

Detailed description of the mcafee:webgateway:custom Log Format

Why a new log format? Neither the default nor the previously used MWGaccess3 log format provides enough information to be useful for a SIEM. For example, these formats provide very limited information about downloads and uploads of risky files. Many SIEM correlation rules will not work properly if a transferred file is embedded in a composite object (zip, iso, docx, etc.) or has a different or faked media-type header or extension.

The new log format supports the following use cases, among many others:

Even if a transaction was allowed, detect all potentially dangerous objects and log their true media type, hash and size.
Even if a transaction was white-listed and not checked for Web Reputation and URL categorization, these checks are performed in the Log Cycle after the transaction has completed, and the log event will contain their results.
A DNS lookup of dest_host (if there is more than one IP) and a reverse DNS lookup of URL.Destination.IP make it possible to detect fast-flux C&C servers.

The new custom log format (mcafee:webgateway:custom) consists of several parts:

Timestamp
Fixed set of fields: status, action, client_ip, url_protocol, http_method, dest, dest_port, bytes_out/bytes_in, duration/response_time. These fields have no field prefix - Splunk extracts them based on the log structure.
Variable set of fields: they are included in the log only if they are enabled AND exist. For example, a URL path will not be included for this URL: https://www.example.com/. These fields either have a short field prefix (for example up=) or consist of a single string (e.g. "tunnel"). They can appear in any part of the log line, and their order is not important. Any of the variable fields can be enabled or disabled on the MWG at any time, without modifying anything on the Splunk side. You can enable conditional logging for these fields: for example, a query string can be logged only for a subset of categories, or certificate information (Issuer, Common Name, Subject Alternative Names etc.) only for suspicious transactions.

2021-02-26 14:36:46.449 -0600 200 allowed 192.168.2.n https GET safebrowsing.googleapis.com 443 563/4156 38/17 up="/v4/threatListUpdates" ua="FF86-10.0" c="it" dip=142.250.185.n kex=112/112 cntx sccc=1302/1302 sslp=1.3/1.3 sslicn="GTS CA 1O1,GlobalSign" sslcn="upload.video.google.com" crtdays=-52 mbmismatch ctmt0 rul="L" rn=41/104 srcp=62407 conrt=0 b=524/4418 tunnel psrcip=192.168.2.nn psrcp=42550 piv=2.0/2.0 r=0 t=0/0/34/34/18/18/22/11/11
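To illustrate the structure described above, here is a minimal parsing sketch. It is not the extraction logic used by the App (that lives in the Splunk configuration). It assumes the timestamp layout with milliseconds and timezone, so the timestamp occupies the first three space-separated tokens, and it uses a shortened copy of the sample event. The names parse_event, TOKEN and FIXED are hypothetical.

```python
import re

# A token is a run of non-space characters; a quoted part may contain spaces.
TOKEN = re.compile(r'(?:[^\s"]|"[^"]*")+')

# Fixed fields that follow the three timestamp tokens, in log order.
FIXED = ["status", "action", "client_ip", "url_protocol",
         "http_method", "dest", "dest_port"]


def parse_event(line: str) -> dict:
    tokens = TOKEN.findall(line)
    event = {"timestamp": " ".join(tokens[:3])}
    event.update(zip(FIXED, tokens[3:10]))
    event["bytes_out"], event["bytes_in"] = tokens[10].split("/")
    event["duration"], event["response_time"] = tokens[11].split("/")
    flags = []
    for token in tokens[12:]:
        key, sep, value = token.partition("=")
        if sep:                      # prefixed field such as up="..."
            event[key] = value.strip('"')
        else:                        # single-string field such as tunnel
            flags.append(token)
    event["flags"] = flags
    return event


sample = ('2021-02-26 14:36:46.449 -0600 200 allowed 192.168.2.n https GET '
          'safebrowsing.googleapis.com 443 563/4156 38/17 '
          'up="/v4/threatListUpdates" ua="FF86-10.0" '
          'sslicn="GTS CA 1O1,GlobalSign" cntx tunnel r=0')

event = parse_event(sample)
print(event["action"], event["dest"], event["up"], event["flags"])
```

Because the variable fields are identified by prefix and not by position, the loop does not care in which order they appear or which of them are present.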

Instead of logging a URL as-is, MWG splits the URL into usable parts which will be put together on Splunk's end.
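The reassembly can be pictured with the following sketch. It is an illustration only: the App builds the URL in its Splunk configuration, and the function name, the default-port table and the assumption that the query string is logged without the leading "?" are hypothetical.

```python
# Ports that are normally omitted from a URL (assumption for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def build_url(protocol: str, host: str, port: int,
              path: str = "", query: str = "") -> str:
    """Put the separately logged URL parts back together."""
    netloc = host if DEFAULT_PORTS.get(protocol) == int(port) else f"{host}:{port}"
    url = f"{protocol}://{netloc}{path or '/'}"   # a missing path means "/"
    if query:
        url += "?" + query
    return url


print(build_url("https", "safebrowsing.googleapis.com", 443, "/v4/threatListUpdates"))
# https://safebrowsing.googleapis.com/v4/threatListUpdates
print(build_url("https", "www.example.com", 443))
# https://www.example.com/
```

Logging the parts separately avoids storing the same information twice and keeps the individual parts directly searchable.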

By default, the query string is not logged. You can enable it in the Web Data Model ruleset if needed.

An excerpt of the 100 most useful fields is provided below. MWG has about 900 properties that can be used for logging.

Description of logged fields

MWG field	CIM field	Comment

Timestamp

Property	Example	TIME_FORMAT / Comment
DateTime.ToISOString	2010-03-22 11:45:12	%Y-%m-%d %H:%M:%S
DateTime.ToISOString with Milliseconds	2010-03-22 11:45:12.123	%Y-%m-%d %H:%M:%S.%3N
DateTime.ToISOString with Milliseconds and timezone	2010-03-22 11:45:12.123 -0600	%Y-%m-%d %H:%M:%S.%3N %z
DateTime.ToISOString and timezone	2010-03-22 11:45:12 -0600	%Y-%m-%d %H:%M:%S %z
DateTime.ToGMTString	Mon, 22 March 2010 11:45:36 GMT	%a, %d %B %Y %H:%M:%S %Z
DateTime.ToISO8601String	2016-01-26T11:45:36.695Z	this time format can produce unexpected output, don't use it
DateTime.ToNumber	Unix epoch time - 1512915182	%s
DateTime.ToWebReporterString	[29/Oct/2010:14:28:15 +0000]	\[%d/%b/%Y:%H:%M:%S %z\]
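A timestamp layout can be sanity-checked outside of Splunk before it is rolled out. The sketch below parses the "milliseconds and timezone" example from the table with Python's datetime module. Note that Python's strptime has no %3N directive, so %f (fractional seconds) is used in its place; this substitution applies to the sketch only, not to props.conf. The function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_mwg_timestamp(text: str) -> datetime:
    # Python equivalent of the Splunk format %Y-%m-%d %H:%M:%S.%3N %z
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f %z")


ts = parse_mwg_timestamp("2010-03-22 11:45:12.123 -0600")
print(ts.isoformat())                           # 2010-03-22T11:45:12.123000-06:00
print(ts.astimezone(timezone.utc).isoformat())  # 2010-03-22T17:45:12.123000+00:00
```

Logging the timezone offset explicitly makes the conversion to UTC unambiguous, independent of the timezone configured on the receiving side.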

Connection.IP / Client.IP	src	Client.IP takes the value of the X-Forwarded-For header
Authentication.UserName	user
Message.TemplateName, Block.ID, Response.StatusCode, Protocol.FailureDescription, BytesFromServer, Command.Name, Action.Names	action	The action taken by the proxy: allowed, blocked, error or auth. Several MWG properties are combined to calculate the correct action field.
URL	url	Don't enable it; Splunk builds the URL from the URI components.
URL.Categories

Other logs

Audit log

The audit log (/opt/mwg/log/audit/audit.log) contains all changes and activity performed by administrator(s) through the UI or the REST interface. The audit log can be sent using a UF or a custom syslog configuration. Almost 70 actions are mapped to the Authentication and Change CIM Data Models:

Action	action	change_type	object_category
ACTIVATE_LICENSE_FILE	modified		license
ADDED_ADMINROLE	added	AAA	role
ADDED_APPLIANCE	added		appliance
ADDED_CONTENT	added	filesystem	config
ADDED_GROUP_ROLE_MAPPING	added	AAA	role
ADDED_RULES	added		config
ADDED_SYSTEM_FILES	added	filesystem	file
ADDED_TEMPLATE_DIRECTORIES	added	filesystem	directory
AUTHENTICATE_WITH_EXTERNAL_SERVER	success
BACKUP_TRIGGERED	created		backup
CREATED_NEW_LIST	added		config
CREATED_NEW_RULE	added		config
CREATED_NEW_RULEGROUP	added		config
CREATED_NEW_SETTINGS	added		config
CREATED_NEW_USER	added	AAA	user
CREATED_NEW_USER_DEFINED_PROPERTY	added		config
DASHBOARD_DATA_RESET	deleted
DATE_CHANGED	modified		config
DELETED_ADMINROLE	deleted	AAA	role
DELETED_APPLIANCE	deleted		appliance
DELETED_CONTENT	deleted		config
DELETED_LIST	deleted		config
DELETED_LOG_HANDLER	deleted		config
DELETED_RULE	deleted		config
DELETED_RULE_GROUP	deleted		config
DELETED_RULES	deleted		config
DELETED_SETTINGS	deleted		config
DELETED_TEMPLATE_DIRECTORIES	deleted		directory
DELETED_TEMPLATE_FILES	deleted		file
DELETED_USER	deleted	AAA	user
DELETED_USER_DEFINED_PROPERTY	deleted		config
EXPORT_PRIVATE_KEY	read		config
FILE_DOWNLOAD	read		file
FILE_UPLOAD	added	filesystem	file
FILES_DELETE	deleted	filesystem	file
FORCED_USER_LOGOUT	logout
JOINED_NTLM	modified		config
LEFT_NTLM	modified		config
MODIFIED_ADMINROLE	modified		role
MODIFIED_APPLIANCE_SETTINGS	modified		config
MODIFIED_CLUSTER_CONFIGURATION	modified		config
MODIFIED_CONTENT	modified		config
MODIFIED_CATALOG	modified		config
MODIFIED_GROUP_ROLE_MAPPING	modified		role
MODIFIED_LIST	modified		config
MODIFIED_NTLM	modified		config
MODIFIED_RULE	modified		config
MODIFIED_RULE_GROUP	modified		config
MODIFIED_SETTINGS	modified		config
MODIFIED_SYSTEM_FILES	modified	filesystem	file
MODIFIED_TEMPLATE_FILES	modified	filesystem	file
MODIFIED_USER	modified	AAA	user
MODIFIED_USER_DEFINED_PROPERTY	modified		config
MOVED_RULE_GROUPS	modified		config
MOVED_RULES	modified		config
REORDERED_CONTENT	modified		config
RESTORE_FAILED	modified		config
RESTORE_STARTED	pending		config
RESTORE_SUCCEDED	modified		config
SAVING_FAILED	read		config
SYSTEM_LIST_UPDATE	modified		config
TRIGGER_ACTION	pending		config
USER_LOGIN	success
USER_LOGIN_FAILED	failure
USER_LOGOUT	logout
USER_TIMED_OUT	timeout
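The table above behaves like a lookup keyed by the MWG audit action. The following sketch shows the idea with four rows copied from the table. It does not reproduce how the App implements the mapping; the function name and the "unknown" fallback for unmapped actions are assumptions.

```python
# (action, change_type, object_category) for a few audit actions from the table.
AUDIT_ACTION_MAP = {
    "CREATED_NEW_USER": ("added", "AAA", "user"),
    "DELETED_RULE": ("deleted", "", "config"),
    "FILE_UPLOAD": ("added", "filesystem", "file"),
    "USER_LOGIN_FAILED": ("failure", "", ""),
}


def map_audit_action(name: str) -> dict:
    """Return the CIM fields for an MWG audit action (hypothetical fallback: unknown)."""
    action, change_type, object_category = AUDIT_ACTION_MAP.get(name, ("unknown", "", ""))
    return {
        "action": action,
        "change_type": change_type,
        "object_category": object_category,
    }


print(map_audit_action("CREATED_NEW_USER"))
print(map_audit_action("USER_LOGIN_FAILED"))
```

Rows without change_type and object_category, such as the login events, belong to the Authentication data model rather than to the Change data model.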

Next steps / Action Plan

You want to:	Action
complete setup	Double-check the Onboarding checklist. If the timestamp format was modified, adjust the validation regex in the Splunk RuleSet > DEBUG > Verify Log Structure and report if it is not correct. Search for logging errors: TERM(LOGERR1) OR TERM(LOGERR2) OR TERM(LOGERR3)
use non-default index	Modify the "index_and_sourcetype" macro to include an index (e.g. 'index=proxy AND sourcetype="mcafee:webgateway:custom"')
implement Common Information Model (CIM)	Install Splunk Common Information Model (CIM) App
import a new version of the Splunk Logging Ruleset but keep all modifications	Use the mwg_xml2txt script to see the differences between versions.
build accelerated DM	Don't put highly variable strings such as uri_path, uri_query or url into an accelerated DM unless you really need them
improve proxy performance, find causes of high latency	Check errors, web cache (should be disabled!), timers (esp. DNS)
configure data retention	Configure frozenTimePeriodInSecs TBD
implement some GDPR requirements	Check if personally identifiable information (PII) should be removed, encrypted, obfuscated or masked. TBD
investigate a breach/incident	Create a copy of all relevant events (also from other sources) so that they do not age out. TBD
implement a 4-eyes principle	It can be implemented either on the proxy side or in Splunk. TBD
send events to other destinations besides Splunk	Modify rsyslog.conf or use "Route and filter data". TBD
customize or create own views and reports	TBD
add new fields	TBD
exclude some events from search	Create a macro to exclude some sources, destinations or user-agents and add it to a query
exclude some events from logging	On MWG: Modify existing list "Domains not to log" or create own excluding rules
improve search performance	Rewrite queries to use a base search (be aware of base search restrictions); rewrite queries to use an accelerated data model; use TERM/PREFIX for filtering
correctly log FTP/FTPoverHTTP connections	Due to the nature of FTP requests, the MWG events don't correctly reflect connection type. This requires more work, both on MWG and on Splunk side. TBD
work with IPv6 addresses	TBD
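The retention setting mentioned in the table, frozenTimePeriodInSecs, is expressed in seconds. The small helper below converts a retention period in days into that unit. The helper name and the 90-day period are examples only, not a recommendation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def retention_days_to_secs(days: int) -> int:
    """Convert a retention period in days into a value for frozenTimePeriodInSecs."""
    return days * SECONDS_PER_DAY


print(retention_days_to_secs(90))  # 7776000
```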

FAQ

Q: The default dashboard filters (user, src, destination and user-agent) are not enough. How can I add other filter conditions? A: You can use any of the default input fields to add your own SPL. For example, enter * block_id=80 in the "User" input field to show all matching events where block_id equals 80.
Q: MWG or SWG? A: MWG was also known as Webwasher (-2004), Cyber Guard Webwasher (2004-2006), Secure Computing Webwasher (2006-2008), McAfee Web Gateway (2008-2015?), Intel Security Web Gateway (SWG) (~2012/2015-2017?), then again as McAfee Web Gateway (2017?-2022), now SkyHigh Secure Web Gateway (SWG) (2022+).

Dashboard Views

Summary

Log-search

Search

URL Filter

Traffic

Mediatypes

Malware

Protocols

Connections

Applications

User-agents

Performance

Network

Authentication

Uploads

Risk

DNS

Rules

HTTP

Headers

The HTTP protocol can be used by malware for communication with C&C servers, blending into the normal web traffic generated by benign applications such as browsers. However, most enterprise security solutions don’t analyze all parts of the HTTP protocol, and even if they do, only partial information can be logged: either only a small subset of headers (like User-Agent, X-Forwarded-For, Referer, etc.) is logged, or header names must be configured explicitly. Neither of these methods makes it possible to log all headers or unknown headers.

Fortunately, recent MWG/SWG versions close this security gap by making it possible to log all HTTP headers. The rule-based policy logic makes it possible to apply such deep logging to suspicious transactions only, which significantly reduces log volume.

Enabling the collection of header information on the MWG side: TBD

Using conditional criteria to apply header collection to suspicious transactions only: TBD

Log example with request headers information: TBD

Working with Header View: TBD

Next steps: configure response headers collection

SSL

Security posture

Certificates

Anomalies

Errors

Audit

Help

Troubleshooting

Has the corresponding MWG Logging RuleSet been imported?
Are some charts and tables empty? - Check that the required fields and values are collected by the Splunk Rule Set in the Logging Cycle, activate them as needed.
Does a "Last Rule" exist on the MWG?
Were the supplemental rules copied into the Policy Rule Sets?
Does Splunk get any input?
Does a search for index=* (sourcetype=mcafee:* OR sourcetype=MWGaccess3) output raw events?
Does Splunk recognize the timestamp correctly?
If sent via Syslog - was the Syslog header part correctly removed?
Are there any errors in $SPLUNK_HOME/var/log/splunk/splunkd.log?
Problem: Events are not parsed correctly because there is a space before the timestamp. Solution: modify the log template on MWG to $template msg_only,"%msg:2:$%\n"
Problem: Imported Splunk RuleSet has some RuleSets marked red - some properties like Header.Request.GetAll are available only on new MWG versions (10+) and rules containing such "unknown" properties will be marked red if imported on older MWG versions. Just delete such rules or upgrade MWG to the newest 10+ version.

If a TLS Ruleset is shown in red, modify it as follows: delete the second condition "SSL.Server.Certificate.SignatureMethod is not in list null" and replace it with "SSL.Server.Certificate.SignatureMethod is not in list Safe Signature Algorithms". Safe Signature Algorithms is a McAfee-supplied list that should already be present in recent MWG versions.

If the list "Safe Signature Algorithms" is not present, create it as follows:
Problem: Rule value is empty, therefore Rule Statistics doesn't work on MWG 11.0-11.0.2. Answer: this is a bug in the Map.GetStringValue function that was fixed in MWG 11.1. Please update your MWG or temporarily disable the Log Handler > Splunk > Rules > "Rules.CurrentRule.Name (short if exists in Rule Map)" rule.
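The template fix for the leading space listed above relies on the rsyslog property replacer: %msg:2:$% emits the message from its second character to the end. The sketch below only mimics that behaviour to show the effect on an event; the function name and the sample message are hypothetical.

```python
def msg_from_second_char(msg: str) -> str:
    """Mimic %msg:2:$%: keep everything from character 2 (1-based) to the end."""
    return msg[1:]


raw = " 2021-02-26 14:36:46.449 -0600 200 allowed"
print(repr(msg_from_second_char(raw)))  # '2021-02-26 14:36:46.449 -0600 200 allowed'
```

The replacer drops the first character unconditionally, so the template should only be used where the message is known to start with a space.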

Summary of changes

4.0.11 - added a lookup of executables that can be used for download and exfiltration (https://lolbas-project.github.io/). Fixed the TIME_PREFIX for wgcs_v5
4.0.10 - fixed extraction of authentication_method, authentication_realm, auth_failure_message and auth_failure_id fields (Thank you ML!)
4.0.9 - improved WGCS regexes, now URL, rule name and User-Agent fields that contain quote character(s) are parsed correctly. Improved the TIME_PREFIX to fix parsing errors. New CIM fields added. Added distsearch.conf to enable replication of macros.
4.0.8 - added sc_admin role to default.meta
4.0.7 - support for MWG audit log, a feedback form, a new auth method statistics view
4.0.6 - better README with more examples, global export in default.meta, MWG Log has autorotation/autodeletion enabled in case it was not enabled globally
4.0.5 - added parsing of McAfee Web Gateway Cloud Service (WGCS) Logs
4.0.4 - applied required changes to keep compatibility with Splunk Cloud (use jquery 3.5), improved documentation, minor fixes
4.0.3 - added Security Posture view, minor fixes
4.0.2 - improved Error Analysis view, minor fixes
4.0.1 - new major release, new log format, better documentation, new views: SSL, Errors, Uploads
3.0.7 - committed changes in props.conf and transforms.conf by Myron Davis, added a contributors section to the README, clarified the installation process in the README
3.0.6 - enabled Splunk CIM (Common Information Model) version 4, by Myron Davis, compatibility with Splunk App for Enterprise Security, by Myron Davis, renamed App folder from AppForMcAfeeWebGateway to McAfeeWebGateway to match it with the app ID
3.0.5 - The App package now includes a step-by-step installation instruction with screenshots, the log structure was reordered to avoid overwriting of parameters
3.0.4 - new short log format, many redundant fields removed, cleanup, faster search, some panels were merged. This major version isn't compatible with the version 2.xx

Contributors/Attributions

Thanks to Myron Davis for a lot of suggestions, enabling CIM, compatibility for Enterprise Security App
Thanks to Simon B.
Thanks to the McAfee/SkyHigh Community Forum

Copyright

This App, documentation and MWG logging ruleset are licensed under Creative Commons BY-ND 3.0

Disclaimer

Test everything before using it in production.
Everything you do with this App is your own responsibility.

Contact, Support and Feedback

E-Mail: splunk@compek.net
Splunk Answers

Additional information

mwg_xml2txt script

The MWG Splunk Logging RuleSet is quite complex. Most customers modify it to suit their own needs. Use this script to see all modifications when importing a new version of the RuleSet.

Usage:
Step 1: convert XML to TXT and compare them
perl mwg_xml2txt.pl old_ruleset.xml > old_ruleset.txt
perl mwg_xml2txt.pl new_ruleset.xml > new_ruleset.txt
vimdiff old_ruleset.txt new_ruleset.txt

vimdiff compares the TXT files and highlights differences in lists and rules in color. A difference can be as simple as enabled vs. disabled, but it can also be a more complex modification - in that case use Step 2 to compare the XML directly.

Step 2: identify differences and optionally extract the corresponding XML section for comparison
Export a single rule from each XML ruleset (replace RuleName with the actual name of the rule that you want to extract)
perl -0777 -e '$a=<>; ($rule)=$a=~m/(\QRuleName\E.*?<\/rule>)/ms; print "$rule"' old_ruleset.xml > rule_old.txt
perl -0777 -e '$a=<>; ($rule)=$a=~m/(\QRuleName\E.*?<\/rule>)/ms; print "$rule"' new_ruleset.xml > rule_new.txt
vimdiff rule_old.txt rule_new.txt

After Step 1 you will see output similar to the example below. [true] or [false] indicates whether the rule is enabled or disabled. The short string at the end of each line is the first 6 characters of the MD5 hash of the whole rule block, so even a small modification will be highlighted.

Rules.CurrentRule.Name (short if exists in Rule Map) [true] 250e76
Rules.CurrentRule.Name (if doesnt exist in Rule Map) [true] a1b8cf
------------------------------------------------------------------
Rules.CurrentRule.Name (Last Rule) [false] 5a7005
Rules.CurrentRule.Name [false] a568cc
Number of FiredRules / EvaluatedRules (based on Last Rule presence) [false] 7b4f37
Number of FiredRules / EvaluatedRules (based on loghandler position) [true] ebfaf3

Rules.CurrentRule.Name (short if exists in Rule Map) [true] 250e76
Rules.CurrentRule.Name (short if exists in Rule Map) [false] f3a8a4
Rules.CurrentRule.Name (if doesnt exist in Rule Map) [false] 181bd9
Rules.CurrentRule.Name (Last Rule) [false] 5a7005
Rules.CurrentRule.Name [false] a568cc
Number of FiredRules / EvaluatedRules (based on Last Rule presence) [false] 7b4f37
Number of FiredRules / EvaluatedRules (based on loghandler position) [true] ebfaf3
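The short hash in the output above can be reproduced with a few lines. The sketch below mirrors the idea of the script in Python for illustration only: numeric ids, which change between exports, are masked before hashing, so that only real changes alter the fingerprint. Only two of the script's masking rules are shown, and the function name and sample rule lines are hypothetical.

```python
import hashlib
import re


def rule_fingerprint(rule_xml: str) -> str:
    """First 6 hex characters of the MD5 hash of a rule block with masked ids."""
    normalized = re.sub(r'(id=")\d+"', r'\g<1>XXX"', rule_xml)
    normalized = re.sub(r'(propertyId=")\d+"', r'\g<1>XXX"', normalized)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()[:6]


old = '<rule id="5820" enabled="true" name="Domains not to log"></rule>'
renumbered = '<rule id="9999" enabled="true" name="Domains not to log"></rule>'
disabled = '<rule id="5820" enabled="false" name="Domains not to log"></rule>'

print(rule_fingerprint(old) == rule_fingerprint(renumbered))  # True: only the id differs
print(rule_fingerprint(old) == rule_fingerprint(disabled))    # False: the rule was disabled
```

Six hexadecimal characters are enough to make a changed rule stand out in a diff; they are not meant as a collision-resistant identifier.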



#!/usr/bin/perl
use strict;
use warnings;
my $version = "0.3 17.Oct.2022 by PP";
use Digest::MD5 qw(md5_hex);
# <list version="1.0.3.46" mwg-version="11.2.4-42436" name="Authentication UserGroups to log" id="com.scur.type.string.483"
#    <listEntry>
#       <entry>application/vnd.ms-excel.addin.macroEnabled.12</entry>
#       <description>MS Office 2007 Excel addin (macro-enabled)</description>
#    </listEntry>
#
#<list version="1.0" mwg-version="11.1.4-40769" name="Map" id="com.scur.type.complex.maptype.321" typeId="com.scur.type.complex.maptype" classifier="Other" systemList="false" structuralList="false" defaultRights="2">
#        <description></description>
#        <content>
#          <listEntry>
#            <complexEntry defaultRights="2">
#              <configurationProperties>
#                <configurationProperty key="key" type="com.scur.type.string" encrypted="false" value="test"/>
#                <configurationProperty key="value" type="com.scur.type.string" encrypted="false" value="OK"/>
#              </configurationProperties>
#
#   <ruleGroup id="4122" defaultRights="2" name="Splunk" enabled="true" cycleRequest="true" cycleResponse="true" cycleEmbeddedObject="true" cloudSynced="false">
#      <rule id="5820" enabled="true" name="Domains not to log">
# usage: 
# Step 1: convert XML to TXT and compare them
# perl mwg_xml2txt.pl old_ruleset.xml > old_ruleset.txt
# perl mwg_xml2txt.pl new_ruleset.xml > new_ruleset.txt
# vimdiff old_ruleset.txt new_ruleset.txt
# 
# VIMDIFF will compare TXT files and highlight differences in lists and rules using color output. It can be a simple enabled vs disabled, 
# but can be also a more complex modification - in this case use a Step 2 to do a direct XML comparison.
#
# Step 2: identify differences and optionally extract corresponding XML section for comparison
# export a single rule from each xml ruleset (replace RuleName with the actual rule name):
# perl -0777 -e '$a=<>; ($rule)=$a=~m/(\QRuleName\E.*?<\/rule>)/ms; print "$rule"' old_ruleset.xml > rule_old.txt
# perl -0777 -e '$a=<>; ($rule)=$a=~m/(\QRuleName\E.*?<\/rule>)/ms; print "$rule"' new_ruleset.xml > rule_new.txt
# vimdiff rule_old.txt rule_new.txt

my $line=1;
my $xml = undef;

open (my$fh, '<', $ARGV[0]) or die "cannot open file: $!";
{ 
  local $/=undef;
  $xml = <$fh>;
}
close $fh;

my @lists=$xml=~m/<list [^<]+ name="([^"]+)"/g;
foreach my $list_name (sort @lists){
  my $list = undef;
  # \Q...\E quotes the list name so that characters like ( ) + . are matched literally
  if($xml =~/(<list [^\n]+ name="\Q$list_name\E" [^\n]+com\.scur\.type\.complex\.maptype.+?<\/list>)/ms){ # map lists have a different structure
    $list = $1;
    #print "$list_name\n$list\n\n"; 
    my @entries = $list =~ m/ key="key"[^\n]+value="([^\n]+\n[^\n]+value="[^"]+)"/msg;
    s/([^"]+)".*\n.*"([^"]*)/$1 - $2/msg for @entries; # remove anything except key-value
    print "$list_name\n  ".(join "\n  ",sort @entries)."\n\n";
  }elsif($xml =~/(<list [^\n]+ name="\Q$list_name\E".+?<\/list>)/ms){ 
    $list = $1;
    #print "$list_name\n$list\n\n"; 
    my @entries = $list =~ m/<entry>([^<]+)<\/entry>/msg;
    print "$list_name\n  ".(join "\n  ",sort @entries)."\n\n";
  }else{ 
    die "cannot find list $list_name" 
  };
}

while(<>){
  #print "$line: $_";
  $line++;
  next if /<ruleGroups\/?>/;
  my($ruleid,$string,$offset,$name,$enabled,$rule_block)=(undef,undef,undef,undef,undef,undef);
  if(/^(\s*)<ruleGroup/){
    $offset=$1;
    if(/ name="([^"]+)"/){$name=$1};
    if(/ rule="([^"]+)"/){$ruleid=$1};
    if(/ enabled="([^"]+)"/){$enabled=$1};
    if(/^(.*)$/){$string=$1};
    #print "$offset $name [$enabled]\n"
    ($rule_block) = $xml =~ /(\Q$string\E.*?<rule )/ms;
    # check first: the substitutions below need a defined string
    if(not defined $rule_block){die "Rule block not defined for $string"};
    # mask numeric ids so that the hash depends only on the rule content
    $rule_block =~ s/(id=")\d+"/$1XXX"/msg;
    $rule_block =~ s/(propertyId=")\d+"/$1XXX"/msg;
    $rule_block =~ s/(id="com\.scur\.type\.\w+\.)\d+"/$1XXX"/msg;
    $rule_block =~ s/(id="com\.scur\.type\.complex\.\w+\.)\d+"/$1XXX"/msg;
    $rule_block =~ s/(com\.scur\.engine\.\w+\.)\d+/$1XXX/msg;
    print "$offset $name [$enabled] ".substr((md5_hex($rule_block)),0,6)."\n"

  }elsif(/^(\s*)<rule id=/){
    $offset=$1;
    if(/ name="([^"]+)"/){$name=$1};
    if(/ rule="([^"]+)"/){$ruleid=$1};
    if(/ enabled="([^"]+)"/){$enabled=$1};
    if(/^(.*)$/){$string=$1};
    ($rule_block) = $xml =~ /(\Q$string\E.*?<\/rule>)/ms;
    # check first: the substitutions below need a defined string
    if(not defined $rule_block){die "Rule block not defined for $string"};
    # mask numeric ids so that the hash depends only on the rule content
    $rule_block =~ s/(id=")\d+"/$1XXX"/msg;
    $rule_block =~ s/(propertyId=")\d+"/$1XXX"/msg;
    $rule_block =~ s/(id="com\.scur\.type\.\w+\.)\d+"/$1XXX"/msg;
    $rule_block =~ s/(id="com\.scur\.type\.complex\.\w+\.)\d+"/$1XXX"/msg;
    $rule_block =~ s/(com\.scur\.engine\.\w+\.)\d+/$1XXX/msg;
    print "$offset $name [$enabled] ".substr((md5_hex($rule_block)),0,6)."\n"
  }    
}