Introduction to Log Analysis with grep, awk and sed

toolboxI know there are a lot of cool tools in everyone’s arsenal these days. However I wanted to call attention to some command line tools that are worth mentioning. Many seasoned engineers have come across most of these, but there may be some who have not yet realized the power of chaining these tools together. In this article, we take a look at a few ways several common command line tools can be used to provide flexibility when parsing log files. Specifically, we will take a quick look at how grep, awk, and sed can be used together to provide some powerful solutions. We will also look at how the output can be fed through sort and uniq to provide additional ways to view the data.

The following command line executables are found in the Mac as well as most Linux Distributions.

  • cat–prints the content of a file in the terminal window
  • grep–searches and filters based on patterns
  • awk–can sort each row into fields and display only what is needed
  • sed–performs find and replace functions
  • sort–arranges output in an order
  • uniq–compares adjacent lines and can report, filter or provide a count of duplicates

In this article we will take several lines of a log,  summarize the information, then drill down into the interesting areas. The output below is in a file called cisco.log. This is basically output that was buffered on the router I use at home. When looking at output that has been sent to a syslog server, the field positions will be different. However, the same concepts apply.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log 
*Sep  4 10:07:40.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface IPv6-mpls, changed state to up
*Sep  4 10:07:41.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to up
*Sep  4 10:07:41.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up
*Sep  4 05:07:42.519 EST: %SSH-5-DISABLED: SSH 1.99 has been disabled
*Sep  4 05:07:43.575 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback1, changed state to up
*Sep  4 05:07:43.607 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback10, changed state to up
*Sep  4 05:07:43.651 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel0, changed state to down
*Sep  4 05:07:43.807 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface NVI0, changed state to up
*Sep  4 05:07:44.679 EST: %LINK-5-CHANGED: Interface Tunnel0, changed state to administratively down
*Sep  4 05:07:56.627 EST: %SYS-5-CONFIG_I: Configured from memory by console
*Sep  4 05:07:57.203 EST: %SYS-5-RESTART: System restarted --
Cisco IOS Software, 1841 Software (C1841-ADVENTERPRISEK9-M), Version 12.4(15)T12, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Fri 22-Jan-10 00:35 by prod_rel_team
*Sep  4 05:07:57.203 EST: %SNMP-5-COLDSTART: SNMP agent on host R1 is undergoing a cold start
*Sep  4 05:07:57.259 EST: %SSH-5-ENABLED: SSH 1.99 has been enabled
*Sep  4 05:07:57.535 EST: %LINK-3-UPDOWN: Interface Virtual-Access1, changed state to up
*Sep  4 05:07:58.903 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access1, changed state to up
Sep  6 11:57:00.021 EST: %DHCPD-4-PING_CONFLICT: DHCP address conflict:  server pinged 192.168.1.42.
Sep 14 11:44:51.710 EST: %DHCPD-4-PING_CONFLICT: DHCP address conflict:  server pinged 192.168.1.249.
Sep 20 18:06:27.586 EST: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (192.168.1.50)
Sep 20 18:07:37.735 EST: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (192.168.1.50)Pauls-MacBook-Pro:downloads pestewart$

To quickly demonstrate the print feature in awk, we can instruct it to show only the field that contains “facility-level-mnemonic”. In my file this is the 5th word of each line. Therefore, I will print $5. Only the last 4 lines are being shown for brevity.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | awk '{print $5}' | tail -n 4
%DHCPD-4-PING_CONFLICT:
%DHCPD-4-PING_CONFLICT:
%SYS-5-CONFIG_I:
%SYS-5-CONFIG_I:

Looking at a large file would still produce a large amount of output. A more useful thing to do might be to output every entry found in “$5”, group them together, count them, then sort them from the greatest to least number of occurrences. This can be done by piping the output through “sort“, using “uniq -c” to count the like entries, then using “sort -rn” to sort it in reverse order.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | awk '{print $5}'| sort | uniq -c | sort -rn
   8 %LINEPROTO-5-UPDOWN:
   3 %SYS-5-CONFIG_I:
   2 %DHCPD-4-PING_CONFLICT:
   1 by
   1 Software
   1 Cisco
   1 %SYS-5-RESTART:
   1 %SSH-5-ENABLED:
   1 %SSH-5-DISABLED:
   1 %SNMP-5-COLDSTART:
   1 %LINK-5-CHANGED:
   1 %LINK-3-UPDOWN:
   1

While that’s sort of cool, it is obvious that we have some garbage in our output. Evidently we have a few lines that aren’t conforming to the output we expect to see in $5. We can insert grep to filter the file prior to feeding it to awk. This insures that we are at least looking at lines of text that contain “facility-level-mnemonic”.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | grep %[a-zA-Z]*-[0-9]-[a-zA-Z]* | awk '{print $5}' | sort | uniq -c | sort -rn
   8 %LINEPROTO-5-UPDOWN:
   3 %SYS-5-CONFIG_I:
   2 %DHCPD-4-PING_CONFLICT:
   1 %SYS-5-RESTART:
   1 %SSH-5-ENABLED:
   1 %SSH-5-DISABLED:
   1 %SNMP-5-COLDSTART:
   1 %LINK-5-CHANGED:
   1 %LINK-3-UPDOWN:
Pauls-MacBook-Pro:downloads pestewart$

Now that the output is cleaned up a bit, it is a good time to investigate some of the entries that appear most often. One way to see all occurrences is to use grep.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | grep %LINEPROTO-5-UPDOWN:
*Sep  4 10:07:40.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface IPv6-mpls, changed state to up
*Sep  4 10:07:41.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to up
*Sep  4 10:07:41.135: EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to up
*Sep  4 05:07:43.575 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback1, changed state to up
*Sep  4 05:07:43.607 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback10, changed state to up
*Sep  4 05:07:43.651 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel0, changed state to down
*Sep  4 05:07:43.807 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface NVI0, changed state to up
*Sep  4 05:07:58.903 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access1, changed state to up

That works well when there is only a few entries. However in a production environment, there could be thousands of lines. It might make sense to re-tool what we have learned about awk and uniq to get some summaries based on what interfaces are going up and down most often.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | grep %LINEPROTO-5-UPDOWN:| awk '{print $10}' | sort | uniq -c | sort -rn
   1 Virtual-Access1,
   1 Tunnel0,
   1 NVI0,
   1 Loopback10,
   1 Loopback1,
   1 IPv6-mpls,
   1 FastEthernet0/1,
   1 FastEthernet0/0,

As shown above, each of these interfaces are showing up once. One pet peeve I have with these entries is that they have a comma at the end. This can be solved by running the output through sed before creating the summary.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | grep %LINEPROTO-5-UPDOWN:| sed 's/,//g' | awk '{print $10}' | sort | uniq -c | sort -rn
   1 Virtual-Access1
   1 Tunnel0
   1 NVI0
   1 Loopback10
   1 Loopback1
   1 IPv6-mpls
   1 FastEthernet0/1
   1 FastEthernet0/0

Another thing that could be done is including additional fields from the log. For example, printing ‘$10 ” changed to ” $14’, shows how many times each interface went up and down.

Pauls-MacBook-Pro:downloads pestewart$ cat cisco.log | grep %LINEPROTO-5-UPDOWN:| sed 's/,//g' | awk '{print $10 " changed to " $14}' | sort | uniq -c | sort -rn
   1 Virtual-Access1 changed to up
   1 Tunnel0 changed to down
   1 NVI0 changed to up
   1 Loopback10 changed to up
   1 Loopback1 changed to up
   1 IPv6-mpls changed to up
   1 FastEthernet0/1 changed to up
   1 FastEthernet0/0 changed to up

These are just a few very simple examples that demonstrate the power of chaining these command line tools together. Becoming familiar with these commands and their functions will open up new opportunities to quickly locate critical data in large log files.

No related content found.

About Paul Stewart, CCIE 26009 (Security)

Paul is a Network and Security Engineer, Trainer and Blogger who enjoys understanding how things really work. With over 15 years of experience in the technology industry, Paul has helped many organizations build, maintain and secure their networks and systems.
This entry was posted in How-To. Bookmark the permalink.