Keywords from Apache logs

Posted by Tully on Fri 04 March 2011

I wrote a quick one-liner last week to parse my Apache logs and return the keywords / phrases people were searching for on Google to reach my site.

The code keeps only the requests that were referred by Google, pulls the q= search parameter out of each referrer, urldecodes it, and outputs each keyword / phrase with its number of hits, sorted by count.
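To make those steps concrete, here is a minimal sketch that runs the extraction on a single log line. The log line is made up (Apache "combined" format is an assumption about the log layout), and the decoding is deliberately simplified to turning "+" into a space, which is enough for this sample; full percent-decoding is what urldecode is for.

```shell
# Hypothetical log line in Apache "combined" format (all values are made up).
line='203.0.113.7 - - [04/Mar/2011:10:15:32 +0000] "GET /post HTTP/1.1" 200 5120 "http://www.google.com/search?hl=en&q=php+bubble+sort&aq=f" "Mozilla/5.0"'

# Pull out the q= parameter, stopping at the next "&", quote, or space,
# then strip the leading "?q=" or "&q=".
raw=$(printf '%s\n' "$line" | grep -oE '[?&]q=[^&" ]*' | sed -e 's/^[?&]q=//')

# Simplified decoding for this sample only: "+" stands for a space.
phrase=$(printf '%s\n' "$raw" | tr '+' ' ')

echo "$phrase"
```

Requiring "?" or "&" in front of q= keeps the match from starting inside other parameters such as aq= or oq=.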

Note: The backslashes and newlines have been added for readability.

Code

cat access_log access_ssl_log | \
    grep -F 'www.google.com' | \
    grep -oE '[?&]q=[^&" ]*' | \
    sed -e 's/^[?&]q=//' | \
    grep '[a-zA-Z0-9]' | \
    php -R 'echo urldecode($argn) . "\n";' | \
    sed -e 's/^[ \t]*//' | \
    sort | uniq -c | sort -n
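
One detail worth checking when stripping the q= prefix: tr -d takes a set of characters, not a string, so it removes every "q" and every "=" on the line, while an anchored sed substitution removes only the prefix. A small sketch of the difference, using a made-up parameter value:

```shell
# Hypothetical sample: a q= parameter as it might appear in a Google referrer.
sample='q=mysqlimport+csv'

# tr -d treats its argument as a set of characters: every "q" and "=" is deleted.
with_tr=$(printf '%s\n' "$sample" | tr -d 'q=')

# sed anchors the match at the start of the line: only the leading "q=" is removed.
with_sed=$(printf '%s\n' "$sample" | sed -e 's/^q=//')

echo "tr:  $with_tr"
echo "sed: $with_sed"
```

With tr, any search phrase containing the letter "q" loses it, which is easy to mistake for a typo by the person searching.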

Output

2 java program to count frequency of letters  
2 letter frequency in java  
2 mysqlimport csv  
2 php bubble sort  
2 rand with vectors c++  
2 which logical unit of the computer retains information  
2 zend_form_element select  
4 google weather api  
4 php developer resume  
4 selection sort example  
5 six logical units of a computer
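
If only the most popular phrases are of interest, the tail of the pipeline can be flipped to sort descending and cut off after the first few rows. This is a sketch of that variant rather than part of the one-liner; the phrases are stand-in data, and the cutoff of 2 is arbitrary.

```shell
# Hypothetical decoded phrases, one per line, standing in for the urldecode step's output.
phrases='google weather api
php bubble sort
google weather api
selection sort example
google weather api
php bubble sort'

# Count duplicates, sort by count (highest first), keep the top two,
# and trim the padding that uniq -c puts in front of each count.
top=$(printf '%s\n' "$phrases" | sort | uniq -c | sort -rn | head -n 2 | sed -e 's/^ *//')

echo "$top"
```

The first sort is still needed because uniq only collapses adjacent duplicate lines.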