Keywords from Apache Logs

Last week I wrote a quick one-liner that pulls, from the Apache logs, the Google keywords and phrases people searched for to reach my site. It reads the Apache logs, keeps only requests referred by Google, extracts the q= search parameter from each of those lines, URL-decodes it, and prints the number of hits for each keyword/phrase.

cat access_log access_ssl_log | grep -F 'www.google.com' | grep -oE '[?&]q=[^&" ]*' | cut -d= -f2- | grep '[[:alnum:]]' | php -R 'echo urldecode($argn) . "\n";' | sed -e 's/^[[:space:]]*//' | sort | uniq -c | sort -n

–Sample Output–

2 java program to count frequency of letters
2 letter frequency in java
2 mysqlimport csv
2 php bubble sort
2 rand with vectors c++
2 which logical unit of the computer retains information
2 zend_form_element select
4 google weather api
4 php developer resume
4 selection sort example
5 six logical units of a computer
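The two core steps of the pipeline, pulling the q= value out of a Google referrer and URL-decoding it, can also be sketched in C. This is a minimal sketch, assuming the referrer contains the literal host www.google.com and that the search term is the first q= parameter after it. The names extract_query() and url_decode() are hypothetical; url_decode() applies the same rules as PHP's urldecode() ('+' becomes a space, %XX becomes a byte).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Find the Google referrer in a log line and copy the value of its first
   q= parameter (preceded by '?' or '&') into out, stopping at '&', a quote
   or whitespace. Returns 1 on success, 0 if there is no such parameter.
   Assumes outsz >= 1. */
int extract_query(const char *line, char *out, size_t outsz)
{
    const char *p = strstr(line, "www.google.com");
    if (p == NULL)
        return 0;
    /* "www.google.com" has no "q=", so any match lies past its first byte
       and p[-1] is always inside the string. */
    while ((p = strstr(p, "q=")) != NULL) {
        if (p[-1] == '?' || p[-1] == '&') {
            size_t n;
            p += 2;
            n = strcspn(p, "&\" \t\r\n");
            if (n >= outsz)
                n = outsz - 1;
            memcpy(out, p, n);
            out[n] = '\0';
            return 1;
        }
        p += 2;
    }
    return 0;
}

/* Decode a URL-encoded string in place, like PHP's urldecode():
   '+' becomes a space and %XX becomes the byte 0xXX. */
void url_decode(char *s)
{
    char *out = s;
    for (; *s != '\0'; s++) {
        if (*s == '+') {
            *out++ = ' ';
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 2;
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}
```

A small main() that reads each log line with fgets(), calls extract_query() and then url_decode(), and prints the result could stand in for the grep/cut/php stages; sort | uniq -c | sort -n would still do the counting.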

PHP gzip test

Today I wrote a quick script that downloads a web page twice: once compressed (gzip) and once uncompressed. I wanted something I could run from the command line. The PHP gzip script prints the size of the page in both the gzipped and non-gzipped versions, along with how long each download took in seconds. Finally, it downloads the page 10 more times in each version and displays the average download time for each.
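Timing a download to a fraction of a second needs a clock finer than whole seconds, and the ten-run average is just the mean of the per-run times. Here is a minimal C sketch of that measurement, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). The helpers elapsed_seconds(), average() and time_average() are hypothetical, and the fn callback is a stand-in for "download the page once".

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings, as a double. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Arithmetic mean of n samples; 0.0 for an empty array. */
double average(const double *samples, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}

/* Call fn() `runs` times, timing each call, and return the mean duration
   in seconds. fn is a hypothetical stand-in for one page download. */
double time_average(void (*fn)(void), size_t runs)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < runs; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += elapsed_seconds(&t0, &t1);
    }
    return runs ? total / (double)runs : 0.0;
}
```

A monotonic clock is used instead of wall-clock time so that a system clock adjustment in the middle of a run cannot produce a negative or inflated duration.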

How to run the script:

tully@hydralisk:/tmp$ php download.php http://example.com

No-Compression:    85047 Time: 0.00 seconds

With-Compression:  11178 Time: 1.00 seconds

Downloading compressed and non-compressed versions 10 times each and then calculating average…

Non-Compressed version: 0.5

Compressed version: 0.35
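The two sizes in this sample run show how much gzip saves: 11178 compressed bytes against 85047 uncompressed is about 13.1% of the original, or roughly 86.9% fewer bytes on the wire. A tiny C helper for that calculation (percent_saved() is a hypothetical name, not part of the script):

```c
/* Percentage of bytes saved by compression: 100 * (1 - compressed / plain).
   Returns 0.0 when plain is not positive, to avoid dividing by zero. */
double percent_saved(long plain, long compressed)
{
    if (plain <= 0)
        return 0.0;
    return 100.0 * (1.0 - (double)compressed / (double)plain);
}
```

With the numbers above, percent_saved(85047, 11178) comes out at about 86.86.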

<?php
// Download $site and return the size of the response body in bytes.
// With $gzip == 1 we send Accept-Encoding but do not ask cURL to decode the
// response, so the size is what was actually transferred (compressed, if the
// server honored the header).
function download($site, $gzip = 0)
{
    $ch = curl_init($site);
    if ($gzip == 1) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: compress, gzip'));
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $content = curl_exec($ch);
    curl_close($ch);
    return strlen($content);
}

// Helper to get average
function average(array $a)
{
    return array_sum($a) / count($a);
}

if ($argc < 2) {
    echo "Usage: php $argv[0] <url>\n";
    exit(1);
}

// microtime(true) returns fractional seconds; time() only counts whole seconds
$start = microtime(true);
echo "No-Compression:    " . download($argv[1]);
printf("\tTime: %.2f seconds\n", microtime(true) - $start);

$start = microtime(true);
echo "With-Compression:  " . download($argv[1], 1);
printf("\tTime: %.2f seconds\n", microtime(true) - $start);

echo "\nDownloading compressed and non-compressed versions\n10 times each and then calculating average... \n";

$times = array();
for ($i = 0; $i < 10; $i++) {
    $start = microtime(true);
    download($argv[1]);
    $times[] = microtime(true) - $start;
}
echo "Non-Compressed version: " . average($times) . "\n";

$times = array();
for ($i = 0; $i < 10; $i++) {
    $start = microtime(true);
    download($argv[1], 1);
    $times[] = microtime(true) - $start;
}
echo "Compressed version: " . average($times) . "\n";

Apache Includes Parser

I wrote this script to read httpd.conf (or any file given on the command line) and follow its Include directives. It writes the contents of that file, plus every included file it finds, to stdout. I pipe the output to grep when searching for Apache directives; it is handy for finding directives in an included file that override the ones set in httpd.conf.


<?php
// Print a config file followed by the contents of every file it pulls in
// with Include/IncludeOptional. Relative paths are resolved against the
// current directory, not Apache's ServerRoot.
error_reporting(0);

if ($argc != 2) {
    echo <<<FILE
#################################################

Reads all "Included files" in specified file and outputs to stdout.

Usage: php $argv[0] file
Example: php $argv[0] httpd.conf
#####################################

FILE;
    exit(1);
}

$lines = file($argv[1]);
if ($lines === false) {
    echo "Could not read $argv[1]\n";
    exit(1);
}

// Start with the main file itself so its directives show up in the output too
$content = implode('', $lines);

foreach ($lines as $line) {
    // Skip comments and anything that is not an Include directive;
    // $m[1] is the path with any surrounding quotes removed
    if (!preg_match('/^\s*Include(?:Optional)?\s+"?([^"]+?)"?\s*$/i', $line, $m)) {
        continue;
    }
    $path = $m[1];
    if (is_dir($path)) {
        // An included directory pulls in the *.conf files inside it
        $path = rtrim($path, '/') . '/*.conf';
    }
    // glob() expands wildcards such as conf.d/*.conf; a plain path matches itself
    foreach (glob($path) ?: array() as $incFile) {
        $content .= file_get_contents($incFile);
    }
}

echo $content;
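The same idea can be sketched in C with POSIX glob(3), which expands patterns such as conf.d/*.conf and also matches a plain file path if it exists, instead of hand-parsing the *.conf suffix. This is a sketch under assumptions: parse_include() and cat_matches() are hypothetical names, paths containing spaces are not supported because sscanf's %s stops at whitespace, and relative paths are resolved against the current directory rather than ServerRoot.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define PATH_BUF 1024

/* If line is an Include or IncludeOptional directive, store its argument
   (without surrounding double quotes) in path and return 1; return 0 for
   comments and any other line. path must hold PATH_BUF bytes. */
int parse_include(const char *line, char path[PATH_BUF])
{
    char word[32];
    size_t n;

    /* Skip leading blanks, read the directive name, then its argument */
    if (sscanf(line, " %31s %1023s", word, path) != 2)
        return 0;
    if (strcasecmp(word, "Include") != 0 && strcasecmp(word, "IncludeOptional") != 0)
        return 0;
    n = strlen(path);
    if (n >= 2 && path[0] == '"' && path[n - 1] == '"') {
        memmove(path, path + 1, n - 2);
        path[n - 2] = '\0';
    }
    return 1;
}

/* Write every file matching the glob pattern to stdout and return how many
   files were printed (0 if nothing matched). */
int cat_matches(const char *pattern)
{
    glob_t g;
    int count = 0;
    size_t i;

    if (glob(pattern, 0, NULL, &g) != 0)
        return 0;
    for (i = 0; i < g.gl_pathc; i++) {
        char buf[4096];
        size_t got;
        FILE *f = fopen(g.gl_pathv[i], "r");

        if (f == NULL)
            continue;
        while ((got = fread(buf, 1, sizeof buf, f)) > 0)
            fwrite(buf, 1, got, stdout);
        fclose(f);
        count++;
    }
    globfree(&g);
    return count;
}
```

For each line of httpd.conf, call parse_include(); if the path names a directory, append /*.conf before handing it to cat_matches().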

Parse Apache Config for DocumentRoot in C

I am now in my second week of learning C. I have been writing a lot of small programs just to get the hang of the language: simple calculators, guess-my-number games, and simple quizzes. Tonight I worked on code that parses an Apache config file and displays only the DocumentRoot directive. The code I wrote to do this is shown below.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Longest line we expect in a config file; fgets() splits longer lines */
#define MAXLINE 1024

int main(int argc, char *argv[])
{
    int i;
    char line[MAXLINE];

    if (argc == 1)
    {
        puts("You must enter a file/files to search.");
        exit(1);
    }
    for (i = 1; i < argc; i++)
    {
        FILE *file = fopen(argv[i], "r");
        if (file == NULL)
        {
            fprintf(stderr, "Couldn't open file %s\n", argv[i]);
            exit(1);
        }
        while (fgets(line, MAXLINE, file) != NULL)
        {
            /* Strip the trailing newline (if any) in place; no copy needed */
            line[strcspn(line, "\n")] = '\0';
            if (strstr(line, "DocumentRoot") != NULL)
            {
                printf("%s\n", line);
            }
        }
        fclose(file);
    }
    return 0;
}
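The program above prints any line that contains DocumentRoot, including commented-out ones. A possible next step is to print only the directive's value, and only from active lines. This is a minimal sketch, assuming the directive starts its line (indentation allowed); Apache directive names are case-insensitive, so the match is too. documentroot_value() is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* If line is an active (uncommented) DocumentRoot directive, copy its
   argument into out without surrounding quotes and return 1; else 0. */
int documentroot_value(const char *line, char *out, size_t outsz)
{
    static const char key[] = "DocumentRoot";
    const size_t keylen = sizeof key - 1;
    size_t n;

    while (isspace((unsigned char)*line))      /* allow indentation */
        line++;
    if (strncasecmp(line, key, keylen) != 0)   /* also rejects '#' lines */
        return 0;
    line += keylen;
    if (!isspace((unsigned char)*line))        /* e.g. "DocumentRootX" */
        return 0;
    while (isspace((unsigned char)*line))
        line++;

    if (*line == '"')                          /* drop an opening quote */
        line++;
    n = strcspn(line, "\"\r\n");               /* stop at closing quote or EOL */
    while (n > 0 && isspace((unsigned char)line[n - 1]))
        n--;
    if (n == 0 || n >= outsz)
        return 0;
    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}
```

In the main loop, calling documentroot_value(line, value, sizeof value) in place of the strstr() check and printing value would list just the document root paths.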