Linux for SEO: Real Commands & a Free Log Analysis Script Inside

If SEO was easy, everyone would be analyzing access logs for fun!

-Milan Misic / Head of SEO Aphex Media


SEO isn’t just about keywords and backlinks – understanding your website’s access logs can unlock powerful insights into how search engines and users interact with your site. 🕵️‍♂️

If you’re running a site and want to dive into log analysis for SEO purposes, Linux commands are your best friend! Here’s a practical list of useful Linux commands that I use daily and how they can help with SEO improvements. 👇👇👇

Understanding Log Formats and Fields (just a quick one, I promise) 📄

Before I get into log file analysis for SEO, it’s crucial to understand the structure and components of access log files. These files contain a wealth of information about every request made to your website, providing valuable insights into search engine bot behavior, user interactions, and potential technical issues. Let’s break down the common log formats and the key fields you’ll encounter.

Common Log File Formats

Web servers generate logs in different formats depending on the configuration and platform used. The two most common formats are:

Common Log Format (CLF)

  • A standardized log format used by Apache and Nginx servers.
  • Example entry:
192.168.1.1 - - [10/Jan/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 5323

Fields:

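  • IP Address (192.168.1.1)
  • Client Identity and Authenticated User (- -), both usually empty and logged as a hyphen
  • Date and Time ([10/Jan/2025:13:55:36 +0000])
  • HTTP Method, Resource, and Protocol (“GET /index.html HTTP/1.1”)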
  • Status Code (200)
  • Bytes Transferred (5323)

Combined Log Format (CLF + More Data)

  • An extended version of the common log format, adding referrer URLs and user-agent strings to help analyze traffic sources and visitor devices.
  • Example entry:
192.168.1.1 - - [10/Jan/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 5323 "http://example.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 

Additional fields:

  • Referrer (“http://example.com”)
  • User-Agent (“Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”)
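
To make the field layout concrete, here is a minimal sketch that splits each Combined Log Format line on double quotes and prints the fields with labels. The function name describe_fields is hypothetical, and the sketch assumes the referrer and user-agent contain no embedded double quotes.

```shell
# describe_fields (hypothetical helper): print labelled fields for each log line.
# $1 = path to a Combined Log Format file
describe_fields() {
    awk -F'"' '{
        split($1, head, " ")   # head[1] = IP, head[4] = [date:time
        split($2, req, " ")    # req[1] = method, req[2] = URL, req[3] = protocol
        split($3, resp, " ")   # resp[1] = status code, resp[2] = bytes
        printf "ip=%s time=%s method=%s url=%s status=%s bytes=%s referrer=%s agent=%s\n",
               head[1], substr(head[4], 2), req[1], req[2], resp[1], resp[2], $4, $6
    }' "$1"
}
```

Run it as: describe_fields access.log | head -5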

👨‍💻 Now let’s get to practical examples 👨‍💻

1. Checking Googlebot Activity

Command:

grep 'Googlebot' access.log | less

Use: Find out how frequently Googlebot visits your site and which pages it crawls the most. 📊
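
Paging through the raw lines shows what Googlebot requested, but not which pages it hits most. A small sketch that adds the counting step, assuming the Combined Log Format so the URL is whitespace field 7 (googlebot_top_pages is a hypothetical helper name):

```shell
# googlebot_top_pages (hypothetical helper): URLs ranked by Googlebot hits.
# $1 = log file, $2 = number of rows to show (defaults to 10)
googlebot_top_pages() {
    grep 'Googlebot' "$1" | awk '{print $7}' | sort | uniq -c | sort -nr | head -n "${2:-10}"
}
```

Run it as: googlebot_top_pages access.log 20. Note that matching on the user-agent string alone also counts anything that merely claims to be Googlebot.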


2. Filtering Specific URL Paths

Command:

awk '$7 ~ /^\/blog\// {print $7}' access.log | sort | uniq -c | sort -nr | head -10

Use: Identify your most visited blog pages and optimize content based on search patterns. 📝


3. Spotting 404 Errors (Broken Links Detection)

Command:

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -nr | head -20

Use: Find and fix broken links that may be harming your SEO performance. 🛠️
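
Knowing which URL returns 404 is only half the fix; the referrer tells you where the broken link lives. A minimal sketch, assuming the Combined Log Format (status in field 9, referrer in field 11) and a hypothetical helper name broken_link_sources:

```shell
# broken_link_sources (hypothetical helper): 404 URLs paired with the page that linked to them.
# $1 = log file
broken_link_sources() {
    awk '$9 == 404 {
        gsub(/"/, "", $11)        # strip the quotes around the referrer
        print $7, "<-", $11       # missing URL <- referring page ("-" means no referrer)
    }' "$1" | sort | uniq -c | sort -nr | head -n 20
}
```

Run it as: broken_link_sources access.log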


4. Analyzing Crawl Frequency Over Time

Command:

grep 'Googlebot' access.log | awk '{print $4}' | cut -d: -f1 | tr -d '[' | sort | uniq -c | sort -nr | head -10

Use: Identify the days on which Googlebot crawls most heavily, then adjust your content update schedule to match. ⏳
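
For peak times within the day, the same idea can be grouped by hour instead. A sketch assuming the standard timestamp layout [10/Jan/2025:13:55:36, where the hour starts at character 14 of field 4 (googlebot_hits_by_hour is a hypothetical helper name):

```shell
# googlebot_hits_by_hour (hypothetical helper): Googlebot request count per hour of day.
# $1 = log file; output rows are "count hour", ordered by hour
googlebot_hits_by_hour() {
    grep 'Googlebot' "$1" | awk '{print substr($4, 14, 2)}' | sort | uniq -c
}
```

Run it as: googlebot_hits_by_hour access.log. Hours are in the timezone the server writes to the log.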


5. Checking Mobile vs. Desktop Traffic

Command:

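awk -F'"' '{print ($6 ~ /Mobile|Android|iPhone/ ? "mobile" : "desktop")}' access.log | sort | uniq -c | sort -nr

Use: Understand the traffic distribution across devices to optimize for mobile-first indexing. Desktop browsers do not identify themselves as "Desktop", so any request whose user-agent lacks a mobile token is counted as desktop. 📱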


6. Finding Slow Loading Pages

Command:

awk '{ if ($NF > 5000) print $7, $NF }' access.log | sort -k2 -nr | head -10

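Use: Identify slow-loading pages that might impact rankings and user experience. This only works if your server appends the response time as the last log field (for example %D on Apache, in microseconds, or $request_time on Nginx, in seconds), so adjust the 5000 threshold to that unit. The default Combined Log Format records no timing at all. 🐢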
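
A single slow request can be noise, so averaging per URL is often more useful. A minimal sketch under the same assumption, namely that the response time has been added as the last field of every line (avg_response_time is a hypothetical helper name):

```shell
# avg_response_time (hypothetical helper): average of the last field per URL, slowest first.
# Assumes the server was configured to append the response time as the final log field.
# $1 = log file
avg_response_time() {
    awk '{sum[$7] += $NF; n[$7]++}
         END {for (u in sum) printf "%.0f %s\n", sum[u] / n[u], u}' "$1" | sort -nr | head -n 10
}
```

Run it as: avg_response_time access.log. The numbers come out in whatever unit your log format writes.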


7. Tracking Bot Traffic Volume

Command:

grep -iE 'bot|crawler|spider' access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -nr

Use: Detect how much of your traffic is from search engine bots versus real users. 🤖
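
To express the bot versus human split as a single number, the counts can be turned into a percentage. A sketch that treats any user-agent containing bot, crawler, or spider (case-insensitive) as a bot, which is a rough heuristic rather than a reliable classifier; bot_share is a hypothetical helper name:

```shell
# bot_share (hypothetical helper): share of requests whose user-agent looks like a bot.
# $1 = Combined Log Format file
bot_share() {
    awk -F'"' '
        tolower($6) ~ /bot|crawler|spider/ {bots++}
        END {
            if (NR == 0) { print "no requests"; exit }
            printf "bots=%d humans=%d bot_share=%.1f%%\n", bots, NR - bots, 100 * bots / NR
        }' "$1"
}
```

Run it as: bot_share access.log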


8. Detecting Unauthorized Bot Access (Scrapers & Bad Bots)

Command:

grep -viE 'googlebot|bingbot' access.log | grep -i 'bot' | awk -F'"' '{print $6}' | sort | uniq -c | sort -nr | head -20

Use: Identify suspicious bot activity and protect your SEO efforts from scrapers. 🚫


9. Counting Hits per Page

Command:

awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -20

Use: Determine the most accessed pages and align your SEO strategy accordingly. 🌟


10. Identifying Referrer Sources

Command:

awk '{print $11}' access.log | sort | uniq -c | sort -nr | head -10

Use: Discover which external sources drive the most traffic to your site. 🌍
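
Raw referrer URLs tend to be dominated by your own pages and by the "-" placeholder for requests without a referrer. A sketch that groups by referring domain and skips both, assuming the Combined Log Format; the helper name top_referring_domains and the hostname in the usage line are placeholders:

```shell
# top_referring_domains (hypothetical helper): external referring hosts ranked by hits.
# $1 = log file, $2 = your own hostname, which is excluded from the report
top_referring_domains() {
    awk -F'"' -v own="$2" '
        $4 != "-" && $4 != "" {
            split($4, parts, "/")     # "https://host/path" -> parts[3] is the host
            host = parts[3]
            if (host != "" && host != own) print host
        }' "$1" | sort | uniq -c | sort -nr | head -n 10
}
```

Run it as: top_referring_domains access.log www.example.com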


11. Visualize Log Data for Better Insights

Command:

awk '{print $1 "," $7 "," $9}' access.log > log_report.csv

Use: Export the IP address, URL, and status code of every request to CSV for visualization in Excel, Google Sheets, or BI tools. 📊


12. Identifying Site Speed Issues Through Logs

Command:

awk '{ if ($NF > 5000) print $7, $NF }' access.log | sort -k2 -nr | head -10

Use: Track server response times and catch pages that take longer than expected to load. As with command 6, this assumes your log format writes the response time as the last field of each line. 🚄


FREE Server Logs Analysis Bash Script

This script is a very simple yet helpful tool that I use to analyze website access logs with ease. It provides a menu-driven interface for common log analysis tasks, such as identifying the most crawled pages, detecting errors, and analyzing traffic sources. The script exports the results into easy-to-read CSV files, making it simple to track and optimize your site’s performance over time.

Bash Script: log_analysis.sh

#!/bin/bash

LOGFILE="access.log"

# Abort early if the log file does not exist
if [[ ! -f "$LOGFILE" ]]; then
    echo "Error: Log file $LOGFILE not found!"
    exit 1
fi

echo "=============================="
echo " SEO Log Analysis Menu "
echo "=============================="
echo "1. Find the Most Crawled Pages by Googlebot"
echo "2. Detect 404 Errors and Their Frequency"
echo "3. Analyze Bot vs. Human Traffic"
echo "4. Check Server Response Time for Slow Pages"
echo "5. Track Traffic by Hour (Peak Times)"
echo "6. Count Visits per User-Agent (Identify Crawlers)"
echo "7. Extract Top Referring Websites"
echo "8. Detect Suspicious Activity (High Hit Rates from Single IPs)"
echo "9. Find Most Accessed Pages"
echo "10. Monitor Search Engine Crawl Rate Over Time"
echo "11. Identify Pages with 500 Server Errors"
echo "12. Export Logs to CSV for Further Analysis"
echo "13. Find Pages with High Bounce Rates (Single Page Requests)"
echo "14. Analyze Traffic from Specific Countries"
echo "15. Exit"
echo "=============================="
read -p "Select an option (1-15): " option

# Execute the chosen action
case $option in
    1)
        grep 'Googlebot' "$LOGFILE" | awk '{print $7}' | sort | uniq -c | sort -nr | head -10 > googlebot_crawled_pages.csv
        echo "Results saved to googlebot_crawled_pages.csv"
        ;;
    2)
        awk '$9 == 404 {print $7}' "$LOGFILE" | sort | uniq -c | sort -nr | head -20 > error_404_report.csv
        echo "Results saved to error_404_report.csv"
        ;;
    3)
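        # Label each count so the two numbers are not ambiguous in the CSV
        echo "bot_requests,$(grep -ciE 'bot|crawler|spider' "$LOGFILE")" > bot_vs_human_traffic.csv
        echo "human_requests,$(grep -cviE 'bot|crawler|spider' "$LOGFILE")" >> bot_vs_human_traffic.csv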
        echo "Results saved to bot_vs_human_traffic.csv"
        ;;
    4)
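        # Assumes the response time is logged as the last field; the default log formats do not include it
        awk '$NF > 5000 {print $7, $NF}' "$LOGFILE" | sort -k2 -nr | head -10 > slow_pages.csv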
        echo "Results saved to slow_pages.csv"
        ;;
    5)
        awk '{print substr($4,14,2)}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > traffic_by_hour.csv
        echo "Results saved to traffic_by_hour.csv"
        ;;
    6)
        awk -F\" '{print $6}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > user_agents.csv
        echo "Results saved to user_agents.csv"
        ;;
    7)
        awk '{print $11}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > referrer_websites.csv
        echo "Results saved to referrer_websites.csv"
        ;;
    8)
        awk '{print $1}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > suspicious_ips.csv
        echo "Results saved to suspicious_ips.csv"
        ;;
    9)
        awk '{print $7}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > most_accessed_pages.csv
        echo "Results saved to most_accessed_pages.csv"
        ;;
    10)
        grep 'Googlebot' $LOGFILE | awk '{print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr > crawl_rate.csv
        echo "Results saved to crawl_rate.csv"
        ;;
    11)
        awk '$9 == 500 {print $7}' "$LOGFILE" | sort | uniq -c | sort -nr | head -10 > server_errors.csv
        echo "Results saved to server_errors.csv"
        ;;
    12)
        awk '{print $1 "," $4 "," $7 "," $9}' $LOGFILE > seo_log_report.csv
        echo "Results saved to seo_log_report.csv"
        ;;
    13)
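        # Approximation only: pages requested by IPs that made exactly one request (logs cannot measure true bounce rate)
        awk 'NR == FNR {hits[$1]++; next} hits[$1] == 1 {print $7}' "$LOGFILE" "$LOGFILE" | sort | uniq -c | sort -nr | head -10 > bounce_rate.csv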
        echo "Results saved to bounce_rate.csv"
        ;;
    14)
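        # Assumes a country code is logged on each line (for example via a GeoIP module); plain access logs do not contain one
        grep -wE 'US|UK|DE' "$LOGFILE" | awk '{print $1, $7}' | sort | uniq -c | sort -nr > geo_traffic.csv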
        echo "Results saved to geo_traffic.csv"
        ;;
    15)
        echo "Exiting the script. Goodbye!"
        exit 0
        ;;
    *)
        echo "Invalid option. Please run the script again."
        exit 1
        ;;
esac 

How to Run the Script

  • Save the script as log_analysis.sh
  • Give execution permission:
chmod +x log_analysis.sh
  • Run the script:
./log_analysis.sh
  • Follow the on-screen menu to analyze logs and export results to .csv files.

Note: The script must be placed in the same folder as the access.log file, and you need to run it from there!
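
If you would rather keep the script in one place and point it at different logs, the path could be taken from the first command-line argument instead. A minimal sketch of that idea; resolve_logfile is a hypothetical helper, not part of the script above:

```shell
# resolve_logfile (hypothetical helper): pick the log to analyze.
# Uses the first argument if given, otherwise falls back to ./access.log.
resolve_logfile() {
    logfile="${1:-access.log}"
    if [ ! -f "$logfile" ]; then
        echo "Error: log file $logfile not found" >&2
        return 1
    fi
    printf '%s\n' "$logfile"
}
```

Inside the script this would replace the fixed assignment, for example LOGFILE=$(resolve_logfile "$1") || exit 1, so that ./log_analysis.sh /var/log/nginx/access.log works from any folder.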


Understanding log formats and analyzing key fields allows SEO professionals to:

✅ Track search engine crawling behavior

✅ Optimize website performance

✅ Identify errors before they impact rankings

✅ Improve user experience

Why do I personally love the Linux environment for analyzing server logs? 🤔

✅ Optimize crawl budget

✅ Identify performance bottlenecks

✅ Improve website architecture

✅ Detect and fix critical issues

✅ I feel like a hacker while doing it