If SEO was easy, everyone would be analyzing access logs for fun!
-Milan Misic / Head of SEO Aphex Media
SEO isn’t just about keywords and backlinks – understanding your website’s access logs can unlock powerful insights into how search engines and users interact with your site. 🕵️♂️
If you’re running a site and want to dive into log analysis for SEO purposes, Linux commands are your best friend! Here’s a practical list of useful Linux commands that I use daily and how they can help with SEO improvements. 👇👇👇
Understanding Log Formats and Fields (just a quick one, I promise) 📄
Before I get into log file analysis for SEO, it’s crucial to understand the structure and components of access log files. These files contain a wealth of information about every request made to your website, providing valuable insights into search engine bot behavior, user interactions, and potential technical issues. Let’s break down the common log formats and the key fields you’ll encounter.
Common Log File Formats
Web servers generate logs in different formats depending on the configuration and platform used. The two most common formats are:
Common Log Format (CLF)
- A standardized log format used by Apache and Nginx servers.
- Example entry:
192.168.1.1 - - [10/Jan/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 5323
Fields:
- IP Address (192.168.1.1)
- Client Identity and Authenticated User (- -), usually just two hyphens
- Date and Time ([10/Jan/2025:13:55:36 +0000])
- HTTP Method and Resource (“GET /index.html HTTP/1.1”)
- Status Code (200)
- Bytes Transferred (5323)
Combined Log Format (CLF + More Data)
- An extended version of the common log format, adding referrer URLs and user-agent strings to help analyze traffic sources and visitor devices.
- Example entry:
192.168.1.1 - - [10/Jan/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 5323 "http://example.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Additional fields:
- Referrer (“http://example.com”)
- User-Agent (“Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”)
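To make the field layout concrete, here is a minimal sketch that splits each Combined Log Format line on the double quotes and prints the fields as tab-separated values. The function name parse_combined is my own for illustration, and the sketch assumes no field contains an escaped quote.

```shell
# parse_combined (hypothetical helper): read Combined Log Format lines on
# stdin and print ip, timestamp, method, path, status, bytes, referrer and
# user agent as one tab-separated record per line.
parse_combined() {
  awk -F'"' '{
    # Splitting on double quotes gives:
    #   $1 = ip, identity, user, [timestamp]   $2 = request line
    #   $3 = status and bytes                  $4 = referrer
    #   $6 = user agent
    split($1, head, " ")
    split($2, req, " ")
    split($3, tail, " ")
    # Strip the leading "[" and the trailing "]" from the timestamp.
    ts = substr(head[4], 2) " " substr(head[5], 1, length(head[5]) - 1)
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", head[1], ts, req[1], req[2], tail[1], tail[2], $4, $6
  }'
}
```

Usage: parse_combined < access.log | head -5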
👨‍💻 Now let’s get to practical examples 👨‍💻
1. Checking Googlebot Activity
Command:
grep 'Googlebot' access.log | less
Use: Page through every Googlebot request to see which pages it fetches and when. To count how often it visits, replace "| less" with "| wc -l". 📊
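If you want the "which pages it crawls the most" part as a ranked list, a small function along these lines should do it. This is only a sketch: googlebot_top_pages is a name I made up, it assumes the combined log format, and it trusts the user-agent string without verifying that the request really came from Google.

```shell
# googlebot_top_pages (hypothetical helper)
#   $1 = log file in Combined Log Format
#   $2 = number of rows to show (defaults to 10)
googlebot_top_pages() {
  # Field 6 (split on double quotes) is the user agent, field 2 the request line.
  awk -F'"' '$6 ~ /Googlebot/ { split($2, req, " "); print req[2] }' "$1" |
    sort | uniq -c | sort -k1,1nr -k2 | head -n "${2:-10}"
}
```

Usage: googlebot_top_pages access.log 20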
2. Filtering Specific URL Paths
Command:
awk '$7 ~ /^\/blog\// {print $7}' access.log | sort | uniq -c | sort -nr | head -10
Use: Identify your most visited blog pages and optimize content based on search patterns. 📝
3. Spotting 404 Errors (Broken Links Detection)
Command:
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -nr | head -20
Use: Find and fix broken links that may be harming your SEO performance. 🛠️
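Knowing which URL returns 404 is half the job; the referrer tells you where the broken link lives. Here is a rough sketch that pairs each 404 URL with its referrer. The function name broken_links_with_referrer is hypothetical, and it assumes the combined log format (the common format has no referrer field).

```shell
# broken_links_with_referrer (hypothetical helper)
#   $1 = log file in Combined Log Format
#   $2 = number of rows to show (defaults to 20)
broken_links_with_referrer() {
  awk -F'"' '{
    split($2, req, " ")    # request line: method, path, protocol
    split($3, tail, " ")   # status code and bytes
    if (tail[1] == "404") print req[2], $4   # path, then referrer
  }' "$1" | sort | uniq -c | sort -k1,1nr | head -n "${2:-20}"
}
```

Usage: broken_links_with_referrer access.log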
4. Analyzing Crawl Frequency Over Time
Command:
grep 'Googlebot' access.log | awk '{print $4}' | cut -d: -f1 | tr -d '[' | sort | uniq -c | sort -nr | head -10
Use: Identify the days on which Googlebot crawls the most so you can adjust your content update schedules. ⏳
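For a finer view, the same idea can be broken down by hour of day. A minimal sketch, assuming the standard timestamp layout ([10/Jan/2025:13:55:36 +0000]); googlebot_hits_per_hour is just an illustrative name.

```shell
# googlebot_hits_per_hour (hypothetical helper)
#   $1 = log file; prints "HH:00 count" lines in chronological order
googlebot_hits_per_hour() {
  grep 'Googlebot' "$1" |
    awk '{ split($4, t, ":"); print t[2] }' |   # t[2] is the hour, e.g. "13"
    sort | uniq -c |
    awk '{ printf "%s:00 %d\n", $2, $1 }'
}
```

Usage: googlebot_hits_per_hour access.log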
5. Checking Mobile vs. Desktop Traffic
Command:
awk -F'"' '{ if ($6 ~ /Mobile/) m++; else d++ } END { print "Mobile", m+0; print "Desktop", d+0 }' access.log
Use: Understand the traffic distribution across devices to optimize for mobile-first indexing. 📱
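Since the point here is mobile-first indexing, it can also help to see how Googlebot itself splits between its smartphone and desktop crawlers. The sketch below relies on the smartphone crawler's user agent containing the word "Mobile"; the function name googlebot_device_split is made up for this example.

```shell
# googlebot_device_split (hypothetical helper)
#   $1 = log file in Combined Log Format
googlebot_device_split() {
  awk -F'"' '$6 ~ /Googlebot/ {
    # Assumption: only the smartphone crawler has "Mobile" in its user agent.
    if ($6 ~ /Mobile/) mobile++; else desktop++
  } END {
    printf "smartphone %d\ndesktop %d\n", mobile, desktop
  }' "$1"
}
```

Usage: googlebot_device_split access.log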
6. Finding Slow Loading Pages
Command:
awk '{ if ($NF > 5000) print $7, $NF }' access.log | sort -k2 -nr | head -10
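Use: Identify slow-loading pages that might impact rankings and user experience. Note that this only works if your server appends the response time as the last field of each line (for example %D in Apache or $request_time in Nginx); the standard combined format does not record it, and the right threshold depends on the unit your server logs (%D is in microseconds, $request_time in seconds). 🐢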
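A single slow request can be noise, so averaging per URL is often more telling. The sketch below assumes, as above, that the response time is the LAST field on each line; lines where the last field is not a plain number are skipped. The name slowest_pages_avg is hypothetical.

```shell
# slowest_pages_avg (hypothetical helper)
#   $1 = log file whose last field is the response time (an assumption,
#        the standard combined format does not include it)
#   $2 = number of rows to show (defaults to 10)
# Output columns: average time, number of hits, URL
slowest_pages_avg() {
  awk '$NF ~ /^[0-9]+(\.[0-9]+)?$/ {
    sum[$7] += $NF
    hits[$7]++
  } END {
    for (url in sum) printf "%.3f %d %s\n", sum[url] / hits[url], hits[url], url
  }' "$1" | sort -k1,1nr | head -n "${2:-10}"
}
```

Usage: slowest_pages_avg access.log 20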
7. Tracking Bot Traffic Volume
Command:
grep -iE 'bot|crawler|spider' access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -nr
Use: Detect how much of your traffic is from search engine bots versus real users. 🤖
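To get the actual bot-versus-everything-else split as a percentage, something like this sketch can help. It classifies purely by user-agent keywords, so bots that pretend to be browsers will land in the "other" bucket; bot_share is an illustrative name.

```shell
# bot_share (hypothetical helper)
#   $1 = log file in Combined Log Format
bot_share() {
  awk -F'"' '{
    total++
    # Case-insensitive keyword match on the user agent (field 6).
    if (tolower($6) ~ /bot|crawler|spider/) bots++
  } END {
    if (total == 0) { print "no requests"; exit }
    printf "bots %d (%.1f%%), other %d\n", bots, 100 * bots / total, total - bots
  }' "$1"
}
```

Usage: bot_share access.log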
8. Detecting Unauthorized Bot Access (Scrapers & Bad Bots)
Command:
grep -i 'bot' access.log | grep -viE 'googlebot|bingbot' | awk -F'"' '{print $6}' | sort | uniq -c | sort -nr | head -20
Use: Identify suspicious bot activity and protect your SEO efforts from scrapers. 🚫
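Keep in mind that a scraper can simply put "Googlebot" in its user agent. A common way to check is a reverse DNS lookup on the IP followed by a forward lookup on the resulting hostname. The sketch below is a rough take on that: both function names are my own, the accepted hostname suffixes are an assumption you should check against Google's current documentation, it needs the host command and network access, and it only handles IPv4.

```shell
# is_google_hostname (hypothetical helper): succeeds if the hostname ends in
# one of the suffixes assumed here to belong to Google's crawlers.
is_google_hostname() {
  case "$1" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# verify_googlebot_ip (hypothetical helper)
#   $1 = IPv4 address taken from the log
verify_googlebot_ip() {
  # Reverse lookup: IP -> hostname (host prints it with a trailing dot).
  name=$(host "$1" 2>/dev/null | awk '/domain name pointer/ { print $NF; exit }')
  name=${name%.}
  is_google_hostname "$name" || return 1
  # Forward lookup: the hostname must resolve back to the same IP.
  host "$name" 2>/dev/null | grep -qxF "$name has address $1"
}
```

Usage: verify_googlebot_ip 66.249.66.1 && echo "looks like the real Googlebot"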
9. Counting Hits per Page
Command:
awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -20
Use: Determine the most accessed pages and align your SEO strategy accordingly. 🌟
10. Identifying Referrer Sources
Command:
awk '{print $11}' access.log | sort | uniq -c | sort -nr | head -10
Use: Discover which external sources drive the most traffic to your site. 🌍
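The raw referrer column usually contains a lot of "-" entries and links from your own site. Here is a minimal sketch that groups referrers by hostname and drops those two cases. The function name top_external_referrers and the www.example.com hostname in the usage line are placeholders, not anything from your setup.

```shell
# top_external_referrers (hypothetical helper)
#   $1 = log file in Combined Log Format
#   $2 = your own hostname, excluded from the results
#   $3 = number of rows to show (defaults to 10)
top_external_referrers() {
  awk -F'"' -v own="$2" '{
    ref = $4
    if (ref == "-" || ref == "") next
    # "https://www.google.com/search" splits on "/" into
    # "https:", "", "www.google.com", "search"
    split(ref, parts, "/")
    refhost = parts[3]
    if (refhost == "" || refhost == own) next
    print refhost
  }' "$1" | sort | uniq -c | sort -k1,1nr | head -n "${3:-10}"
}
```

Usage: top_external_referrers access.log www.example.com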
11. Visualize Log Data for Better Insights
Command:
awk '{print $1 "," $7 "," $9}' access.log > log_report.csv
Use: Export log data to CSV for visualization in Excel, Google Sheets, or BI tools. 📊
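One thing to watch with the one-liner: a URL that contains a comma will shift the columns in your spreadsheet. A slightly more careful sketch adds a header row and wraps the path and user agent in double quotes. The function name export_csv and the column names are my own choices for illustration.

```shell
# export_csv (hypothetical helper)
#   $1 = log file in Combined Log Format; writes CSV to stdout
export_csv() {
  awk -F'"' 'BEGIN { print "ip,timestamp,path,status,user_agent" } {
    split($1, head, " ")
    split($2, req, " ")
    split($3, tail, " ")
    # Path and user agent are quoted so embedded commas stay in one column.
    # Splitting on double quotes means neither value can contain one.
    printf "%s,%s,\"%s\",%s,\"%s\"\n", head[1], substr(head[4], 2), req[2], tail[1], $6
  }' "$1"
}
```

Usage: export_csv access.log > log_report.csv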
12. Identifying Site Speed Issues Through Logs
Command:
awk '{ if ($NF > 5000) print $7, $NF }' access.log | sort -k2 -nr | head -10
Use: Track server response times and detect pages that take longer than expected to load. Like command 6, this assumes your log format records the response time as the last field of each line. 🚄
FREE Server Logs Analysis Bash Script
This script is a very simple yet helpful tool I use to analyze website access logs with ease. It provides a menu-driven interface for various log analysis tasks, such as identifying the most crawled pages, detecting errors, and analyzing traffic sources. The script exports the results into easy-to-read CSV files, making it simple to track and optimize your site’s performance over time.
Bash Script: log_analysis.sh
#!/bin/bash
LOGFILE="access.log"
# Check that the log file exists before showing the menu
if [[ ! -f "$LOGFILE" ]]; then
echo "Error: Log file $LOGFILE not found!"
exit 1
fi
echo "=============================="
echo " SEO Log Analysis Menu "
echo "=============================="
echo "1. Find the Most Crawled Pages by Googlebot"
echo "2. Detect 404 Errors and Their Frequency"
echo "3. Analyze Bot vs. Human Traffic"
echo "4. Check Server Response Time for Slow Pages"
echo "5. Track Traffic by Hour (Peak Times)"
echo "6. Count Visits per User-Agent (Identify Crawlers)"
echo "7. Extract Top Referring Websites"
echo "8. Detect Suspicious Activity (High Hit Rates from Single IPs)"
echo "9. Find Most Accessed Pages"
echo "10. Monitor Search Engine Crawl Rate Over Time"
echo "11. Identify Pages with 500 Server Errors"
echo "12. Export Logs to CSV for Further Analysis"
echo "13. Find Pages with High Bounce Rates (Single Page Requests)"
echo "14. Analyze Traffic from Specific Countries"
echo "15. Exit"
echo "=============================="
read -p "Select an option (1-15): " option
# Execute the chosen action
case $option in
1)
grep 'Googlebot' "$LOGFILE" | awk '{print $7}' | sort | uniq -c | sort -nr | head -10 > googlebot_crawled_pages.csv
echo "Results saved to googlebot_crawled_pages.csv"
;;
2)
grep ' 404 ' $LOGFILE | awk '{print $7}' | sort | uniq -c | sort -nr | head -20 > error_404_report.csv
echo "Results saved to error_404_report.csv"
;;
3)
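# Label each count so the CSV is readable; -i also catches "Bot", "Spider", etc.
echo "bot_requests,$(grep -ciE 'bot|crawler|spider' "$LOGFILE")" > bot_vs_human_traffic.csv
echo "other_requests,$(grep -cviE 'bot|crawler|spider' "$LOGFILE")" >> bot_vs_human_traffic.csv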
echo "Results saved to bot_vs_human_traffic.csv"
;;
4)
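# Assumes the response time is logged as the last field (not part of the standard combined format)
awk '$NF > 5000 {print $7, $NF}' "$LOGFILE" | sort -k2 -nr | head -10 > slow_pages.csv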
echo "Results saved to slow_pages.csv"
;;
5)
awk '{print substr($4,14,2)}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > traffic_by_hour.csv
echo "Results saved to traffic_by_hour.csv"
;;
6)
awk -F\" '{print $6}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > user_agents.csv
echo "Results saved to user_agents.csv"
;;
7)
awk '{print $11}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > referrer_websites.csv
echo "Results saved to referrer_websites.csv"
;;
8)
awk '{print $1}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > suspicious_ips.csv
echo "Results saved to suspicious_ips.csv"
;;
9)
awk '{print $7}' $LOGFILE | sort | uniq -c | sort -nr | head -10 > most_accessed_pages.csv
echo "Results saved to most_accessed_pages.csv"
;;
10)
grep 'Googlebot' $LOGFILE | awk '{print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr > crawl_rate.csv
echo "Results saved to crawl_rate.csv"
;;
11)
grep ' 500 ' $LOGFILE | awk '{print $7}' | sort | uniq -c | sort -nr | head -10 > server_errors.csv
echo "Results saved to server_errors.csv"
;;
12)
awk '{print $1 "," $4 "," $7 "," $9}' $LOGFILE > seo_log_report.csv
echo "Results saved to seo_log_report.csv"
;;
13)
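# Rough proxy only: pages requested by IPs that made exactly one request (logs cannot measure true bounce rate)
awk '{c[$1]++; p[$1]=$7} END {for (ip in c) if (c[ip]==1) print p[ip]}' "$LOGFILE" | sort | uniq -c | sort -nr | head -10 > bounce_rate.csv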
echo "Results saved to bounce_rate.csv"
;;
14)
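# Assumes your log format includes a country code on each line (standard formats do not); -w matches whole words only
grep -wE 'US|UK|DE' "$LOGFILE" | awk '{print $1, $7}' | sort | uniq -c | sort -nr > geo_traffic.csv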
echo "Results saved to geo_traffic.csv"
;;
15)
echo "Exiting the script. Goodbye!"
exit 0
;;
*)
echo "Invalid option. Please run the script again."
exit 1
;;
esac
How to Run the Script
- Save the script as log_analysis.sh
- Give execution permission:
chmod +x log_analysis.sh
- Run the script:
./log_analysis.sh
- Follow the on-screen menu to analyze logs and export results to .csv files.
Note: The script looks for access.log in the current directory, so run it from the folder that contains your log file!
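If you would rather point the script at any log file, one option is to take the path as the first argument and fall back to access.log. This is just a sketch of that idea, not part of the script above; resolve_logfile is a hypothetical helper name.

```shell
# resolve_logfile (hypothetical helper)
#   $1 = optional path to a log file; defaults to access.log
# Prints the path on success, prints an error and returns 1 if it is missing.
resolve_logfile() {
  logfile="${1:-access.log}"
  if [ ! -f "$logfile" ]; then
    echo "Error: log file $logfile not found" >&2
    return 1
  fi
  printf '%s\n' "$logfile"
}
```

In the script you could then replace the LOGFILE="access.log" line with: LOGFILE=$(resolve_logfile "$1") || exit 1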
Understanding log formats and analyzing key fields allows SEO professionals to:
✅ Track search engine crawling behavior
✅ Optimize website performance
✅ Identify errors before they impact rankings
✅ Improve user experience
Why do I personally love the Linux environment for analyzing server logs? 🤔
✅ Optimize crawl budget
✅ Identify performance bottlenecks
✅ Improve website architecture
✅ Detect and fix critical issues
✅ I feel like a hacker while doing it