Extracting WordPress Blog Post Title URLs (feat. Python & WordPress)

If you're running a WordPress blog, you've probably thought about extracting the titles and URLs of the posts you've written.

I was curious about how much I've written, too. Also, whenever I change something about a post, such as its permalink, I have to re-register the post's address through the URL inspection tool in places like Google Search Console.

In this post, I'll show you how to use a Python script to automatically extract the title and URL of each post and save them as an Excel file, so take your time and follow along below.

Getting comfortable with installing and using VS Code

First, we'll install VS Code so we can write and run a Python script. Follow along with the post "Installing VS CODE - Windows".

If you're new to VS Code, we recommend reading that post to get comfortable with it before you give this a try.

Python source code and running

Python source code modifications

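Simply copy and paste the code below into VS Code and run it. You only need to replace the address near the bottom (the api_endpoint value shown here) with your own. The script imports the requests and openpyxl packages, so install them first with pip install requests openpyxl if you haven't already.

# Example (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2'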

import requests
import json
from openpyxl import Workbook
import time

def get_all_wordpress_posts(api_endpoint, auth=None):
    headers = {'Accept': 'application/json'}
    if auth:
        headers['Authorization'] = auth

    posts = []
    page = 1
    per_page = 10

    while True:
        try:
            response = requests.get(f"{api_endpoint}/posts?page={page}&per_page={per_page}", headers=headers)
            response.raise_for_status()

            page_posts = json.loads(response.content)
            if not page_posts:
                print(f"All pages have been processed, totaling {len(posts)} posts.")
                break

            posts.extend(page_posts)
            page += 1
            print(f"Finished processing page {page-1}. Total {len(posts)} posts fetched.")

            time.sleep(1) # wait 1 second between each request

        except requests.exceptions.HTTPError as e:
            if response.status_code == 400 and "rest_post_invalid_page_number" in response.text:
                print(f"Page {page} is invalid, you have fetched a total of {len(posts)} posts.")
            else:
                print(f"HTTP error occurred: {e}")
                print(f"Response content: {response.text}")
            break
        except Exception as e:
            print(f"Exception thrown: {e}")
            break

    return posts

def save_to_excel(posts, filename="post_info.xlsx"):
    wb = Workbook()
    ws = wb.active
    ws.append(["Title", "URL"])  # add header

    for post in posts:
        ws.append([post['title']['rendered'], post['link']])

    wb.save(filename)
    print(f"{filename} Successfully saved {len(posts)} post information to file.")

# Example usage (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2'
auth = 'Basic '

print("Starting to fetch posts...")
posts = get_all_wordpress_posts(api_endpoint, auth)
print(f"A total of {len(posts)} posts were fetched.")

if posts:
    save_to_excel(posts)
else:
    print("No posts were imported.")

WordPress blog post title URL extraction result

When you run the code above, it walks through the paginated results and fetches all the post information. The output below shows the progress page by page, and at the end a post_info.xlsx file is generated.

[Image: WordPress blog post title URL extraction process and result]
[Image: WordPress blog post title URL extraction result]
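
As a side note, the WordPress REST API also reports the total number of pages in the X-WP-TotalPages response header, so the loop can stop without ever requesting an invalid page. Below is a minimal sketch of that idea, not a replacement for the script above. The names fetch_all_pages, make_wordpress_fetcher, and fetch_page are hypothetical (they do not appear in the original script), and fetch_page is passed in as a function so the paging logic can be tried out without a network connection.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a function that takes a page number and returns
# (posts on that page, total number of pages reported by the server).
FetchPage = Callable[[int], Tuple[List[Dict], int]]


def fetch_all_pages(fetch_page: FetchPage) -> List[Dict]:
    """Collect posts page by page, stopping at the reported last page."""
    posts: List[Dict] = []
    page = 1
    while True:
        page_posts, total_pages = fetch_page(page)
        posts.extend(page_posts)
        # Stop at the last reported page, or early if a page comes back empty.
        if page >= total_pages or not page_posts:
            break
        page += 1
    return posts


def make_wordpress_fetcher(api_endpoint: str, per_page: int = 100) -> FetchPage:
    """Build a fetch_page function backed by the real API (assumes requests is installed)."""
    import requests  # imported here so the paging logic above works without it

    def fetch_page(page: int) -> Tuple[List[Dict], int]:
        response = requests.get(
            f"{api_endpoint}/posts",
            params={"page": page, "per_page": per_page},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        # Fall back to 1 if the header is missing for some reason.
        total_pages = int(response.headers.get("X-WP-TotalPages", 1))
        return response.json(), total_pages

    return fetch_page
```

With this in place, something like posts = fetch_all_pages(make_wordpress_fetcher('https://secondlife.lol/wp-json/wp/v2')) should collect every post. The sketch uses per_page=100, which is the largest value the API accepts, so a blog with a few hundred posts needs only a handful of requests.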

One thing to note is that the post info file (post_info.xlsx) is generated in the folder where the source code file (post_list_url_r1.2.py) is located, so don't look for it out of habit in your Downloads folder, as you would with something downloaded from the web.
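
Strictly speaking, a relative filename like post_info.xlsx is saved to the current working directory, which is usually, but not always, the folder the script lives in. If you'd rather pin the location down, here is a small sketch. output_path_next_to is a hypothetical helper name, not part of the script above.

```python
from pathlib import Path


def output_path_next_to(script_file: str, filename: str = "post_info.xlsx") -> Path:
    """Return an absolute path for filename in the same folder as script_file."""
    return Path(script_file).resolve().parent / filename
```

Inside the script you could then call save_to_excel(posts, filename=str(output_path_next_to(__file__))), where __file__ is the path of the script being run.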

Wrapping up

In this post, we showed you how to use Python to interact with the WordPress API, extract the titles and URLs of your blog posts, and save them to an Excel file.

With these steps, you should have a small arsenal of tools to manage your blog.

In my next post, I'll build on this post and share how to automatically register your blog post URLs in places like Google Search Console (I tried to do this in this post, but it doesn't seem to be an easy task).

We wish you smarter blogging with secondlife.lol.

For reference, there is also a plugin called Export All URLs, so you might want to use that instead.

Interpreting the source code

Source code and key comments

import requests
import json
from openpyxl import Workbook
import time

def get_all_wordpress_posts(api_endpoint, auth=None):
    """Function to get all WordPress posts by handling pagination"""

    # Set the required headers for the API request
    headers = {'Accept': 'application/json'}
    if auth:
        headers['Authorization'] = auth

    posts = [] # List to store all posts in
    page = 1 # initial page number
    per_page = 10 # Set the number of posts to fetch at once to 10

    while True:
        try:
            # Send a request to the API endpoint to get the posts for this page
            response = requests.get(f"{api_endpoint}/posts?page={page}&per_page={per_page}", headers=headers)
            response.raise_for_status() # Throws an exception if the request is not successful

            # Parse the JSON response data into Python objects
            page_posts = json.loads(response.content)
            if not page_posts:
                break # End the loop if there are no more posts

            # Add the fetched posts to the list
            posts.extend(page_posts)
            page += 1 # Go to next page
            print(f"Processing page {page-1} complete. Total of {len(posts)} posts fetched.")

            time.sleep(1) # Wait 1 second between each request (to reduce the load on the server)

        except requests.exceptions.HTTPError as e:
            # What to do when an HTTP error occurs
            print(f"HTTP error encountered: {e}")
            print(f"Response content: {response.text}")
            break
        except Exception as e:
            # What to do when other exceptions occur
            print(f"Exception thrown: {e}")
            break

    return posts # Return all posts

def save_to_excel(posts, filename="post_info.xlsx"):
    """Function to save post titles and URLs to an Excel file."""

    wb = Workbook() # Create a new excel workbook
    ws = wb.active # Select active worksheet
    ws.append(["Title", "URL"]) # Add header to first row

    for post in posts:
        # Add the title and URL of each post to the excel file
        ws.append([post['title']['rendered'], post['link']])

    wb.save(filename) # Save the excel file
    print(f"{filename} Successfully saved {len(posts)} post information to file.")

# Example usage (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2' # WordPress API endpoint
auth = 'Basic ' # Add authentication info if needed (Base64 encoded username:password)

print("Start fetching posts...")
posts = get_all_wordpress_posts(api_endpoint, auth) # Call the post fetch function
print(f"A total of {len(posts)} posts were fetched.")

if posts:
    save_to_excel(posts)  # Save posts, if any, to an Excel file
else:
    print("No posts were imported.")  # Print a message if no posts were imported
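
One detail worth knowing about post['title']['rendered']: the REST API returns the title as rendered HTML, so characters such as apostrophes or ampersands can come back as entities like &#8217; or &amp;. If those show up in your spreadsheet, a sketch like the following can clean them up before saving. clean_title is a hypothetical helper, not part of the script above, and the sample post is made up for illustration.

```python
import html


def clean_title(rendered_title: str) -> str:
    """Convert HTML entities in a rendered title back to plain characters."""
    return html.unescape(rendered_title).strip()


# Made-up sample in the same shape as one item from the /posts response.
sample_post = {
    "title": {"rendered": "Tom &amp; Jerry&#8217;s Blog"},
    "link": "https://example.com/tom-and-jerry/",
}

row = [clean_title(sample_post["title"]["rendered"]), sample_post["link"]]
print(row)
```

To use it, the ws.append line in save_to_excel would become ws.append([clean_title(post['title']['rendered']), post['link']]).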

Detailed annotation description

  1. Importing modules
    • requests: Module for sending HTTP requests.
    • json: Module for processing JSON data.
    • openpyxl: Module for working with Excel files.
    • time: Module for using time-related functions.
  2. get_all_wordpress_posts function
    • Takes api_endpoint and auth as arguments and fetches all posts via the WordPress API.
  3. Setting headers
    • headers = {'Accept': 'application/json'}: Set headers to receive responses in JSON format.
    • if auth:: Add the Authorization header if credentials are provided.
  4. Set up lists and pages to store post data
    • posts = []: an empty list to store posts in.
    • page = 1: Initial page number.
    • per_page = 10: Number of posts to fetch at once.
  5. API requests via looping statements
    • while True:: Fetch posts by page in an infinite loop.
    • response = requests.get(...): Send a request to the API endpoint.
    • response.raise_for_status(): Raises an exception if the request fails.
    • page_posts = json.loads(response.content): Convert response data to JSON format.
    • if not page_posts:: End loop when there are no more posts.
    • posts.extend(page_posts): Add the imported posts to the list.
    • page += 1: Go to next page.
    • time.sleep(1): Wait 1 second between each request.
  6. Exception handling
    • except requests.exceptions.HTTPError as e:: Handling when an HTTP error occurs.
    • except Exception as e:: Handling other exceptions when they occur.
  7. Returning post data
    • return posts: Returns all post data.
  8. save_to_excel function
    • Takes posts and filename as arguments and saves the post data to an Excel file.
  9. Create an Excel file and save your data
    • wb = Workbook(): Create a new Excel workbook.
    • ws = wb.active: Select Active Worksheet.
    • ws.append(["Title", "URL"]): Add a header to the first row.
    • for post in posts:: Add the title and URL of each post to an excel file.
    • wb.save(filename): Save Excel File.
  10. Main code
    • api_endpoint = 'https://secondlife.lol/wp-json/wp/v2': Set up a WordPress API endpoint.
    • auth = 'Basic ': Add authentication information if needed.
    • print("Start fetching posts..."): Print a start message.
    • posts = get_all_wordpress_posts(api_endpoint, auth): Calling the post fetch function.
    • print(f"A total of {len(posts)} posts were fetched."): Output the number of posts.
    • if posts: save_to_excel(posts): Save to excel file if there are posts.
    • else: print("No posts were imported."): Output message if no post exists.
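
For the auth value in step 10, HTTP Basic authentication expects the word Basic followed by the Base64 encoding of username:password. A minimal sketch of building that string is below. build_basic_auth_header is a hypothetical helper, and the credentials are placeholders. Whether your site accepts Basic authentication at all depends on how it is set up, so treat this as an illustration rather than a guaranteed recipe.

```python
import base64


def build_basic_auth_header(username: str, password: str) -> str:
    """Return an Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials, for illustration only.
auth = build_basic_auth_header("user", "pass")
print(auth)  # Basic dXNlcjpwYXNz
```

On a default setup, published posts can usually be read without logging in, in which case passing auth=None to get_all_wordpress_posts should be enough.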
