Extracting WordPress Blog Post Title URLs (feat. Python & WordPress)
If you're running a WordPress blog, you've probably wanted a list of the titles and URLs of the posts you've written.
I was also curious about how much I've written, and whenever I change something about a post, such as its permalink, I have to re-register the post's address through the URL inspection tool in places like Google Search Console.
In this post, we'll use a Python script to automatically extract the title and URL of each post and save them as an Excel file. I'm going to show you how to do it, so take your time and follow along below.
Installing VS Code and getting used to it
First, install VS Code, which we'll use to write and run the Python script. Follow along with the post "Installing VS CODE - Windows".
If you're new to VS Code, we recommend reading the post below to get acclimated, then giving it a try.
Python source code and how to run it
Modifying the Python source code
Simply copy and paste the code below into VS Code and run it. You only need to replace the address at the bottom of the script (shown in bold) with your own:
# Example usage (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2'
import requests
import json
from openpyxl import Workbook
import time

def get_all_wordpress_posts(api_endpoint, auth=None):
    headers = {'Accept': 'application/json'}
    if auth:
        headers['Authorization'] = auth
    posts = []
    page = 1
    per_page = 10
    while True:
        try:
            response = requests.get(f"{api_endpoint}/posts?page={page}&per_page={per_page}", headers=headers)
            response.raise_for_status()
            page_posts = json.loads(response.content)
            if not page_posts:
                print(f"All pages have been processed, totaling {len(posts)} posts.")
                break
            posts.extend(page_posts)
            page += 1
            print(f"Finished processing page {page-1}. Total {len(posts)} posts fetched.")
            time.sleep(1)  # wait 1 second between each request
        except requests.exceptions.HTTPError as e:
            # WordPress answers with a 400 error once the page number goes past the last page
            if response.status_code == 400 and "rest_post_invalid_page_number" in response.text:
                print(f"Page {page} is invalid, you have fetched a total of {len(posts)} posts.")
            else:
                print(f"HTTP error occurred: {e}")
                print(f"Response content: {response.text}")
            break
        except Exception as e:
            print(f"Exception thrown: {e}")
            break
    return posts

def save_to_excel(posts, filename="post_info.xlsx"):
    wb = Workbook()
    ws = wb.active
    ws.append(["Title", "URL"])  # add header
    for post in posts:
        ws.append([post['title']['rendered'], post['link']])
    wb.save(filename)
    print(f"Successfully saved information for {len(posts)} posts to {filename}.")

# Example usage (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2'
auth = 'Basic '  # add authentication info if needed (Base64-encoded username:password)
print("Starting to fetch posts...")
posts = get_all_wordpress_posts(api_endpoint, auth)
print(f"A total of {len(posts)} posts were fetched.")
if posts:
    save_to_excel(posts)
else:
    print("No posts were imported.")
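One detail worth knowing about the title field that save_to_excel writes out: WordPress returns title.rendered as rendered HTML, so characters such as apostrophes or ampersands can arrive as entities like &#8217; or &amp; and show up that way in the spreadsheet. Below is a minimal sketch of decoding them with the standard library's html.unescape. The helper name clean_title and the sample post are made up for this illustration and are not part of the original script.

```python
import html

def clean_title(post):
    """Return the post title with HTML entities decoded (hypothetical helper)."""
    return html.unescape(post['title']['rendered'])

# Sample data shaped like one item of the /posts response (made up for this example)
sample_post = {
    'title': {'rendered': 'Python &amp; WordPress: what&#8217;s new'},
    'link': 'https://example.com/python-wordpress/',
}

print(clean_title(sample_post))
```

If you want this in the script, one option is to call the helper inside the loop of save_to_excel in place of post['title']['rendered'].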
WordPress blog post title and URL extraction results
When you run the code above, it iterates over the paginated results and fetches the information for every post, printing its progress page by page. When it finishes, a post_info.xlsx file is generated.
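The WordPress REST API also reports the number of pages in the X-WP-TotalPages response header, which makes it possible to stop right after the last page instead of waiting for an empty page or an invalid-page error. The following is a minimal sketch of that idea, not the method the script above uses. To keep it runnable without a live site, the page fetcher is passed in as a function; fetch_all, fake_get_page, and the sample data are hypothetical names introduced only for this illustration.

```python
def fetch_all(get_page):
    """Collect posts from every page.

    get_page(page) must return (posts_on_that_page, response_headers).
    With requests this could be built from response.json() and
    response.headers (an assumption, not the original script's structure).
    """
    posts = []
    page = 1
    while True:
        page_posts, headers = get_page(page)
        posts.extend(page_posts)
        # If the header is missing, treat the current page as the last one
        total_pages = int(headers.get('X-WP-TotalPages', page))
        if page >= total_pages:
            break
        page += 1
    return posts


# Stand-in for a real site: 25 fake posts served 10 per page
FAKE_POSTS = [{'id': i} for i in range(1, 26)]

def fake_get_page(page, per_page=10):
    start = (page - 1) * per_page
    return FAKE_POSTS[start:start + per_page], {'X-WP-TotalPages': '3'}

all_posts = fetch_all(fake_get_page)
print(len(all_posts))  # 25
```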



One thing to note: the result file (post_info.xlsx) is generated in the folder where the source code file (post_list_url_r1.2.py) is located, so don't go looking for it out of habit in the Downloads folder, as you would for something downloaded from the web.
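Strictly speaking, a bare filename like post_info.xlsx is resolved against the current working directory, which matches the script's folder only when the script is run from there. If you want the file to always land next to the script, a sketch like the following could work. output_path_for is a hypothetical helper, and the path in the example is made up; in the real script you would pass __file__ as the first argument.

```python
from pathlib import Path

def output_path_for(script_file, filename="post_info.xlsx"):
    """Build a path for filename in the same folder as script_file (hypothetical helper)."""
    return Path(script_file).parent / filename

# In the real script you would pass __file__; a made-up path is used here
print(output_path_for("projects/wp/post_list_url_r1.2.py"))
```

The result can then be handed to save_to_excel as its filename argument, for example save_to_excel(posts, str(output_path_for(__file__))).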
Wrapping up
In this post, we showed how to use Python to talk to the WordPress API, extract the titles and URLs of your blog posts, and save them to an Excel file.
With these steps, you have one more small tool for managing your blog.
In my next post, I'll build on this one and share how to automatically register your blog post URLs in places like Google Search Console (I tried to do it in this post, but it doesn't seem to be an easy task).
We wish you smarter blogging with secondlife.lol.
For reference, there is also a plugin called Export All URLs, so you might want to use that.
Interpreting the source code
Source code and key comments
import requests
import json
from openpyxl import Workbook
import time

def get_all_wordpress_posts(api_endpoint, auth=None):
    """Function to get all WordPress posts by handling pagination."""
    # Set the required headers for the API request
    headers = {'Accept': 'application/json'}
    if auth:
        headers['Authorization'] = auth
    posts = []  # List to store all posts in
    page = 1  # Initial page number
    per_page = 10  # Set the number of posts to fetch at once to 10
    while True:
        try:
            # Send a request to the API endpoint to get the posts for this page
            response = requests.get(f"{api_endpoint}/posts?page={page}&per_page={per_page}", headers=headers)
            response.raise_for_status()  # Raises an exception if the request is not successful
            # Parse the JSON response data
            page_posts = json.loads(response.content)
            if not page_posts:
                break  # End the loop if there are no more posts
            # Add the fetched posts to the list
            posts.extend(page_posts)
            page += 1  # Go to next page
            print(f"Processing page {page-1} complete. Total of {len(posts)} posts fetched.")
            time.sleep(1)  # Wait 1 second between each request (to reduce the load on the server)
        except requests.exceptions.HTTPError as e:
            # What to do when an HTTP error occurs
            print(f"HTTP error encountered: {e}")
            print(f"Response content: {response.text}")
            break
        except Exception as e:
            # What to do when other exceptions occur
            print(f"Exception thrown: {e}")
            break
    return posts  # Return all posts

def save_to_excel(posts, filename="post_info.xlsx"):
    """Function to save post titles and URLs to an Excel file."""
    wb = Workbook()  # Create a new Excel workbook
    ws = wb.active  # Select active worksheet
    ws.append(["Title", "URL"])  # Add header to first row
    for post in posts:
        # Add the title and URL of each post to the Excel file
        ws.append([post['title']['rendered'], post['link']])
    wb.save(filename)  # Save the Excel file
    print(f"Successfully saved information for {len(posts)} posts to {filename}.")

# Example usage (secondlife.lol)
api_endpoint = 'https://secondlife.lol/wp-json/wp/v2'  # WordPress API endpoint
auth = 'Basic '  # Add authentication info if needed (Base64 encoded username:password)
print("Start fetching posts...")
posts = get_all_wordpress_posts(api_endpoint, auth)  # Call the post fetch function
print(f"A total of {len(posts)} posts were fetched.")
if posts:
    save_to_excel(posts)  # Save posts, if any, to an Excel file
else:
    print("No posts were imported.")  # Print message if no posts were imported
Detailed description of the annotations
- Importing modules
  - requests: Module for sending HTTP requests.
  - json: Module for processing JSON data.
  - openpyxl: Module for working with Excel files.
  - time: Module for time-related functions.
- get_all_wordpress_posts function
  - Takes api_endpoint and auth as arguments and fetches all posts via the WordPress API.
- Setting headers
  - headers = {'Accept': 'application/json'}: Set the header to receive responses in JSON format.
  - if auth:: Add the credentials to the headers if they are present.
- Set up lists and pages to store post data
  - posts = []: An empty list to store posts in.
  - page = 1: Initial page number.
  - per_page = 10: Number of posts to fetch at once.
- API requests in a loop
  - while True:: Fetch posts page by page in an infinite loop.
  - response = requests.get(...): Send a request to the API endpoint.
  - response.raise_for_status(): Raise an exception if the request fails.
  - page_posts = json.loads(response.content): Parse the JSON response data.
  - if not page_posts:: End the loop when there are no more posts.
  - posts.extend(page_posts): Add the fetched posts to the list.
  - page += 1: Go to the next page.
  - time.sleep(1): Wait 1 second between each request.
- Exception handling
  - except requests.exceptions.HTTPError as e:: Handle HTTP errors.
  - except Exception as e:: Handle any other exceptions.
- Returning post data
  - return posts: Return all post data.
- save_to_excel function
  - Takes posts and filename as arguments and saves the post data to an Excel file.
- Create an Excel file and save your data
  - wb = Workbook(): Create a new Excel workbook.
  - ws = wb.active: Select the active worksheet.
  - ws.append(["Title", "URL"]): Add a header to the first row.
  - for post in posts:: Add the title and URL of each post to the Excel file.
  - wb.save(filename): Save the Excel file.
- Main code
  - api_endpoint = 'https://secondlife.lol/wp-json/wp/v2': Set the WordPress API endpoint.
  - auth = 'Basic ': Add authentication information if needed.
  - print("Start fetching posts..."): Print a start message.
  - posts = get_all_wordpress_posts(api_endpoint, auth): Call the post fetch function.
  - print(f"A total of {len(posts)} posts were fetched."): Print the number of posts.
  - if posts: save_to_excel(posts): Save to an Excel file if there are posts.
  - else: print("No posts were imported."): Print a message if there are no posts.
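The auth value in the main code is left as 'Basic ' with nothing after it. For sites that do require authentication, the part after 'Basic ' is the Base64 encoding of username:password, as the comment in the code says. Here is a minimal sketch of building that string with the standard library. make_basic_auth and the sample credentials are made up for illustration. On WordPress the password would typically be an application password rather than your login password, though that depends on how your site is configured.

```python
import base64

def make_basic_auth(username, password):
    """Return an Authorization header value for HTTP Basic auth (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Made-up credentials, for illustration only
auth = make_basic_auth("user", "pass")
print(auth)  # Basic dXNlcjpwYXNz
```

For a public blog that needs no authentication, passing auth=None skips the Authorization header entirely, since get_all_wordpress_posts only adds it when auth is set.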






