A Beginner’s Guide to HTTP Python Requests


Everything is accessible on the Web through requests. If you need information from a web page in your Python application, you need a web request. In this article, we’ll dig into Python requests. We’ll look at how a web request is structured and how to make a Python request. By the end, you’ll be able to use the Python requests library, which makes the whole process easier.

An Introduction to HTTP Requests

To exchange data on the Web, we first need a communication protocol. The protocol used when we browse the Web is the Hypertext Transfer Protocol, or HTTP. HTTP needs reliable transport, so it has traditionally run on top of TCP, which guarantees that data arrives complete and in order. (The newer HTTP/3 gets the same guarantees from QUIC, which runs over UDP.)

Let’s say there’s a resource we need, such as an HTML page, on a web server located somewhere in the world. We want to access this resource or, in other words, we want to look at that page in our web browser. The first thing we have to do is make an HTTP request. HTTP is a client–server protocol, which means that the requests are initiated by the client.

After the server receives the requests, it processes them and returns an appropriate response.

The server might reply in different ways. Every response carries a status code: it might come with the resource we requested, or it might signal that something didn’t go as expected.

In every communication protocol, the information needs to be in specific fields. That’s because both the client and the server should know how to interpret the request or response. In the next sections, we’ll look at how an HTTP request and an HTTP response are built. We’ll also discuss the role of the most important fields.

The HTTP request

One of the most important design features of HTTP is that it’s human readable. This means that, when we look at an HTTP request, we can easily read everything, even if there’s a lot of complexity under the hood. Another feature of HTTP is that it is stateless. This means that there’s no link between two requests served one after the other. The HTTP protocol doesn’t remember anything of the previous request. This implies that each request must contain everything that the server needs to carry out the request.

A valid HTTP request must contain the following elements:

  • an HTTP method — such as GET or POST
  • the version of the HTTP protocol
  • the path of the resource to fetch

On top of these, we can add headers that specify additional information about the sender or the message. Two common request headers are User-Agent, which identifies the client software, and Accept-Language, which states the natural languages the client prefers. Both give the server information about the client that’s making the request. Most headers are optional, although HTTP/1.1 does require the Host header.

This is an example of an HTTP message, and we can clearly understand all the fields specified:

~~~http
GET / HTTP/1.1
Host: www.google.com
Accept-Language: en-GB,en;q=0.5
~~~

The first line specifies the request method (GET), the path of the resource (/) and the version of the HTTP protocol. Then we specify the Host and the languages accepted by the client that’s sending the request. Real messages are usually much longer, but this gives an idea of what they look like.
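
To make the structure concrete, here’s a minimal sketch that assembles the same message as a Python string. The build_request helper is hypothetical, written just for this illustration; it isn’t part of any library, and it ignores request bodies.

```python
def build_request(method, path, host, headers=None, version="HTTP/1.1"):
    """Assemble the text of a simple HTTP/1.1 request that has no body."""
    # Request line first, then the Host header, then any extra headers.
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_request(
    "GET", "/", "www.google.com", {"Accept-Language": "en-GB,en;q=0.5"}
)
print(message)
```

On the wire, each line ends with a carriage return and a line feed, and the empty line at the end tells the server that the headers are complete.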

The HTTP response

Now that we have an idea of what an HTTP request looks like, we can go on and see the HTTP response.

An HTTP response usually contains the following elements:

  • the version of the HTTP protocol
  • a status code, with a short descriptive message
  • a list of HTTP headers
  • a message body containing the requested resource
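
As a rough illustration of how these elements fit together, the sketch below splits a raw response into its parts. Both the parse_response helper and the sample response are made up for this example; real responses carry more headers and may use features (such as chunked bodies) that this simplified parser ignores.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into version, code, reason, headers and body."""
    # The headers and the body are separated by an empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# A made-up response, used only for illustration.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello!</p>"
)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```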

Now that we’ve introduced the basic elements you need, it’s worth making a summary before taking the next step. It should be clear by now that, whenever a client wants to communicate with an HTTP server, it must create and send an HTTP request. Then, when the server receives it, it creates and sends an HTTP response.

We’re finally ready to introduce the Python requests library.

The Python requests Library

The Python requests library allows you to send HTTP requests from Python, from basic to complicated ones. It hides much of the complexity of building requests and handling responses behind an easy-to-use interface. In the next sections, we’ll see how to create simple requests and interpret the response. We’ll also see some of the features the library provides.

Installing Python requests

First, we need to install the Python requests library. Let’s install it using pip:

~~~bash
$ pip install requests
~~~

Once the Python requests library is installed correctly, we can start using it.

Our first GET request with Python requests

The first thing we have to do is to create a Python file. In this example, we call it web.py. Inside this source file, insert this code:

~~~python
import requests

URL = "https://www.google.com"
resp = requests.get(URL)

print(resp)
~~~

This program makes a GET request to Google. If we run this program, we’ll probably get this output:

~~~bash
$ python web.py
<Response [200]>
~~~

So, what does this mean?

We talked about the status code earlier. This output is telling us that our request has been received, understood and processed successfully. There are other codes as well, and we can list a few of the most common:

  • 301 Moved Permanently. This is a redirection message. The resource we were looking for has been moved to a new URL, which comes with the response in the Location header.

  • 401 Unauthorized. This indicates a client error response. In this case, the server is telling us that we must authenticate before proceeding with the request.

  • 404 Not Found. This indicates a client error response too. In particular, this means that the server can’t find the resource we were looking for.
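
The first digit of a status code tells us its family: 2xx for success, 3xx for redirection, 4xx for client errors and 5xx for server errors. As a small sketch, Python’s standard http module can look up the standard phrase for a code; the describe helper below is just an illustrative name.

```python
from http import HTTPStatus

# The first digit of a status code identifies its family.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard phrase and its family as one string."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({FAMILIES[code // 100]})"


for code in (200, 301, 401, 404):
    print(describe(code))
```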

What if we want to check the status and take different actions based on the status code? Well, we can easily do this:

~~~python
import requests

URL = "https://www.google.com/blah"
resp = requests.get(URL)

# Note: requests follows redirects by default, so a 301 only shows up
# here if the request is made with allow_redirects=False.
if resp.status_code == 200:
    print("Okay, all good!")
elif resp.status_code == 301:
    print("Oops, the resource has been moved!")
elif resp.status_code == 404:
    print("Oh no, the resource wasn't found!")
else:
    print(resp.status_code)
~~~

If we run the script now, we’ll get something different. Have a try and see what we get. 😉

If we also want the descriptive short message that comes with each status code, we can use resp.reason. In the case of a 200 status code, we’ll simply get OK.
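
When we only care whether a request succeeded, comparing codes by hand gets repetitive. One common alternative, sketched here, is to let requests raise an exception for 4xx and 5xx responses with raise_for_status(). The fetch helper, its 10-second timeout and the choice to return None on failure are assumptions made for this example.

```python
import requests


def fetch(url):
    """Return the response body as text, or None if the request failed."""
    try:
        resp = requests.get(url, timeout=10)
        # Raises requests.exceptions.HTTPError for 4xx and 5xx responses.
        resp.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # RequestException also covers connection errors, timeouts and invalid URLs.
        print(f"Request failed: {exc}")
        return None
    return resp.text
```

Calling fetch("https://www.google.com/blah") should print the error and return None, assuming the server still answers that URL with a 404.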

Inspecting the response of the Python request

At this point, we know how to make a basic Python request. After the request, we want the response, right?

In the previous section, we saw how to get the status code of the response. Now, we want to read the body of the response, which is the actual resource we requested. To do this, we can use resp.content, which holds the body as raw bytes (resp.text gives the same body decoded to a string). Let’s say that we request the Google home page, as in our first script, and print resp.content.

This is what we get when we run the script:

~~~
b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="text/html; [...]
~~~

I’ve added [...] above because the resource we get — which is a text/html document — is too long to be printed. By how much? We can use len(resp.content) to get this information. In the case above, it was 13931 bytes — definitely too much to be printed here!
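
The b prefix in the output shows that resp.content is a bytes object, while resp.text returns the same body decoded into a string. The sketch below imitates that difference without touching the network; the body value is a made-up stand-in for a real response, and UTF-8 is assumed as the encoding.

```python
# Stand-in for resp.content: the raw bytes a server might send.
body = "<!doctype html><title>Café</title>".encode("utf-8")

# Roughly what resp.text gives us: the bytes decoded into a str.
text = body.decode("utf-8")

print(type(body).__name__, len(body))  # length in bytes
print(type(text).__name__, len(text))  # length in characters
```

The two lengths differ because é takes two bytes in UTF-8, which is why len(resp.content) counts bytes rather than characters.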

Making use of APIs

One of the reasons why the Python requests library became so popular is that it makes interacting with APIs very easy. For this example, we’ll use a simple API for predicting a person’s age, given their name. This API is called Agify.

This is the code for the example:

~~~python
import requests

URL = "https://api.agify.io/?name=Marcus"
resp = requests.get(URL)

if resp.status_code == 200:
    # Parse the JSON body into a Python dictionary
    data = resp.json()
    print(data['age'])
else:
    print(resp.status_code)
~~~

In this case, we want to know the age of a person whose name is Marcus. Once we have the response, if the status code is 200, we parse the JSON body using resp.json(). This gives us a Python dictionary, from which we can print the estimated age.

At the time of writing, the estimated age for Marcus was 41.
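
Instead of writing the query string into the URL by hand, we can pass a dictionary through the params argument and let requests do the encoding. The sketch below builds the request without sending it, so we can inspect the final URL; the age_request helper name is hypothetical.

```python
import requests


def age_request(name):
    """Build (but don't send) a GET request to Agify for the given name."""
    req = requests.Request("GET", "https://api.agify.io/", params={"name": name})
    return req.prepare()


prepared = age_request("Marcus")
print(prepared.url)
```

To actually send it, requests.get("https://api.agify.io/", params={"name": "Marcus"}) takes the same params dictionary.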

Customizing headers

HTTP headers provide additional information to both parties of an HTTP conversation. In the following example, we’ll see how we can change the headers of an HTTP GET request. In particular, we’ll change the User-Agent and the Accept-Language headers. The User-Agent tells the server some information about the application, the operating system and the vendor of the requesting agent. The Accept-Language header communicates which languages the client is able to understand.

This is our simple code snippet:

~~~python
import requests

URL = "https://www.google.com"
custom_headers = {
    "Accept-Language": "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5",
    "User-Agent": (
        "Mozilla/5.0 (Linux; Android 12; SM-S906N Build/QP1A.190711.020; wv) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
        "Chrome/80.0.3987.119 Mobile Safari/537.36"
    ),
}
resp = requests.get(URL, headers=custom_headers)

if resp.status_code == 200:
    # Show only the first 100 bytes of the body
    print(resp.content[:100])
else:
    print(resp.status_code)
~~~

If everything goes right, you should get something like this:

~~~
b'<!doctype html><html lang="fr"><head><meta charset="UTF-8"><meta content="width=device-width,mini [...]
~~~

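In this example, we’ve changed the User-Agent, pretending that our request comes from a Chrome-based mobile browser rather than from a Python script (the leading Mozilla/5.0 token is a historical convention and doesn’t mean Firefox). We’re also saying that our operating system is Android 12 and that our device is a Samsung Galaxy S22-series phone (model SM-S906N).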

Since we’ve printed the first 100 bytes of the response above, we can see from the lang="fr" attribute that the HTML page we’ve received is in French.
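
If several requests should share the same headers, a requests.Session can hold them once. As a sketch, the code below sets a header on a session and prepares a request without sending it, so we can check what would go out. The header value is a shortened version of the one used earlier, and nothing here contacts Google.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it prepares or sends.
session.headers.update({"Accept-Language": "fr-CH, fr;q=0.9"})

request = requests.Request("GET", "https://www.google.com")
prepared = session.prepare_request(request)

print(prepared.headers["Accept-Language"])
# Without an override, requests identifies itself with its own User-Agent.
print(prepared.headers["User-Agent"])
```

The second line of output shows why servers can tell a script from a browser by default: unless we override it, the User-Agent names the requests library itself.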

Conclusion

In this article, we talked about the HTTP protocol, with a brief theoretical introduction. Then we looked at the Python requests library. We saw how to write basic Python HTTP requests and how to customize them according to our needs.

I hope you’ll find this library and this article useful for your projects.
