How to Purchase Property using Web Scraping

Using Python, Beautiful Soup and Pandas

Pritesh Patel
May 2, 2021

Property prices in England are rising at their fastest pace since 2004, thanks to the government’s stamp duty holiday. I have been actively looking into the property market for the past few months and have found that, on average, properties don’t stay on the market for long due to high demand, which makes it difficult to buy a house. Demand for houses is particularly strong compared with apartments. As a result, it is important to be at the front of the line whenever a new property is put on the market.

Due to the surge in demand, if we do not arrange a viewing within the first couple of days of a property being published on the portal, there is a high chance we will not get a slot, as multiple viewings will already have been booked. Many people check Rightmove several times a day to see what is fresh on the market and what has been reduced in price.

I have been studying Python programming for a few months and decided to put what I have learned so far into practice, so I created this web scraping project to help me make a buying decision using code rather than checking the website multiple times a day.

Web Scraping

Web scraping is the process of extracting information from a website in an automated fashion using code. The information is collected programmatically and then exported into a format that is more useful to the user; for instance, you can collect a list of Amazon product names and prices. It is especially useful when a website has no API or provides only limited access to its data. Web scraping is not a simple task, because websites come in a variety of shapes and forms, so scraping code varies in functionality and features from site to site.

In this project, I used web scraping to build an email alert that delivers the most recent listings from the Rightmove website. The code can easily be adapted to other projects, but it is best suited to property portal websites.

Property Portal (Rightmove)

Rightmove is a popular property website in the United Kingdom; it is the country’s largest online real estate portal. This website currently lists tens of thousands of properties. Each property listing includes the price, the number of bedrooms, the location, and the contact information for the estate agent who will arrange a viewing for the prospective buyer.

Getting Started

The page https://www.rightmove.co.uk/property-for-sale/find.html?locationIdentifier=REGION^1152 contains all available properties in Rugby, a small market town in eastern Warwickshire. In this article, we will explore how information can be obtained from a website using web scraping. I will show how to use the Python libraries Requests and Beautiful Soup to scrape data from the website. After that, we will use other libraries, such as pandas, matplotlib and seaborn, for data analysis.

Here’s an outline of the steps we’ll follow in this article:

  1. Download the webpage using Requests
  2. Parse the HTML source code using Beautiful Soup
  3. Extract property information and compile it into Python lists and dictionaries
  4. Save the extracted information to a CSV file
  5. Analyse the data using pandas, matplotlib and seaborn
  6. Send an email alert with the latest listings attached

By the end of this article, we will have created a CSV file containing the scraped property data.

You can find the full code for this post here : https://jovian.ai/pritesh009/project-1-rightmove-with-functions-e2449

We gathered the data in the following four segments, using the primary variables and custom functions described below. The main advantage of this approach is that it saves time on repetitive tasks.

The location for this project is Rugby; however, region codes for other areas can easily be found in the Rightmove website URL.
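
As a minimal sketch, the primary variables might look like the following. The region code comes from the URL above; the total and page size are assumptions you would read off the live results page:

```python
# Region code for Rugby, taken from the Rightmove search URL (REGION^1152).
BASE_URL = ("https://www.rightmove.co.uk/property-for-sale/find.html"
            "?locationIdentifier=REGION%5E1152")
TOTAL_PROPERTIES = 480    # hypothetical total shown on the results page
PROPERTIES_PER_PAGE = 24  # assumed number of listings per results page
```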

1. Downloading the webpage using Requests

We can use the Requests library to download the webpage. Here, we create a function, page_content, that collects the HTML of each results page as text and gathers the pages into a list called 'pagedict'.

The page_content function takes key parameters such as the base URL, the total number of properties, and the number of properties per page.
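
Here is a hedged sketch of what page_content might look like; the `index` query parameter and the default of 24 results per page are assumptions based on how Rightmove's search URLs paginate:

```python
import requests

def page_content(base_url, total_properties, per_page=24):
    """Download every results page and return the HTML strings as a list."""
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject the default UA
    pagedict = []
    for index in range(0, total_properties, per_page):
        # Assumption: the portal paginates via an `index` query parameter.
        response = requests.get(base_url, params={"index": index}, headers=headers)
        response.raise_for_status()  # stop early if a request fails
        pagedict.append(response.text)
    return pagedict
```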

Once we have collected the HTML source code with the Requests library, the html_parse function produces a Beautiful Soup document for each page in the list provided by page_content.
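
A minimal html_parse could simply map Beautiful Soup over that list:

```python
from bs4 import BeautifulSoup

def html_parse(pagedict):
    """Turn each downloaded HTML string into a Beautiful Soup document."""
    return [BeautifulSoup(html, "html.parser") for html in pagedict]
```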

2. Extracting and compiling the information into Python lists and dictionaries

After downloading the data with the Requests and Beautiful Soup libraries, we extract and compile the information. The data_in_text function collects the key tags and classes from the HTML document and builds a large dictionary of property data.

Using this custom function, we can collect all the essential information about each property, stored in the form of a dictionary.
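
A sketch of data_in_text under those assumptions; the tag and class names are illustrative placeholders, so inspect the live page source to confirm the current markup:

```python
def data_in_text(soup_docs):
    """Compile property details from each parsed page into one dictionary."""
    property_data = {"address": [], "price": [], "description": [], "date_added": []}
    for soup in soup_docs:
        for card in soup.find_all("div", class_="propertyCard"):
            # Illustrative class names -- confirm them against the live markup.
            fields = {
                "address": card.find("address", class_="propertyCard-address"),
                "price": card.find("div", class_="propertyCard-priceValue"),
                "description": card.find("h2", class_="propertyCard-title"),
                "date_added": card.find("span", class_="propertyCard-branchSummary-addedOrReduced"),
            }
            for key, tag in fields.items():
                property_data[key].append(tag.get_text(strip=True) if tag else None)
    return property_data
```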

3. Saving the extracted information to CSV file(s)

The write_csv function generates a comma-separated text file, a table-like object that can be loaded into a pandas DataFrame or viewed in Excel.
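
A minimal write_csv using Python's built-in csv module might look like this:

```python
import csv

def write_csv(property_data, filename="rightmove_properties.csv"):
    """Write the dictionary of lists from data_in_text to a CSV file."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(property_data.keys())           # header row
        writer.writerows(zip(*property_data.values()))  # one row per property
    return filename
```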

By following the steps above, you can scrape the information from the website and use the CSV file to view it.

4. Data analysis using pandas, matplotlib and seaborn

Once the data is collected, the analysis begins. We check the validity and quality of the data. I created a pandas DataFrame from the CSV file.

The price information is one of the most important fields in our dataset, so we focus on cleaning it up so that we can analyse it and extract meaningful information. For example, we delete all rows where a price is not available and convert the price column to an integer type; this lets us perform aggregate operations such as max, min and mean.
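
A sketch of that clean-up, assuming the CSV produced earlier and a price column holding strings such as "£250,000" or "POA":

```python
import pandas as pd

df = pd.read_csv("rightmove_properties.csv")

# Keep only rows whose price contains digits (drops "POA"-style entries),
# then pull out the number and cast it to an integer.
df = df[df["price"].str.contains(r"\d", na=False)].copy()
df["price"] = (df["price"]
               .str.replace(",", "")
               .str.extract(r"(\d+)", expand=False)
               .astype(int))
```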

The primary goal of data analysis is to give meaning to data so that it can be used in decision-making. In this segment, we use simple statistical measures, such as the mean, range and quartiles, to summarise and understand the data.
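
pandas produces most of these summaries in a single call:

```python
# Count, mean, standard deviation, min/max and quartiles of asking prices.
print(df["price"].describe())
```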

The range, median price and interquartile range can all be seen in the box plot below. We have only one outlier in our dataset, but we will keep it to see whether it makes a significant difference to the average and median prices.
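
The plot itself can be drawn with a single seaborn call; the title here is illustrative:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.boxplot(x=df["price"])
plt.title("Asking prices in Rugby")  # illustrative title
plt.xlabel("Price (£)")
plt.show()
```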

As the summary statistics show, the single outlier in our dataset does not create a significant gap between the mean and median prices.

5. Send an email with an attachment

Finally, we can take the clean data, apply a filter to find the properties that came to market in the last couple of days, and save the result as a CSV file.
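
A sketch of that filter, assuming the hypothetical date_added column was scraped as text such as "Added on 01/05/2021":

```python
from datetime import datetime, timedelta

# Parse the hypothetical "date_added" text, e.g. "Added on 01/05/2021".
df["date_added"] = pd.to_datetime(
    df["date_added"].str.extract(r"(\d{2}/\d{2}/\d{4})", expand=False),
    format="%d/%m/%Y", errors="coerce")

cutoff = datetime.now() - timedelta(days=2)
latest = df[df["date_added"] >= cutoff]
latest.to_csv("latest_properties.csv", index=False)
```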

Then, using the smtplib module, we write code to send an email with the file attached. If you are sharing the code with a larger audience, you can also use the os module to read your email password from an environment variable rather than hard-coding it.
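
A hedged sketch of that email step using only the standard library; the addresses, the EMAIL_PASSWORD variable name, and Gmail's SMTP endpoint are all placeholders to adapt:

```python
import os
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "New Rightmove listings in Rugby"
msg["From"] = "you@example.com"  # placeholder sender address
msg["To"] = "you@example.com"    # placeholder recipient address
msg.set_content("Attached are the properties added in the last two days.")

with open("latest_properties.csv", "rb") as f:
    msg.add_attachment(f.read(), maintype="text", subtype="csv",
                       filename="latest_properties.csv")

# Read the password from an environment variable instead of hard-coding it.
password = os.environ["EMAIL_PASSWORD"]

# Gmail's SMTP-over-SSL endpoint is used as an example; adjust for your provider.
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(msg["From"], password)
    server.send_message(msg)
```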

Here is an image of what an email will look like when sent.

Summary

Using this code, I was able to schedule several viewings in the last few weeks; there are also other applications for it that can help you make informed decisions and save time when researching the property market.

You can further create a conditional email alert based on the following criteria:

  • If the property price is less than the average price for a comparably sized property (see the sketch after this list).
  • If the property price has been reduced within the last week.
  • If the property location is in a specific school catchment area.
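
For instance, the first criterion could be written as a simple pandas filter, assuming a hypothetical "bedrooms" column was also scraped:

```python
# Average price for each bedroom count, broadcast back onto every row
# (assumes a hypothetical "bedrooms" column in the scraped data).
avg_by_beds = df.groupby("bedrooms")["price"].transform("mean")
below_average = df[df["price"] < avg_by_beds]
```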

This short article demonstrates the web scraping process; as previously mentioned, you can use the code within this article or refer to the full code here: https://jovian.ai/pritesh009/project-1-rightmove-with-functions-e2449

Thank you for taking the time to read. This is my first technical blog, so if you have any comments or if anything is unclear, please contact me.
