A friend recently reached out to me, having been asked to build a database with tens of thousands of lines of data points.
Luckily, this should be quite feasible with some of the following tools.
Let's start with getting a good installation of what one will need.
First, I highly recommend using Anaconda. It provides Python as well as most of the essential packages for data analysis. As of the day of publishing, its download page can be found here.
Next, one will need a text editor. While I personally use Notepad++ for its simplicity (and because it was recommended in "Learn Python the Hard Way" and I stuck with it), another very popular one is Sublime Text.
For my friend's problem, I believe Beautiful Soup should be more than enough to gather the data. I recall watching this tutorial a while back and finding it very intuitive and thorough. The creator assumes no prior experience and walks viewers from installing all the necessary software through the whole web scraping process, explaining how Beautiful Soup uses HTML markup to extract data. I believe he also explains for loops, which will be necessary for this.
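To give a rough idea of what that looks like, here is a minimal sketch of Beautiful Soup combined with a for loop. The HTML snippet, the table contents, and the `extract_rows` helper are all made up for illustration; a real page would be downloaded first and would almost certainly use different tags.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that has already been downloaded.
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""


def extract_rows(html):
    """Return the text of every <td> cell, grouped by table row."""
    # "html.parser" is the parser bundled with Python, so nothing extra is needed.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The for loop visits each <tr> (table row) tag found in the document.
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row only has <th> cells, so it is skipped
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each row comes back as a plain Python list, which makes it straightforward to write the results to a CSV file or load them into a database afterwards.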
Generally, I have been able to avoid scraping by using APIs (such as the Yahoo Finance one), but sometimes tools like these are needed.
Best of luck!

