The Beautiful Soup

Written on August 8, 2017

[ python webscraping wwe ]

I use this package all the time, but only figure it out as I go along…and I often forget what I did the last time. Here are a few refresher commands:

import requests
import bs4 

page = requests.get(URL)
page = bs4.BeautifulSoup(page,'lxml')
page.select('div')   #  get list of all div tags and their contents
page.select('input')  # get list of all input tags
page.select('input[type="hidden"]')   #  list of all hidden inputs
page.select('input[type="button"]')   # list of all buttons
page.select('div')[0].get_text()   #  get text on webpage that has to do w/ the first div tag
page.select('div')[1].attrs   #  get attributes of the second div tag
page.select('span')[5].get('id')   # if id is attribute of the 6th span tag, then this will get that id