Hello everyone, welcome. John here, and today we're going to cover how to web
scrape sites that require a login, using requests and requests Session. Using the
inspect element tool in our browser, we can see where the login request is
actually sent, and we can mimic that in our program. The session part allows us
to stay logged in and access all the pages that are behind the login.
There are a few things we need to do before we write our code, however: we need
to find out the login URL and what parameters are sent with that POST request,
and of course we need the login credentials. In this example I will share the
login information with you, because we're using a dummy site. I'll also show you
a way to separate out your credentials at the end, to make it a bit safer when
you're sharing your script or uploading to GitHub or whatever. So this is the
site we're going to use, and it's at this URL, and I'll put a link to that in the
description.
As you can see, we've got a simple login form with a username and password
required. If we log in to this now using the information given to us here, we'll
see that when we log in correctly we get a message saying we're logged in, and
we get a secure area. So this is what we want to get to with our Python program,
and then scrape the pages within it, although this is a demo so there's no real
meaningful information here. Okay, so let's log out now. The way that we find out
what's going on with the requests is by using the Inspect or Inspect Element part
of your web browser, and the tab we're most interested in is the Network one. As
you can see here, if we click the login button with no credentials, we get a load
of requests pop up, and this is what we just sent to the server. We can see here
that one of them is called login, and there's authenticate next to it. Now this
looks like a POST request to me, so
if we click on it here, we see that was a GET request, so we want the one above,
which is a POST request. A POST request is how the browser sends data, such as
our form fields, to the server, and a GET request is basically the browser asking
for a page back. What we need to find out is the URL that is being posted to with
the username and password, and any other information that goes along with that.
We can see right away here that the request URL is this one, so let's copy and
paste that over here for safekeeping, because that's where we're going to need to
send our POST request from our script. If we now clear this up and click Preserve
log, we'll be able to see everything come in. So let's use exactly the same super
secret password and login. I typed that wrong; let's clear that again, and I'll
get the password right this time. Great, so we logged in correctly. Now what we
can do is see on our request here that it was a POST request, and somewhere down
here it should give us a response. The response didn't load... okay, here we go,
here's our form data, and this is what was sent along with our request to the
URL. We need to make sure that we use matching information here. Sometimes you
might find a bit more information down here; there might be other parameters with
it, and you need to make sure that those go along with the request as well. But
we can see here there's only a username and password, so that's all we need for
logging in here. We can also see that we got redirected back to secure, and this
should be the GET request that we got sent back to, so we need this URL as well.
I'll just put that in here. Okay, great, so I'm going to close out the browser now,
and we'll get onto our editor and start writing our code. The first thing we need
to do, as always, is import requests, and then we need to set our URLs. Let's
call this one login_url, and it's equal to this: it's the URL that the
information was posted to, not the URL that we actually went to to get the login
form.
And then let's call this one our secure_url. There we go, so that's pasted in,
and this is the URL that we want to get to once we have logged in. Okay, so now
we need to work on our POST request, and we need to send the username and the
password along with that to get authenticated with the server. To do that we need
to send some kind of payload, and because we have two parameters we need to make
that into a dictionary. So we'll do payload is equal to, and create a Python
dictionary. The first key is username, which is what we saw in our POST request
in the browser, and that was Tom Smith, and then the password key was this
password, just like that. So now we've created our payload to send along with the
request. If there were any other parameters that needed to go with the request,
they would also need to go in here and match what we looked at in the Network tab
of inspect element in the browser. For the next step, let's ignore session for
now and just see if we can get authenticated with the server. So we do
r = requests.post, then the login_url that we set, and then data=payload. What
this does is use requests to post this information to this URL, and the payload
is what we created. If we print out r.text, hopefully what we should get back is
the secure page. There we go.
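
The POST step described above could be sketched roughly as follows. The URL, the `tomsmith` spelling of the username, and the password are placeholders, since the transcript does not spell them out, and `login_without_session` is a hypothetical helper name. Rather than contacting a server, the sketch prepares the request and prints what would be sent, which should line up with the form data shown in the browser's Network tab.

```python
import requests

# Placeholder values: the real URL and credentials come from the browser's
# Network tab and the demo site, and are not given in the transcript.
LOGIN_URL = "https://example.com/authenticate"
payload = {"username": "tomsmith", "password": "your-password-here"}


def login_without_session(login_url, payload):
    """Send the login form once and return the response (no session kept)."""
    return requests.post(login_url, data=payload, timeout=10)


# Build the same POST without sending it, to check the form encoding.
prepared = requests.Request("POST", LOGIN_URL, data=payload).prepare()
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

Calling `login_without_session(LOGIN_URL, payload)` against a real login URL and printing `.text` on the result corresponds to the r = requests.post(...) and print(r.text) steps.
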
We get Secure Area back, so this shows that we did actually manage to log in to
the secure area. Okay, so that's great. Now we might think that, because we've
authenticated with the server, if we were to try and navigate to a different page
within that logged-in area we could just access it as is. Let's try that: say
r2 = requests.get, and let's try and get the same page back, so secure... I
called that secure_url. This is exactly the same page: when we sent the POST
request we got information back, and within that information was a redirect to
this page here, the secure area. This could be a different page, but this is the
only one that's there. So if we do the POST request and then also request the
same page again, we'd hope to get this information back again. But we won't: it
will send us back to the login page, because we are not authenticated. If we
print the text out from that request, we can see here that we're back at the
login page.
What this has done is that we have authenticated with the server, but because we
don't have a session we're not staying authenticated: a site like this typically
recognises a logged-in user by a cookie it sets at login, and a separate
requests.get call doesn't send that cookie back. So we're not going to get
anything. What do we need to do? Well, we need to use requests Session. I'm going
to remove these lines and keep these for now, and to make it a bit easier to see
what's going on I'm going to import BeautifulSoup as well, so we can make the
output a bit nicer. Okay, so we need to keep the same payload, and we're going to
use a context manager in this case. A context manager is very useful because it
will allow us to stay connected and stay logged in as long as we remain within
our with statement, and when we come out of that the session is closed for us
again. It's always good Python practice to use a context manager when you're
opening files or creating a session like this; it means you don't stay connected
to something longer than you need to. So let's do with requests.Session(), with
the brackets there, and we'll do that as s, just to give it a name. Then we do
s.post, exactly what we did before, with our login_url and then data is our
payload. The with statement is basically just opening the session and calling it
s, which is why it's s.post here. Then let's create our soup variable...
actually, I'm getting a bit ahead of myself here. Let's just see what we get
back: if we do r = s.post and then print r, we should get Response 200 back,
because that's the status code, and if we print the text we should get our secure
area back, which we do. Great.
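
To see the difference between plain requests calls and a session without depending on a live site, here is a minimal sketch that runs against a tiny local stand-in server. Everything about the stub is an assumption made for illustration: the `StubSite` class, the `/authenticate`, `/secure` and `/login` paths, the `session=ok` cookie, and the `demo-password` value are hypothetical and are not taken from the demo site in the video.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests

# Stand-in credentials for the local stub only (hypothetical values).
PAYLOAD = {"username": "tomsmith", "password": "demo-password"}


class StubSite(BaseHTTPRequestHandler):
    """Imitation login site: /authenticate sets a cookie, /secure requires it."""

    def _page(self, text):
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _redirect(self, location, cookie=None):
        self.send_response(302)
        self.send_header("Location", location)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode()
        form = {key: values[0] for key, values in parse_qs(raw).items()}
        if self.path == "/authenticate" and form == PAYLOAD:
            self._redirect("/secure", cookie="session=ok; Path=/")
        else:
            self._redirect("/login")

    def do_GET(self):
        logged_in = "session=ok" in self.headers.get("Cookie", "")
        if self.path == "/secure" and logged_in:
            self._page("<h2>Secure Area</h2>")
        elif self.path == "/secure":
            self._redirect("/login")
        else:
            self._page("<h2>Login Page</h2>")

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), StubSite)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
login_url, secure_url = base + "/authenticate", base + "/secure"

# Without a session: the login works, but the next request carries no cookie.
r = requests.post(login_url, data=PAYLOAD, timeout=5)
plain_text = requests.get(secure_url, timeout=5).text

# With a session: the cookie from the login is reused for later requests.
with requests.Session() as s:
    login = s.post(login_url, data=PAYLOAD, timeout=5)
    session_text = s.get(secure_url, timeout=5).text

server.shutdown()
server.server_close()

print(r.status_code, r.text)
print(plain_text)
print(session_text)
```

Against a real site only the last part applies: the two URLs and the payload come from the browser's Network tab, and every request that needs the login goes through s rather than through requests directly.
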
So that proves that we've logged in there. Okay, so we can get rid of this, and
let's try and load that page up again as we did before. When we did it without
the session we were not logged in, so we couldn't get the page. So now let's do
r = requests.get and then the secure_url, so we send a request directly to the
URL, which will only give us the page back if we are still logged in. In this
case I am going to create a soup variable, so it's just easier to see:
BeautifulSoup, with capitals, and we pass in r.content and use the HTML parser,
like that. Then let's print soup.prettify(), so it's a bit easier to read. I
think that's okay.
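
The BeautifulSoup step on its own might look like this. It is only a sketch: the HTML below is a hard-coded stand-in for r.content, assumed for illustration, so that the formatting can be tried without making any request.

```python
from bs4 import BeautifulSoup

# Hard-coded stand-in for r.content; a real response would be much longer.
content = (
    b"<html><body><h2>Secure Area</h2>"
    b"<p>Welcome to the Secure Area.</p></body></html>"
)

# Parse the bytes with Python's built-in HTML parser.
soup = BeautifulSoup(content, "html.parser")

# prettify() returns the markup as an indented, multi-line string.
pretty = soup.prettify()
print(pretty)

# Once parsed, individual elements can be picked out as well.
print(soup.h2.get_text())
```
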
With this we'll keep our session open, so when we post our login information,
which we've created here, to the authenticate URL, which came from inspect
element in the browser, we should then stay connected with our session. That
means when we request the secure URL we should get the information back from that
page. Okay, well, we didn't, so we've done something wrong. I can see straight
away what we've done wrong here: we haven't used our session. We've used
requests.get as opposed to our session variable. So if we change this to s.get,
there we go: welcome to the Secure Area. So what we've managed to do is log in to
the website using the POST, with our session as a context manager, and then we've
requested the secure page URL with our session's get and got the response back.
This could be anything: you could log in to whatever website and then go directly
to another URL that you can only access when you're logged in, and get that
information. What I want to show now is what I mentioned at the beginning of the
video, where you can hide your username and password from your main script, which
is always good practice. We're going to create a new .py file, and within that
we're going to have username equal to Tom Smith and password equal to the
password, like this. We save that as another .py file; I'm going to call it
creds.py, and it's going to be in the same folder, the same directory, as our
main script. Here, what we can do is import that .py file into our main program,
and by doing that we can call those variables: creds.username and creds.password.
What that's going to do is go to this file and get that information. You could
then leave this file out of your Git upload and just upload the main script,
which means no one can see your username and password. Let's just check that
works. There we go, straight back to the secure area.
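
A rough sketch of the creds.py idea follows. To keep it runnable as a single block, it writes a throwaway creds.py into a temporary folder and imports it from there; that part is an assumption made purely for the demo, as are the placeholder username and password. In practice creds.py would simply sit next to the main script and be loaded with a plain import creds.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# For a runnable demo only: write a throwaway creds.py into a temp folder.
# Normally you would create creds.py by hand next to your main script.
folder = Path(tempfile.mkdtemp())
(folder / "creds.py").write_text(
    'username = "tomsmith"\n'
    'password = "your-password-here"\n'
)
sys.path.insert(0, str(folder))
importlib.invalidate_caches()

# Same effect as writing "import creds" at the top of the main script.
creds = importlib.import_module("creds")

# The main script no longer contains the literal credentials.
payload = {"username": creds.username, "password": creds.password}
print(payload["username"])
```

One common way to keep the file out of a Git upload is to list creds.py in the repository's .gitignore.
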
That's it, guys. We managed to log into a website using requests and a session to
keep the login alive, and then access pages only available behind that login.
I've also shown you a way to hide your credentials from your main file, so make
sure you get into that habit just by