Working With APIs in Python - Pagination and Data Extraction
2021-06-20
If you are new, welcome. I'm John, a self taught Python developer and content creator, working at Zyte. I specialize in data extraction and automation. If you like programming and web content as much as I do, you can subscribe for weekly content. All views in this video are my o...
Subtitles

in this video i'm going to show you the basics of working with an

api in python we're going to run through a code demo

of how you would make a request to an api and how we can actually deal with

the data that comes back so the api that we're going to be

working with is this one it's the rick and morty api and there's a reason why

i've chosen this one it's open which means you don't have to

authenticate so we can worry about authentication at another time

but also it has a good rest api and even better documentation because the

first thing i would always say when you're trying to work with a specific

api is you need to go through their

documentation so you can work out what information is where and how you're

going to get it out so if we look at this we can see we have

a base url for our rest api now this is important

we're going to take this as well and we can see that if you hit this end point

you actually get this information back which tells you that you can

query for characters locations and episodes what i'm going to do is i'm

actually going to copy this i'm going to paste it into my browser

and we're going to see that we get this information back here

is what we saw right here so what we can do is we can actually just

interrogate this as if we were working with a website although we're actually

looking at and asking the api for bits of information

so i'm just going to make this bigger right now so we can actually see what

the main response looks like so before we go ahead and actually start

working with the code i'm going to run through

this response here and explain what all these bits are

now it's really important because when you actually look through the api it

will give you a response and a schema of what the

actual data is coming back so we can see that we have two main

things here we have the results and this info thing at the

top so the info bit actually tells us how

many results there are for the endpoint

that we've queried and how many pages there are now this is

going to be really important information as we go forward and it also gives us

the url to the next page you don't always get this but there

will be some kind of total number of results and also number

of pages or some do it in batches where you ask for the next load of

results so the first would be

one to a hundred and then the second would be 101 onwards etcetera but in

this case we get a total count and a number of pages
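To make the shape of that response concrete, here is a rough sketch of it as a Python dictionary. The key names follow what is described above (an info block with a count, a page count and a next link, plus a list of results). The numbers are the ones that come up later in the video, and the URL and the single shortened character entry are placeholders, not real API output.

```python
# Illustrative only: a trimmed-down response in the shape described above.
sample_response = {
    "info": {
        "count": 671,  # total number of results for this endpoint
        "pages": 34,   # how many pages those results are split across
        "next": "https://example.com/api/character?page=2",  # placeholder URL
    },
    "results": [
        # One shortened, made-up entry; real entries carry more fields.
        {"id": 1, "name": "Example Character", "episode": ["episode-url-1", "episode-url-2"]},
    ],
}

info = sample_response["info"]
print(info["count"], info["pages"])            # total results and page count
print(sample_response["results"][0]["name"])   # name of the first character
```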

so we can collapse that and now we can see that we get a list of results

so if i collapse that we can see that my browser is telling me there are 20

items and we can see that we have all 20 here

and each one has 12 specific bits of information in

it an id name etc etc so what i'm going to

do is i'm going to say that we want to compile

our own list for whatever reason of characters from this tv series we're

going to take the name and then maybe the list of

episodes well that might be quite long but we'll

see when we get there so what we want to do now is we're going to come out here

and we need to go back to the documentation and we can see that we

have what i just explained here it tells you

that the information given etc etc and then

the character endpoint which is what we just looked at

so it says you can access different pages with the page parameter so that's

really important and we saw that in the next link when we looked at the

information up here under the info tab we can see it says next page is equal to

two that's a really common way of doing it

then we have all the information etc etc and you can actually see the

character schema here that i showed you with all the information you can get out

etc etc so we're going to do get all characters this is what we're going to

do so i'm going to copy this url here and

we're going to go to our code now to make requests to a server using

python or anything like that we need to use the requests module if you

don't have this installed you can do pip install requests you can google that you

can find it that is the main one that we're

going to use even the main python documentation says to use requests

i already have this installed so i'm just going to import requests up here at

the top like that and then i'm going to set my base url

to this but i'm going to remove the character part because

this is the base url that we're going to use and then i'm just going to say that

our endpoint is equal to character now

notice that i've left the trailing slash on our base url so now when we hit this

equals sorry there what i'm going to do with this is i'm going to make a request

to this api with our code so i'm going to say r is equal to

requests and we're making a get request so we're going to do

dot get then we're going to say our base url

plus the endpoint now because these are both strings we can just use the plus

symbol to concatenate them together and then i'm going to print out r now

when i run this we are going to get back a response of

200 which is a good response if you're getting something other than that

for example 400 is a bad request 403 is access denied 404 is not found and 500 is a server error but

now we don't actually want the response code we want the

information that's in that response and the easiest way to do that

is to use r dot json and this is going to give us the json response back from that

api so i run that and down in the terminal

we can see we get a load of information back

and this is exactly what i showed you in the browser
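As a rough sketch, the first request described above could look like the following. The base URL is never spelled out in the transcript, so the value here is an assumption, and wrapping the call in a function named `first_request` is my own choice so that nothing is sent until it is called.

```python
import requests

# Assumed value: the transcript never reads the base URL out loud.
BASE_URL = "https://rickandmortyapi.com/api/"  # note the trailing slash
ENDPOINT = "character"


def first_request(base_url: str, endpoint: str) -> dict:
    """Send one GET request and return the decoded JSON body."""
    r = requests.get(base_url + endpoint)  # both are strings, so + joins them
    print(r)                               # shows <Response [200]> on success
    return r.json()                        # the JSON body as Python objects
```

Calling `first_request(BASE_URL, ENDPOINT)` would then return the same data the browser displayed, but as a dictionary the rest of the code can work with.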

except now we can actually do stuff with this response and we can take this

information out and we can get the bits that we want so

what i'm going to do is i'm going to say let's um

say data is equal to r dot json and then i'm going to say print

and we're going to say data now we can access the keys using

square brackets like you would with a dictionary and i'm going to type

info let's see what we get back now so now i've returned just the

information part of the response now what we want to do

here is we want to grab the number of pages because we want

all of the responses all of the characters sorry from this api so we're

going to go ahead and we want to know how many pages there are

now you could grab the next page link when you go through every time and

just do a request on that you could absolutely do that
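The next-link approach mentioned here is not the one used in the rest of the video, but a minimal sketch of it might look like this. It assumes the last page reports an empty next value (None), and `fetch` is a hypothetical helper that takes a URL and returns the decoded JSON, so the example can run offline with fake pages.

```python
from typing import Callable, Iterator


def follow_next_links(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each page of JSON, following info["next"] until it is empty.

    `fetch` is a hypothetical helper, for example
    `lambda url: requests.get(url).json()`.
    """
    url = first_url
    while url:  # assumes the last page has next set to None
        page = fetch(url)
        yield page
        url = page["info"].get("next")


# Offline demonstration with two fake pages instead of real requests.
fake_pages = {
    "page-1": {"info": {"next": "page-2"}, "results": [{"name": "A"}]},
    "page-2": {"info": {"next": None}, "results": [{"name": "B"}]},
}
names = [
    item["name"]
    for page in follow_next_links("page-1", fake_pages.__getitem__)
    for item in page["results"]
]
print(names)
```

The trade-off is that this version never needs to know the page count up front, while the page-count approach makes it easy to see how much work is left.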

but generally i like to make one request at the start to know what we're dealing

with and then do it that way so i'm going to

say we're going to go for let's do pages so again we're going to

reference the key of pages and this is going to return the integer

of 34 there so we can save this into our

variable called pages so now what we want to do

is we want to work out what parts of the other bits of information that we want

so let's go ahead and do print data now if we come back to the

documentation it shows us up here we get

results now this is a list so i'm going to come back and i'm going to say print

data results and i'm going to ask for the first item

in that list so let's run that and we can see that we

get this information back here now within here we have the id and the

name etc etc and then some other keys that we can

access and then we can see we have all of the

episode information there so let's do um let's print out

the uh name so we're going to go we're going to stick within our first

item in the list because we're just working out the information

that we want and i'm going to say name so let's print

that okay so we get the name and then let's

save that so let's do name is equal to and i'm going to copy that and i'm going

to say uh episodes

and we'll do if we look for the key up here

we can see we have episode and then it returns a list

not that episode so i'm just accessing different parts of the json data

let's remove that so it's a bit clearer and let's print out

episodes okay and there we go there's our list

back now it's worth mentioning at this point that if you're new to this and

you're still trying to work out how to interrogate the json data properly don't

keep sending requests to the server go ahead and

save the response you can just copy and paste it out into a json file

and save it to your hard drive so you can just work out how you want to get

that information out otherwise you're just sending

unnecessary requests to the server i'm pretty confident doing this so i

mean i've only done like five or six requests at this point so

not a big deal so what we're going to do for this is we're going to say

we want to know how many episodes this character is in

so i'm going to do the length of episodes instead of printing out all of

them i'm going to ask for the length and that's going to give us

how many there are we can see there's 41 so i'm going to say that this character

is in 41 episodes so now we've worked out what bits of

information that we want to get we can go ahead and start to write our

code out properly so we can make the right number of

requests to the api get back the information that we

actually want and then put it into a nice list so we get

total number of characters and how many episodes they've been in

again this is just demo information but you'll get the idea
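The exploration steps above, together with the tip about saving the response instead of hitting the server repeatedly, can be sketched roughly as follows. The saved data here is a made-up stand-in rather than real API output, and the file name and temporary folder are assumptions for the sake of a runnable example.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a response copied out of the browser; not real API output.
saved = {
    "info": {"count": 671, "pages": 34},
    "results": [{"id": 1, "name": "Example Character", "episode": ["e1", "e2", "e3"]}],
}

# Write it to disk once (a temp folder here; any path would do) ...
path = Path(tempfile.mkdtemp()) / "characters_page1.json"
path.write_text(json.dumps(saved), encoding="utf-8")

# ... then load and explore it as often as needed without touching the server.
data = json.loads(path.read_text(encoding="utf-8"))

pages = data["info"]["pages"]              # number of pages
name = data["results"][0]["name"]          # first item in the results list
episodes = data["results"][0]["episode"]   # list of episodes for that character
print(pages, name, len(episodes))
```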

so the first thing i'm going to do is i'm going to start writing some

functions and i'm going to say up here we're going

to say def defining our new function and we're going to say

main request and i'm going to say we need to give this

the base url and let's pass in the endpoint as well

this just means that if we wanted to we could use this

to actually go to the other endpoints of the api if we wanted to so now we're

going to put our r is equal to requests.get in here like

that and then we're going to return the r dot

json the json response here so what we want to do now is we get rid of

this i'm going to move this up to the top so

it's just out of the way really quick and now instead of data is equal to this

we want to do our data is equal to main request

and we are going to actually because we've called this base url and

endpoint anyway we can actually copy that out there

so now if we run this we should be able to get the same results back so let's

just print name just to check that our function is

working which it is okay so now we want to actually work out

how many pages there are that we need to um loop through

so what we're going to do is we're going to say def for our new function we're

going to say get pages and we need to pass in

the json response so we're going to say response and now

we want to work out how many pages there are now as we looked here

and we worked out from the uh info and pages we can copy this out so we can say

pages is equal to we just need to change this ever so slightly because we're

working with the word response and not data within our

function and we can say that and then we can just return

this back out now you can do it like that or you can just make it easier and

just put the whole thing on one line there we go so now we after we do our

main request if we do our get pages on the data and

we actually need to print that out and we'll remove this print statement

from down here and run this we should see 34. there we

go so now we can say that we're making our

request and we can work out how many pages we need

and so what we're going to do now is we're going to write a function to

actually work through the character information on each request

that we give it so let's say def for function

and we'll call this one parse json and we're going to give it the

response again so now we want to put these two in here

this is where we found the information so i'm going to remove them from down

here but what we need to do is we need to

make this into a loop so instead of it like this because here

we're actually indexing the first item in the list

so what we're going to say is for item in response and we want

to access the results key which is we've got here

so we can remove this part we want to say

print item and we need to reference the name and then the episode

but we want item sorry we want length because we wanted to know how long that

list was of episode there i've got too many

things going on there we go so i can get rid of these now for the

moment i'm just going to return this function so i don't get any errors

and i'm missing a bracket so let's put that in so now we can do our parse

json function and we can say we need to give it the response which in this case

is data because the response is coming from the main request function

and we have put in a print statement in here temporarily just to check that it

works if it doesn't we can tidy up our errors

before we crack on right there we go so it looks like we've

got our 34 which we are printing out from our get pages

and we have a list of characters and the number of episodes that they've been in

here so we're halfway there so let's get rid

of that so now what we want to do is we want to save this parse json

part into a dictionary so we can actually do something with the data

so instead of printing these two things out i'm going to put them on separate

lines just like this and we're going to say

our character or we'll just call it char for

now and we're going to create a dictionary

that includes these two items i'm just going to indent that in there

and we're going to say the key is name and that is equal to the item name and

then the number of episodes

is equal to that there and we're missing

our bracket there we have our uh character dictionary here

that's populated by these fields but we need to add them all to a list

so we can actually return the whole list from this function so just at the top

here i'm just going to call this character list and we're going to copy

that and here we're going to do character list dot append

the character that we are adding and then i'm going to return

out of the whole function the character list so now if i

print out what comes out from this function just to see

if it's right hopefully it is there we go so now we have a list

of names and the number of episodes that they've been in

just like we had before but it's in a bit more of a useful format now
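A minimal sketch of the parsing function built up above is shown here. The function and variable names (`parse_json`, `charlist`, `char`) and the `no_ep` key are my guesses at what is typed on screen, since the captions cut them off, and the sample page is made up.

```python
def parse_json(response: dict) -> list:
    """Build a list of small dictionaries from one page of results."""
    charlist = []
    for item in response["results"]:
        char = {
            "name": item["name"],
            "no_ep": len(item["episode"]),  # number of episodes, not the list itself
        }
        charlist.append(char)
    return charlist


# Small made-up page to show the output shape.
page = {"results": [
    {"name": "Example One", "episode": ["e1", "e2"]},
    {"name": "Example Two", "episode": ["e1"]},
]}
print(parse_json(page))
```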

now we've got that working we want to work with the pagination which is why we

have this get pages function so let's collapse these ones out of the

way so we know that our main request works here but if we look

at the documentation again we can see that

it has this question mark and then page is equal to

20 at the end so i'm just going to copy that what

that'll do is tell the request which page to work with

so i'm going to add that in at the top here so i'm going to say that we need to

give this a number which i'm going to make as x

and we're going to do plus and then page is equal to

x like this and we make this an f string so we can actually add that all together

so when we make our main request now we're going to start with a number one

and let's check that that works we get the same information back so now

if we go to number two we can see we've got

some different information which means our pagination is working
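The paginated request function could be sketched like this. The base URL value is an assumption, and `build_url` is a small helper I have added (it is not in the video) so the resulting URL can be checked without sending anything.

```python
import requests

BASE_URL = "https://rickandmortyapi.com/api/"  # assumed value, with trailing slash
ENDPOINT = "character"


def build_url(base_url: str, endpoint: str, x: int) -> str:
    """Join the pieces and add ?page=x using an f-string."""
    return base_url + endpoint + f"?page={x}"


def main_request(base_url: str, endpoint: str, x: int) -> dict:
    """Request page number x of an endpoint and return the JSON response."""
    r = requests.get(build_url(base_url, endpoint, x))
    return r.json()


print(build_url(BASE_URL, ENDPOINT, 2))
```

As a design note, requests can also build the query string itself if you pass `params={"page": x}` to `requests.get`, which avoids formatting the URL by hand.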

so we can just use this main request function so once we do the first one

on page one to get how many pages there are and parse that part of the data what

we can do is we can say for x in range so let's

get rid of this well not get pages let's just get rid of the print part

so we can say for x in range and we can pass in

our get pages data function there we can print x so if i just get rid of

this for this moment as we can see it goes to

33 and we actually have 34 pages and it starts at zero we just need to

shift that over so i'm going to say for x in range

we're going to start at 1 to get pages data

plus 1. so if we now print x we should get 1 to 34

as opposed to what we had before 0 to 33 there we go 34 1 to 34
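The off-by-one fix above comes down to how `range` counts. Here is a small sketch using a stand-in dictionary in place of the first page's JSON; `get_pages` follows the one-line version described earlier.

```python
def get_pages(response: dict) -> int:
    """Return the total page count from the info block."""
    return response["info"]["pages"]


data = {"info": {"pages": 34}}  # stand-in for the first page's JSON

zero_based = list(range(get_pages(data)))        # starts at 0, stops before 34
one_based = list(range(1, get_pages(data) + 1))  # starts at 1, stops before 35

print(zero_based[0], zero_based[-1])
print(one_based[0], one_based[-1])
```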

so those are going to be our page numbers so what we want to do now is we

want to use our main request function which basically is the one that

returns the json data i'm going to copy that out we're going

to put this in here now i'm going to leave the print statement in so we can

see the pages go by but we're going to do main request of

the base url and the endpoint which are both up

here which is correct and then x which is for

the x or the page number so we could store this into a variable

but we actually want to parse the information from this so i'm going

to grab our parse json function i'm going to bung that in main and we're

going to put that there now what we want to do is we want to

actually store all this information that comes out and

we get a list out of this we can actually have a new main list

probably not the greatest name and we can put that in here but what we can do

is we can do extend so this is basically going to

add everything that comes out of that list into this new list as we go through

each time so now if i do print

not inside my for loop but once we've come out of my for loop

and we do print let's do the length of the main list this is a good way to

find out if you have sort of the right amount of records i

think it was 671 we were looking for so we can see this is the request we're

making here to the server we need that to go up to 34 pages

hopefully 671 so we've got the right amount of

results so i'm happy that now what we can do is

we can export this to a csv file
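Putting the loop together, a sketch of the whole collection step might look like this. To keep it runnable offline, the page-fetching function is passed in as `request_page` (a hypothetical parameter; in the video the loop calls the main request function directly), and `fake_page` stands in for the real API.

```python
from typing import Callable


def get_pages(response: dict) -> int:
    return response["info"]["pages"]


def parse_json(response: dict) -> list:
    return [
        {"name": item["name"], "no_ep": len(item["episode"])}
        for item in response["results"]
    ]


def collect_all(request_page: Callable[[int], dict]) -> list:
    """Request page 1 to learn the page count, then walk every page."""
    data = request_page(1)
    mainlist = []
    for x in range(1, get_pages(data) + 1):
        print(x)  # watch the page numbers go by
        # extend adds each character individually; append would nest the lists
        mainlist.extend(parse_json(request_page(x)))
    return mainlist


# Offline stand-in: three fake pages of two characters each.
def fake_page(x: int) -> dict:
    return {
        "info": {"pages": 3},
        "results": [{"name": f"char-{x}-{i}", "episode": ["e1"]} for i in range(2)],
    }


mainlist = collect_all(fake_page)
print(len(mainlist))
```

With the real API you would pass something like `lambda x: main_request(baseurl, endpoint, x)` instead of `fake_page`. Note that page 1 is requested twice this way, once for the page count and once inside the loop, which matches the approach in the video.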

so let's import in pandas now pandas is basically a really powerful data science

library and to use it for just creating a data

frame and a csv file possibly overkill however it works

really well and i don't know i'm happy with it so

i'm going to import pandas as pd again pip install pandas if

you need to now instead of printing the

length of the main list what i'm going to do is i'm going to say

df for data frame make sure you call your data frame something useful when

you're actually writing your code in an actual

project don't just write df because it'll be confusing

is equal to pd.DataFrame of main list so then i'm going to print df

dot head and then df dot tail just to double check that the top and

the bottom of our data is different and we haven't

like duplicated it up or something like that

and then we can work on exporting it so i have actually got

here an index of 670 so it would be 671 rows this looks like a duplicate but if we go back to

our actual response and this is page 34. we can see that we do have two lots

of the new improved galactic federation guard and to be honest we should

probably stick the id in there as well that would make a lot

more sense so i'm actually going to do that so what we're going to do is up

here in our past json we can actually put the id in so i'm

going to make a new line in our dictionary and say

id is equal to item id because the key was the id key we can

see it there so now i've added that in instead

of printing this out i'm going to do df.to_csv this is why i use pandas

because it's so easy to do this and we're going to call this character

list dot csv and when working with pandas and

this lot i always tend to do index is equal to false because generally

i don't want the pandas index which is the zero

index down the side i just want it to look like you would expect it to

so now i'm going to save that and i'm going to run this again and this will be

the last request i make to your api i promise
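The export step could be sketched as below. The rows are made up, the `no_ep` column name and the file name are assumptions, and the file is written to a temporary folder here only so the example cleans up easily.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows in the shape built earlier (id, name, number of episodes).
mainlist = [
    {"id": 1, "name": "Example One", "no_ep": 41},
    {"id": 2, "name": "Example Two", "no_ep": 1},
]

df = pd.DataFrame(mainlist)
print(df.head())  # top of the data
print(df.tail())  # bottom of the data

# index=False leaves out the 0, 1, 2, ... row labels pandas would otherwise write.
out_path = Path(tempfile.mkdtemp()) / "charlist.csv"
df.to_csv(out_path, index=False)

print(out_path.read_text())
```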

and we will have back our csv file of

all the characters that have ever appeared in this show and how many

episodes they've each appeared in so let's

click on that let's close this so we can see

and now we have our id our name and the number of episodes

all the way through down to all these characters who only appeared once

including our new improved galactic federation guards

who are the newest characters if the id number is anything to go by which i'm

sure it is so that's going to do it for this video

guys hopefully you've got a good understanding now of how you can start

to work with an api in python and you can

make your own queries there's a lot more to it than this we

could actually filter but i'm not going to cover that in

this episode maybe i'll cover that in future ones if you've enjoyed what

you've seen please consider dropping a like leave me a comment or

subscribe to the channel i've got lots of stuff like this with

python i've got lots of web scraping content

we're moving into some app building content etc etc so if you've enjoyed

that hit that subscribe button until then thank you very much guys and

i will see you in the next one goodbye