hi everyone and welcome john here and in today's video i'm going to show you how
you can create your own image downloader using python so we're
going to be using python requests and beautiful soup and we are going to
be finding all the image tags and then saving
all of the images that it finds to our computer
so let's get started the first thing we want to do is import
requests and from bs4 we're going to import
beautiful soup and i'm also going to import the os
module because that's going to let us create folders and change directories
which we're going to need to do so now i've got those imported the os one
is in the standard python library if you need to pip install requests or
beautiful soup go ahead and do that so this is the website we're going to be
getting the images from everyone knows this website is airbnb um
i've never been to ljubljana before but i'm sure it's really nice so what i'm
going to do is i'm going to try and download the images that it lets
us from these listings what i'm not going to do is i'm not
going to go into each and every individual listing to get all the images
i'm just going to get the top the first one that it gives us so what
we want to do to start with is inspect element so we can start to
see how it looks so if i make that bigger so
we can see if we hover over the first image here
there is an image here image class blah blah blah and all this but
more specifically the most important thing to us is it's actually inside this
img tag now images in html will usually be inside
these img tags so we can actually just use
find_all with beautiful soup to get them all and start collecting the links that
we want to then download so now i can see that is in there i'm
just going to double check the page source
it's always useful to do and i'm going to just copy
some part of the text so we can get to the
let's just copy and we'll search for under free parking
just so we can see that it's there and it looks like it is
available so we know that we can get to it
so i'm going to copy this url it's quite a long one
i'm just going to put it in here so we're going to say url is equal to this
and just move that up and out of the way the first part is to actually reach out
to the server with requests and then get that information back so as
always i like to do r is equal to requests.get and then we
give it our url which we have specified here see
these two right here the next thing we want to do is we want
to create our soup so we can do soup is equal to
BeautifulSoup and then we want r.text
in this case and we'll pass html.parser
as the second argument so beautiful soup is just using the built-in html parser in this case
let's move that up one and now i'm just going to check that this is working like
i always do and i'm just going to say print soup dot
title dot text and run that and hopefully if we get
something back that is right which we do we know that this is all going to work
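A minimal sketch of the fetch-and-parse step described above. The helper name fetch_soup, the timeout, and the sample HTML are assumptions made for illustration and are not the code from the video. The sample string stands in for r.text so the title check can run without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    """Download a page and parse it (hypothetical helper, not from the video)."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise on 4xx/5xx responses instead of parsing an error page
    return BeautifulSoup(r.text, "html.parser")


# Offline stand-in for r.text so the title check runs without a request.
sample_html = "<html><head><title>Example listings</title></head><body></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
page_title = soup.title.text
print(page_title)
```

With a real page you would call fetch_soup(url) and print soup.title.text in the same way as a quick check that the request worked.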
let's clear that off delete that we don't want that what we
do want is we want to find all of the image tags so they're all
like this in the html which means we can simply do images is equal to soup
dot find_all because we want it to return a list of
every single one that it can find on the page and we want to pass it img in quotes
like this what i'm going to do now is i'm just going to print out
images and hopefully we get back a load
of information there we go we do so we can see that we actually got a list
and it's got all of this and we can actually see that the links are here
inside it so we can see there but that's no good we just got the
elements there what we'll do is we'll do a for loop so
we'll do for image in images so for each one of those
elements that we just saw inside the images list that we created here
i'm going to print image and then after that i'm going to do src
in the square brackets with the quotation marks because
if i come back here we can see the actual link to the image that i hover
over on the right hand side is under this src the source equals
and we can access the information that's just in this little tag here which is
where the image url is so to do that let's do that and then
let's run that and hopefully scroll down and we've got a nice long
list of image links that we could if i just
click on one that didn't work if i go to chrome copy and paste it in we can see
that is the image returned that's not quite the images that i was
hoping for from this but you know it's there and it works so the next
thing you want to do is to save the image but first what i'm
going to check out is i'm going to try and give it a better name than just the
file name so i'm going to go back over to our
source code and i'm going to have a look and quite often you get these alt attributes
here which basically hold a short text description
of the image so we can actually access that the same way that we did the
src attribute so we can use what's in the alt attribute most websites will
set alt text on their images it's quite
important for accessibility and seo so it will usually be there and we can access it
let's close that down so then let's get rid of our print statement here
and say uh let's call this one link because that was the image link
above that i'm going to put name and i'm just going to say image and then alt
in the square brackets like that so now if i print name and link
we should get that information out as well okay we can see it's all here so
the first one this is obviously something else at the
top of the page it doesn't have an alt tag and it seems
to be just a gif file we're just going to ignore that for now
um and that will be fine but the rest of them are all there
and working to save the images we can do with open so we're going to be
opening a file writing to it and then saving it
and we need to give it a file name this is why we've gone ahead and got the
name from the image here so we can use that as our file
name we need to give it an extension so i'm just going to do plus
and then i'm going to give it dot jpg for an image extension
if the original file isn't a jpeg the bytes are still saved exactly as they were downloaded
only the extension will be wrong so saving as a jpeg first
is a reasonable starting point a lot of web images are jpegs anyway so
that's a good start and then we want to do wb because we want to write to it but
we want to write bytes the actual
raw binary data of the image rather than text so that's why we need wb
and then as f and our colon and then under here we want
to actually send out a request to the individual links that we can then get
the information from them from the server so we're going to want
to do another request so i've got r is equal to requests.get
up here so i'm actually just going to do
im for image and then we're going to do requests.get
and then we're going to pass it link and then we want to do
f.write with im that is our response for the image link
and we want its dot content attribute because the content is the response body as bytes
so we are able to save that using write on our file opened in wb mode
and that saves it to the disk so i'm
going to run this now and we'll see that it's going to go out and download all
those images and it's going to save them into the current directory that we're
working in we've got no output so i've got an error
here and that's because what i've tried to do is i've tried to write a name that
is not an acceptable file name so the best thing to do is i'm just going to go
ahead and use replace and i'm going to replace all of the
blank spaces with a dash now hopefully what that will
do is it'll fill in all the blanks that are
actually causing us issues and save the files
with their new file names so that's looking like
it's failed right so that didn't work so let's go ahead and replace
the i think it's probably the forward slashes and replace them with
nothing let's try that okay there we go so we can see when i
actually read this error the first time i didn't take into account that
there was the forward slashes that were causing the problem i was just looking
at the extra dots so after we replaced that it worked fine
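A small sketch of the name and link extraction with the two replace calls described above. The sample HTML, the example.com URLs, and the helper name clean_name are made up for illustration. It also uses image.get rather than square brackets, which is a defensive swap and not what the video types: square brackets raise a KeyError if an attribute is missing, while get returns a default.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first tag has no alt attribute, the second has a slash in it.
sample_html = """
<img src="https://example.com/static/spacer.gif">
<img alt="Flat in the centre / free parking" src="https://example.com/photos/1.jpg">
"""


def clean_name(alt_text):
    """Replace spaces with dashes and drop forward slashes (hypothetical helper)."""
    return alt_text.replace(" ", "-").replace("/", "")


soup = BeautifulSoup(sample_html, "html.parser")
pairs = []
for image in soup.find_all("img"):
    link = image.get("src")      # None if the tag has no src
    name = image.get("alt", "")  # empty string if the tag has no alt
    if not link:
        continue
    pairs.append((clean_name(name), link))

print(pairs)
```

Other characters can also be invalid in file names depending on the operating system, so a real script may need more than these two replacements.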
so if i go ahead and open the folder we can see we've
actually got all these images here so if i just
reveal it in the explorer we can see that we've got them all and
they're all saved all the thumbnail images there
for all of those listings and they've all got their
appropriate names as we saved them the duplicate ones are from where we ran it the
first time i'm showing that bigger for you guys
so there we go so that's worked that's great
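A sketch of the saving step on its own, assuming the image bytes have already been downloaded (in the script they come from requests.get(link).content). The helpers pick_extension and save_image are hypothetical. Taking the extension from the link and falling back to .jpg is a variation on the video, which always appends a jpeg extension.

```python
import os
import tempfile
from urllib.parse import urlparse


def pick_extension(link, default=".jpg"):
    """Use the extension found in the url path, or the default if there is none."""
    ext = os.path.splitext(urlparse(link).path)[1]
    return ext if ext else default


def save_image(folder, name, link, data):
    """Write raw bytes to folder/name.ext and return the path that was written."""
    path = os.path.join(folder, name + pick_extension(link))
    with open(path, "wb") as f:  # wb: write in binary mode, no text encoding
        f.write(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    fake_bytes = b"\x89PNG stand-in bytes, not a real image"
    saved = save_image(tmp, "city-flat", "https://example.com/p/1.png?im_w=720", fake_bytes)
    saved_name = os.path.basename(saved)
    with open(saved, "rb") as f:
        round_trip = f.read()
```

The file on disk holds exactly the bytes that were passed in, which is why the extension only affects how other programs guess the file type.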
so there's a few things we can do to improve this although this is
the basic framework and it will work
but what i'd like to do is i'd like to turn this into a function that we can
then use for different websites add a little bit of error handling in as
well and also create a new folder so that we can
say hey save all of the images from
this page into this folder okay so i'm going to
actually just collapse some of this down now and i'm going to create our function
so def defining our function and i'm just going
to call this one image down and then inside this function we're
going to give it two things so we're going to have url
and we're going to have folder so when i say folder i'm going to create a new
folder with the name that we give it so we need to indent this now to
create a folder on python it's really simple we use the
os module that we've imported and we would just do
os.mkdir make directory but we need to kind of
do a little bit more than that first so we need to find out
we need to get the current working directory first and then we need to
create one inside that because if we just did this it probably wouldn't be in
the right place so we want it to be in this folder but a
new directory so what i'm going to do is i'm going to
say we're going to do make a directory but what we want to do
is we want to join the current working directory
and the folder name that we give it so i'm going to say
os.path.join and there we're going to join the two
together so when we do os.path.join it will automatically put in the
path separators in the correct places for us and we're going to join the two so os
dot getcwd which is get current working
directory and folder so that looks a little bit
sort of long and maybe quite a little bit convoluted but all we're doing is
the main part is we're creating a directory and what we're doing is we're
creating the directory that is joining together
the current directory we're in and the new folder name we give it
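The one-liner described above, sketched so that it can be run safely. The folder name is a placeholder, and the temporary directory is only there so the example does not leave a folder behind wherever it is run.

```python
import os
import tempfile

folder = "ljubljana"  # placeholder folder name for illustration

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # pretend the script was started from this directory
    try:
        # Join the current working directory and the new folder name,
        # then create that directory.
        target = os.path.join(os.getcwd(), folder)
        os.mkdir(target)
        created = os.path.isdir(target)
        ends_with_folder = target.endswith(os.sep + folder)
    finally:
        os.chdir(start_dir)  # go back before the temporary directory is removed
```

os.path.join uses the separator for the current platform, so the same line works on Windows, macOS and Linux.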
okay so it's it's it's just all on one line but it should be quite
straightforward what i'm going to do is i'm going to do
try first um and then i'm just going to do a real um
basic error handling you shouldn't really do except pass but for this case
i think it's fine because we we know what this is doing um so i'm going to
try creating the directory and if it fails
instead of kicking us out our program is just going to move on
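As an alternative to a bare except with pass, here is a sketch that only ignores the one error we expect, which is the folder already existing. This is a swapped-in variation and not the code typed in the video, and make_folder is a hypothetical helper name.

```python
import os
import tempfile


def make_folder(base, folder):
    """Create base/folder, ignoring only the 'already exists' error."""
    path = os.path.join(base, folder)
    try:
        os.mkdir(path)
    except FileExistsError:
        pass  # the folder is already there, which is fine for this script
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = make_folder(tmp, "bratislava")
    second = make_folder(tmp, "bratislava")  # second call hits the except branch
    same_path = first == second
    exists = os.path.isdir(first)
```

Any other problem, such as a permissions error, still raises, so it is not silently hidden. os.makedirs(path, exist_ok=True) is another way to get the same effect.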
okay so then we can do our r is equal to requests.get and we can find all the
image tags and then we can get the alt and the
source for each one and then we can write them all to the
file but what we haven't done is we haven't actually
um changed into our directory so i'm going to do that underneath
that i'm going to do os.chdir for
change directory what i'm going to do is i'm just going to paste this back in
because this has now created this directory the
join so i'm going to go ahead and put that right in there
because that's just going to go ahead and change into that directory
that we created now we've done that i'm just going to
add in a quick print statement down at the bottom
so i'm just going to say just so we can see it working
not like that print and i'm going to say writing and then we'll give it
name okay so what we've done is we've turned our
little basic script just into a function that we can reuse
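Putting the pieces together, a hedged sketch of what such a function could look like. The name image_down is a guess at how the spoken name is spelled, and several details differ from the video on purpose: the get parameter is an assumption added so the function can be tried with a fake instead of the network, files are written with a full path rather than after os.chdir, os.makedirs with exist_ok replaces the try and except, urljoin handles relative src values, and images without alt text get a numbered name.

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def image_down(url, folder, get=requests.get):
    """Save every <img> found at url into folder and return the saved paths.

    The get parameter is an assumption for testability; the video calls
    requests.get directly.
    """
    target = os.path.join(os.getcwd(), folder)
    os.makedirs(target, exist_ok=True)  # no error if the folder already exists

    soup = BeautifulSoup(get(url).text, "html.parser")
    saved = []
    for number, image in enumerate(soup.find_all("img"), start=1):
        src = image.get("src")
        if not src:
            continue  # nothing to download for this tag
        link = urljoin(url, src)  # resolve relative src values against the page url
        name = image.get("alt", "").replace(" ", "-").replace("/", "")
        name = name or f"image{number}"  # fallback for missing or empty alt text
        path = os.path.join(target, name + ".jpg")
        with open(path, "wb") as f:
            f.write(get(link).content)  # .content is the response body as bytes
        print("writing", name)
        saved.append(path)
    return saved
```

A call would look like image_down(url, "bratislava") with url set to the page you want. Two images with the same alt text would overwrite each other in this sketch.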
we're going to give it a url and then a folder name so i'm going to comment this
url out here i'm going to let's find another
place let's go to where else do you want to go
let's go bratislava why not and select some random dates that we
might be looking at going cool great so we've got a new link let's copy
that and underneath here we're going to do image
down for our function and if you remember we have to give it the url
and this is hidden by me there we go and then we're going to give it the folder
name of which i'm going to just call it
bratislava why not i'm going to save that let's
move back over here and then going to run that and we'll get
writing see we still get that blank one at the top but i think that's okay we
we kind of understand what that is we could write that we could write some
code out for that if we wanted to but i don't think we need to
and let's go to our file browser and we can see we've got a new folder here
created and all the images in and if i reveal the explorer
we should have all those images right there so that was nice and easy
um i'll put this code in my github uh you guys can go ahead and take it and
maybe change it a little bit make it work for
you um but it's pretty simple uh the only sort of complicated bits
that you may or may not have seen is the os module and changing directories and
creating new folders just keeps it all tidy and you have to
do a little bit of replace on the string of the name if
you're using the alt tag you don't have to use the alt tag you
can call it whatever you like you could just call
you could do a loop and you could say the first image you find is called
image one and then all the way down just keep adding
onto it if you like i just thought it was it was a nicer way to have the
actual alt name of the image in there um just makes it a bit better to
sort of know where you're at and know what it is
that you've actually got the image for but you could call it whatever you like
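The numbered naming idea mentioned above could be sketched like this. The helper name numbered_names, the prefix, and the example links are placeholders for illustration.

```python
def numbered_names(links, prefix="image", extension=".jpg"):
    """Return one file name per link: image1.jpg, image2.jpg, and so on."""
    return [f"{prefix}{number}{extension}" for number, _ in enumerate(links, start=1)]


# Placeholder links standing in for the src values collected from the page.
links = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/c.jpg",
]
names = numbered_names(links)
print(names)
```

Numbered names avoid the invalid character problem entirely, at the cost of losing the description that the alt text gives you.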
so that'll do it for this one guys thank you very much for watching don't forget
to like comment and subscribe and i will see you in the
next one thank you bye