Google App Engine has added support for serving images directly out of its Picasa infrastructure. Because "image serving service" is unwieldy to say, I'm going to use PIS (Picasa Image Serving) for shorthand.
Benefits
This image service has lots of great benefits:
- Streamed reads - animated GIFs will start playing before the whole image is downloaded
- Resize & crop images via url parameters - no CPU or extra storage costs for thumbnails!
- Served from edge servers
- Cost-effective way to host/serve large images
- Supports https
- Spread across multiple domains
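As a sketch of that resize/crop feature: a PIS url takes a size suffix such as =s100 (scale the longest side to 100px) or =s100-c (scale and center-crop to a square). The helper below is illustrative, not part of the demo app; the suffix syntax is the documented serving-url parameter format.

```python
def thumb_url(serving_url, size, crop=False):
    """Build a resized (optionally square-cropped) variant of a PIS url."""
    suffix = "=s%d" % size
    if crop:
        suffix += "-c"  # -c asks the server to center-crop to a square
    return serving_url + suffix

# e.g. //lh3.ggpht.com/ABC123 -> //lh3.ggpht.com/ABC123=s100-c
```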
You can download a working example source.
Overview of how it works
- Create a new Google AppEngine instance with a unique name.
- Edit the demo project and upload it to Google App Engine.
- Bulk upload all your images.
- Download all the PIS urls and use them in your site.
Set up Google App Engine
You'll need to install enough of the Google App Engine SDK to use the code deploy tools. There are lots of how-to documents out there. This app uses Python, so you'll need to install Python 2.6.x. There are newer releases of Python, but GAE isn't compatible with them yet.
When you go to create your Google App Engine instance, you'll need to change some default settings before you click Create Application.
You want to turn off the High Replication feature. Copying data out of a Google App Engine instance isn't supported for that flavor of datastore. Also, High Replication is more expensive, and since you should be embedding PIS image urls somewhere else, it's really just overkill.
Now edit the app.yaml file and change the application name from pis-demo to your new application name.
file:app.yaml
application: pis-demo
version: 3
runtime: python
You can make other changes to the application in this file. By default, uploading images is restricted to admins. If you want anybody to be able to upload and view images, you'll need to change permissions:
builtins:
- deferred: on
- remote_api: on
- datastore_admin: on
...
- url: /uploadmgr
  static_files: uploadmgr.html
  upload: uploadmgr.html
  login: admin
- url: /getuploadurl
  script: main.py
  login: admin
- url: /upload
  script: main.py
  login: admin
- url: /del
  script: main.py
  login: admin
- url: /deleteall
  script: main.py
  login: admin
Upload the application
Upload the whole project (instructions for this step can be found in numerous how-to's).
Upload images
The upload is pretty fast if you have a good internet connection. You should do them in batches. A feature I'd like to add soon is batch naming, to make deleting sets of images easier.
View images
Note: the pagination uses He3's PagedQuery. It is fairly efficient; however, it does depend on memcache (which could be purged quickly if your app isn't heavily used). This means the viewing operation can be too expensive for broad, public consumption. Right now the viewer page is NOT restricted to admins like upload is. The delete commands found here are restricted, though. There are two ways to reference the uploaded images: an appspot url, and a PIS url which typically starts with lh3.ggpht.com (where the lh# changes).
//lh3.ggpht.com/6EP5Tzu8Ov2Ke_0ViE3t0hnbUCKozXp65JNvw8LVvemN32AbXU7r9pBL9RJiXGN5s3Zbt3SEP3W53HTV
You'll notice the urls don't start with http: or https:; this is intentional. You can use //domain/ style urls so that if images are embedded in secure (https) pages, the urls will use the same scheme without modification.
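The conversion is just string trimming. A minimal sketch (protocol_relative is a hypothetical helper name, not from the demo source):

```python
def protocol_relative(url):
    """Strip http:/https: so the url inherits the embedding page's scheme."""
    low = url.lower()
    if low.startswith('http://'):
        return url[5:]   # drop 'http:', keep the leading //
    if low.startswith('https://'):
        return url[6:]   # drop 'https:'
    return url           # already scheme-relative
```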
You can use the //pis-demo.appspot.com/i/7011/fate-of-sid.png style urls. This approach will look up the ggpht.com url in the datastore and do a redirect; however, this generates CPU and database overhead costs. Also, there are load-speed advantages to spreading your images across multiple domains like lh1.ggpht.com, lh2.ggpht.com, etc. However, if you need to restrict access or track usage, then the redirector is the way to go.
Download image urls
This can be done multiple ways, but the idea is that you are uploading a lot of images, so generating a long table on the server isn't a good idea. Here are three ways to download the ImageUrlMap data.
I prefer Option 2 for developers since it creates a local backup of data and keeps your dev environment up to date. Option 3 is good if you're just looking to host images.
Keep in mind that blobstore (actual images) are NOT downloaded. What you get is a list of urls serving images that you can reference in other code.
Option 1:
Copy tables between Google App Engine instances directly.
Good if you have a ton of data.
- Edit appengine_config.py and add back in the lines for remoteapi_CUSTOM_ENVIRONMENT_AUTHENTICATION.
- Add in your GAE domain name <yourapp>.appspot.com
- NOTE: you need to grant permissions to the app *getting* the data
- Deploy app to GAE (copy up files)
- Use the Datastore Admin button "Copy to Another App"
Option 2:
Download SQL copy of the database, then upload to local running development server or different GAE instance.
Makes a local backup of data and keeps your GAE dev system primed.
- Verify app.yaml has:
remote_api: on
- Download the data to a local sql file:
appcfg.py download_data --application=<yourapp> --kind=ImageUrlMap --url=http://<yourapp>.appspot.com/_ah/remote_api --filename=imageurlmap.sql
- Make sure the Google App Engine launcher is running your app locally. NOTE: the url's port might be different!
- Upload to dev server
appcfg.py upload_data --application=dev~<yourapp> --kind=ImageUrlMap --url=http://localhost:8080/_ah/remote_api --filename=imageurlmap.sql
- Verify the target instance's app.yaml has:
remote_api: on
- Upload to a different App Engine instance. NOTE: unlike the dev-server command, there is no dev~ prefix this time!
appcfg.py upload_data --application=<yourapp> --kind=ImageUrlMap --url=http://<yourapp>.appspot.com/_ah/remote_api --filename=imageurlmap.sql
Option 3:
Download all your urls as a local csv file.
- You can use the bulkloader.yaml adaptor file in tools and execute this command.
appcfg.py download_data --config_file=bulkloader.yaml --kind=ImageUrlMap --url=http://<yourapp>.appspot.com/_ah/remote_api --filename=imageurlmap.csv
- You can import this csv file into whatever format you like.
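For example, the exported rows can be loaded with the csv module. A sketch, assuming the file has a header row containing filename and imgserving_url columns (check the header in your own export before relying on these names):

```python
import csv

def load_url_map(csv_lines):
    """Map filename -> serving url from exported csv rows.

    csv_lines is any iterable of text lines (an open file works);
    the first line must be the header row.
    """
    return {row['filename']: row['imgserving_url']
            for row in csv.DictReader(csv_lines)}
```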
Code details
While I've been coding for over 20 years, I haven't had a ton of experience with Python. I'm sure some non-pythonisms have made their way into the source, so if you have any suggestions on how I can fix things, email me. The logic to upload images is confusing. Here's a simplified version:

# this is for use with blueimp jquery uploader
class UploadUrlJsonHandler(webapp.RequestHandler):
    def get(self):
        batchidstr = self.request.get('batchid')[:20]
        if not batchidstr:
            batchidstr = '0'
        upload_url = blobstore.create_upload_url(
            "/upload/%s/" % batchidstr,
            max_bytes_total=MAX_FILE_UPLOAD_SIZE)
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write('"%s"' % upload_url)
This is initially confusing until you realize blob uploading is a different service. Basically, the service hands you a url that does all the complexity of uploading data, and when you ask for that url you pass in a "redirect" url of your own to be called once the upload is complete. That completion url is yours, but it's stored internally by the service.
batchidstr is a number generated by the server that groups all uploads that happen from the same page. This feature might be handy because large numbers of images can get unwieldy. The batchid is read from the url and embedded in the upload-complete url.
It's interesting to note that the url returned by create_upload_url() does not contain the url you used to build it. I suspect the server stores the url and returns a type of cookie.
class GetBatchId(webapp.RequestHandler):
    def get(self):
        # Used by upload page to tag batches of uploads. We want them
        # generally sequential; however, there can be gaps in the batchids.
        # We use Google DB to grab and consume a row id as a marker.
        (start_id, end_id) = db.allocate_ids(
            db.Key.from_path('ImageUrlMap', 1), 1)
        self.response.out.write('{"batchid":%d}' % start_id)
class UploadFileHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self, batchidstr):
        """Called by blobstore once the streaming upload completes."""
        import urllib
        import urlparse
        from google.appengine.api import images
        from model import ImageUrlMap

        blob_info = self.get_uploads('files[]')[0]
        # this call can take several seconds
        imgserving_url = images.get_serving_url(blob_info.key())
        # remove the scheme part from the url so http/https is transparent
        # e.g. http://domain/path becomes //domain/path
        # (which just uses the referring page's scheme)
        if imgserving_url.lower()[0:7] == 'http://':
            imgserving_url = imgserving_url[5:]
        elif imgserving_url.lower()[0:8] == 'https://':  # shouldn't happen
            imgserving_url = imgserving_url[6:]
        imgmap = ImageUrlMap(blobstoreinfo=blob_info.key(),
                             imgserving_url=imgserving_url,
                             filename=blob_info.filename,
                             filesize=blob_info.size,
                             batchid=int(batchidstr))
        imgmap.put()
        baseuri = urlparse.urlparse(self.request.url)
        deleteuri = ("//%s/del/%d/%s" %
                     (baseuri.netloc, imgmap.key().id(),
                      urllib.quote(imgmap.filename)))
        # thumbnail dimensions are controlled by url param
        thumbnailuri = imgserving_url + THUMBNAIL_MAX_DIM_PARAM
        # We cannot return an actual document here. We *MUST* do a redirect.
        # We want to pass data from this function to the redirect url.
        # We could just pass the key from imgmap and re-read, but better
        # to just encode everything here. Risk is that the url will be > 2048
        # chars and not work on IE.
        # This is the json data blueimp expects
        json_response = (
            ('[{"name":"%s","id":%d,"size":%d,"url":"%s",' +
             '"batchid":%s,' +
             '"thumbnail_url":"%s","delete_url":"%s","delete_type":"POST"}]') %
            (blob_info.filename, imgmap.key().id(), blob_info.size,
             imgserving_url, batchidstr, thumbnailuri, deleteuri))
        # hex-of-base64 makes the json safe to embed in a url path
        urlparam = json_response.encode('base64').encode('hex')
        # blobstore's upload logic requires a redirect
        self.redirect('/blueimp/%s' % urlparam)


class BlueImpUploadDoneHandler(webapp.RequestHandler):
    def get(self, encodedparam):
        json_response = encodedparam.decode('hex').decode('base64')
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json_response)
THUMBNAIL_MAX_DIM_PARAM = "=s100"
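That encode('base64').encode('hex') trick is just a way to make arbitrary json safe to embed in a url path. The equivalent written with the base64 and binascii modules (which also works on modern Python, where str has no 'base64' codec) looks like this:

```python
import base64
import binascii

def pack_for_url(text):
    """Encode text so it is url-path-safe: base64, then hex."""
    return binascii.hexlify(base64.b64encode(text.encode())).decode()

def unpack_from_url(param):
    """Reverse of pack_for_url: hex-decode, then base64-decode."""
    return base64.b64decode(binascii.unhexlify(param)).decode()
```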
class GetDataDump(webapp.RequestHandler):
    # stupid web tricks. We want to generate a different filename for the csv
    # file, so we create it here and redirect to /csvfile/filename to do the
    # actual download.
    def get(self):
        batchidfilter = self.request.get('batchid', default_value='')[:16]
        filenamefilter = self.request.get('filename', default_value='')[:255]
        filename = 'export_'
        if batchidfilter:
            filename += '_batch_' + batchidfilter
        if filenamefilter:
            filename += '_filename_' + filenamefilter
        url = ('/csvfile/%s.csv?%s' % (filename, self.request.query_string))
        self.redirect(url)


class SaveTabCSVData(webapp.RequestHandler):
    def get(self, csv_filename):
        import csv
        import cStringIO
        import math
        # You really should use the data export feature.
        # See tools/downloading_data.txt
        batchidfilter = self.request.get('batchid', default_value='')[:16]
        filenamefilter = self.request.get('filename', default_value='')[:255]
        q = ImageUrlMap.all()
        if batchidfilter:
            q.filter("batchid =", int(batchidfilter))
        if filenamefilter:
            q.filter("filename =", filenamefilter)
        output = cStringIO.StringIO()
        csv_writer = csv.writer(output)
        csv_writer.writerow(['datecreated', 'id', 'filename', 'filesize',
                             'batchid', 'imgserving_url'])
        numloops = int(math.ceil(MAX_ROWS_TO_DOWNLOAD / 1000.0))
        for loops in xrange(0, numloops):
            # Capped at MAX_ROWS_TO_DOWNLOAD rows, fetched 1000 at a time.
            # You can increase it, but this can cost too much cpu --
            # use the download tools instead!
            r = q.fetch(limit=1000)
            if not r:
                break
            for row in r:
                csv_writer.writerow(
                    [row.datecreated, row.key().id(), row.filename,
                     row.filesize, row.batchid, row.imgserving_url])
            # more results: continue from the cursor
            q.with_cursor(start_cursor=q.cursor())
        self.response.headers['Content-Type'] = 'text/csv'
        self.response.out.write(output.getvalue())
class DeepDeleteHandler(webapp.RequestHandler):
    def post(self, mapid, filename):
        imgmap = ImageUrlMap.GetbyIdAndFilename(mapid,
                                                urllib.unquote(filename))
        if not imgmap:
            logging.info("Image delete failed: no record matching id %s "
                         "and filename '%s'" % (mapid, filename))
            self.error(404)
            return
        imgmap.DeepDelete()  # kinda wrong that one has to load to delete
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(mapid)
# If you call blobstoreinfo.key(), you'll make a 2nd call to the database
# to load the blobstore record.
# Use this call to get the key without loading the data.
def GetBlobstoreKey(self):
    return ImageUrlMap.blobstoreinfo.get_value_for_datastore(self)