The Cloud / Big Data
Datacenter Economics:
- Google, Microsoft, UMich put up datacenters
- small company? rent space at a commercial datacenter
renting space? you pay for space, power, bandwidth, and maybe labor
Moore's law: the price of a given amount of compute keeps dropping, but power and the other costs of running the hardware stay high
Datacenters may provide:
- physical security
- good environment (cooling)
- network access
- power resources
- weather hardening
Sharing Computers & Network
- OS maintains the illusion that each process/application has the machine to itself
- Virtual Private Networks (VPNs): secure connections over a public, packet-switched network, giving users the illusion that it's their own private network
THE CLOUD
- the next level of sharing
- rent what you need when you need it
- can be a lot cheaper
- service level agreement (SLA): a contract that specifies quality metrics ("what is the service supposed to do, what's the peak load it has to satisfy, what's the uptime going to be")
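
To make the uptime part of an SLA concrete, here is a small arithmetic sketch that turns an uptime percentage into a downtime budget. The function name, the 30-day month, and the sample targets are assumptions chosen for illustration, not terms from any real provider's contract.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period for a given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Hypothetical uptime targets, just to compare orders of magnitude.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the uptime number in the contract matters so much.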
what do you rent?
- Infrastructure as a Service (IaaS) = machines, CPU cycles, storage
- Platform as a Service (PaaS) = database servers, web servers
- Software as a Service (SaaS) = email, back-up, other apps
do you trust your service provider?
- what if they are malicious? incompetent? if they go out of business?
A Service: (like an object)
- logical representation of a repeatable business activity with a specific outcome
- is self-contained, maybe composed of other services
- "black-box" to consumers
Service vs API/Object Interface
- a service runs in a different process, usually on a different machine, and is managed by another party
- so you have to ask: what is the quality of the service?
To use the cloud, we'll typically use the cloud as a service. Just because we're using a service doesn't mean we're using a service-oriented architecture. Going the other way, I could take something I'm working on and make it a service.
REST as a database query language:
- organize the database as a tree (like XML)
- path portion of the URL identifies an element in the tree
- can use query strings to add selection conditions (filter)
- data is returned as XML or JSON
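
The idea above can be sketched in a few lines: walk the tree with the URL path, then filter with the query string. This is a minimal illustration, not a real REST server. The `DB` contents, the `rest_get` name, and the example URL are all made up for the sketch.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Hypothetical "database" organized as a tree.
DB = {
    "library": {
        "books": [
            {"title": "A", "lang": "en", "year": 2004},
            {"title": "B", "lang": "fr", "year": 2004},
            {"title": "C", "lang": "en", "year": 2010},
        ]
    }
}


def rest_get(tree, url):
    """Resolve the URL path against the tree, filter by query string, return JSON."""
    parts = urlsplit(url)
    node = tree
    # Each path segment selects one child of the current node.
    for segment in (s for s in parts.path.split("/") if s):
        node = node[segment]  # raises KeyError if the element does not exist
    # The query string adds selection conditions, e.g. ?lang=en
    conditions = {key: values[0] for key, values in parse_qs(parts.query).items()}
    if conditions and isinstance(node, list):
        node = [
            item for item in node
            if all(str(item.get(key)) == value for key, value in conditions.items())
        ]
    return json.dumps(node)


print(rest_get(DB, "/library/books?lang=en"))
```

Here `/library/books` plays the role of the element path and `lang=en` is the filter; the result comes back as JSON.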
REST commands: HTTP protocol methods
- POST (insert items)
- GET (read items)
- PUT (update items)
- DELETE (remove items)
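
As a rough sketch of how these four methods map onto operations on stored items, here is a tiny in-memory store. `ItemStore`, its `handle` method, and the choice of status codes are assumptions for illustration; a real service would sit behind an actual HTTP server, and real PUT semantics can be broader than "update only".

```python
class ItemStore:
    """Toy in-memory collection driven by HTTP-style method names."""

    METHODS = {"POST", "GET", "PUT", "DELETE"}

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, method, item_id=None, body=None):
        """Return a (status_code, payload) pair for one request."""
        method = method.upper()
        if method not in self.METHODS:
            return 405, None  # method not allowed
        if method == "POST":  # insert a new item
            item_id = self.next_id
            self.next_id += 1
            self.items[item_id] = body
            return 201, {"id": item_id}
        if item_id not in self.items:
            return 404, None  # no such item
        if method == "GET":  # read
            return 200, self.items[item_id]
        if method == "PUT":  # update in place
            self.items[item_id] = body
            return 200, body
        del self.items[item_id]  # DELETE
        return 204, None


store = ItemStore()
print(store.handle("POST", body={"name": "first"}))
print(store.handle("PUT", 1, {"name": "renamed"}))
print(store.handle("GET", 1))
print(store.handle("DELETE", 1))
```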
BIG DATA: the 4 V's of big data
- volume, velocity, variety, veracity (errors in data)
- involves structured & unstructured data (ex: plain text)
- typically implemented on the cloud
- one use case: running machine-learning algorithms using MapReduce on the cloud
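
For a feel of the MapReduce pattern itself, here is a single-process sketch using word counting rather than a machine-learning job, since it is the simplest case. The names `map_fn`, `reduce_fn`, and `map_reduce` are illustrative; a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_fn(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def map_reduce(documents):
    """Run map, group the pairs by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for document in documents:
        for key, value in map_fn(document):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


print(map_reduce(["big data on the cloud", "big tables on the cloud"]))
```

The point of the split is that each `map_fn` call only sees one document and each `reduce_fn` call only sees one key, so both steps can be spread over many machines.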
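BigTable (Google's distributed storage system for structured data; not a relational database): GFS + a data structure on top
- traditional relational databases don't scale easily to this size
- no full SQL: uses its own simple API to read and update data. sparse table format (empty cells assumed, not stored) w/timestamps on cells. no general multi-row transactions, so no database-style locking across rows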
- columns are grouped together into column families
- for storage purposes, each column family is treated as if it were a single column
- column-oriented compression: entries are collected into a chunk, and that chunk goes into the file system (GFS)
- rows are split into tablets. use MapReduce to process the data; some work is needed to find which tablet to access
- to find a value: use an index to locate the tablet that contains the data, then read it from that tablet
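
The points above (sparse cells, timestamps, column families, and finding the right tablet through an index) can be put together in a toy sketch. This is not Google's implementation: `Tablet`, `TinyTable`, and the split keys are hypothetical, and the real system's tablet index is more elaborate than one sorted list.

```python
import bisect


class Tablet:
    """Holds a contiguous range of rows; only cells that were written are stored."""

    def __init__(self):
        self.cells = {}  # (row, "family:qualifier") -> {timestamp: value}

    def put(self, row, column, value, timestamp):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        versions = self.cells.get((row, column))
        if not versions:
            return None  # sparse table: a missing cell is simply empty
        return versions[max(versions)]  # value with the newest timestamp


class TinyTable:
    def __init__(self, split_keys):
        # Sorted row keys where one tablet ends and the next begins.
        self.split_keys = sorted(split_keys)
        self.tablets = [Tablet() for _ in range(len(self.split_keys) + 1)]

    def tablet_index(self, row):
        """The index lookup: binary search over split keys to pick the tablet."""
        return bisect.bisect_right(self.split_keys, row)

    def put(self, row, column, value, timestamp):
        self.tablets[self.tablet_index(row)].put(row, column, value, timestamp)

    def get(self, row, column):
        return self.tablets[self.tablet_index(row)].get(row, column)


table = TinyTable(split_keys=["m"])  # two tablets: rows before "m", rows from "m" on
table.put("apple", "info:color", "red", timestamp=1)
table.put("apple", "info:color", "green", timestamp=2)
print(table.tablet_index("apple"), table.get("apple", "info:color"))
print(table.tablet_index("zebra"), table.get("zebra", "info:color"))
```

Column names are written as `family:qualifier` so that everything sharing the `info` prefix belongs to one column family.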