Wednesday, 7 September 2016

Python OrderedDict

An OrderedDict is a dictionary subclass that remembers the order in which its contents are added.


import collections

print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
d['d'] = 'D'
d['e'] = 'E'

for k, v in d.items():
    print k, v

print '\nOrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'
d['d'] = 'D'
d['e'] = 'E'

for k, v in d.items():
    print k, v



A regular dict does not track insertion order, so iterating over it produces the values in an arbitrary order. An OrderedDict, by contrast, remembers the order in which items were inserted and uses that order when iterating.

A regular dict looks only at its contents when testing for equality. An OrderedDict also considers the order in which the items were added.
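A short example makes the equality difference concrete:

import collections

d1 = {'a': 'A', 'b': 'B'}
d2 = {'b': 'B', 'a': 'A'}
print d1 == d2      # True: regular dicts compare contents only

od1 = collections.OrderedDict([('a', 'A'), ('b', 'B')])
od2 = collections.OrderedDict([('b', 'B'), ('a', 'A')])
print od1 == od2    # False: same contents, but different insertion order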

Friday, 10 June 2016

Touch Develop

Every 11-year-old in Britain will be getting the BBC Micro Bit this autumn. The platform will be programmable in three languages: Touch Develop (https://www.touchdevelop.com/), C++, and Python.


The Python Software Foundation is a supporting organization for this initiative. The PSF blog post on the topic can be found at http://pyfound.blogspot.com/2015/03/bbc-launches-microbit.html.


Wednesday, 8 June 2016

Google's TensorFlow

TensorFlow is an Open Source Software Library for Machine Intelligence

About TensorFlow

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

What is a Data Flow Graph?

Data flow graphs describe mathematical computation with a directed graph of nodes and edges. Nodes typically implement mathematical operations, but can also represent endpoints to feed in data, push out results, or read/write persistent variables. Edges describe the input/output relationships between nodes. These data edges carry dynamically sized multidimensional data arrays, or tensors. The flow of tensors through the graph is where TensorFlow gets its name. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges become available.
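As a rough sketch of what this means in practice, assuming the TensorFlow 1.x-era Python API that was current at the time, a tiny graph can be built and then executed like this:

import tensorflow as tf

# Two constant nodes feed a matmul node; nothing is computed yet,
# the graph only describes the computation.
a = tf.constant([[1.0, 2.0]])            # 1x2 tensor
b = tf.constant([[3.0], [4.0]])          # 2x1 tensor
product = tf.matmul(a, b)                # node representing a matrix multiplication

# Running the graph in a session actually executes the nodes.
with tf.Session() as sess:
    print(sess.run(product))             # [[ 11.]]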

Features

Deep Flexibility

TensorFlow isn't a rigid neural networks library. If you can express your computation as a data flow graph, you can use TensorFlow. You construct the graph, and you write the inner loop that drives computation. We provide helpful tools to assemble subgraphs common in neural networks, but users can write their own higher-level libraries on top of TensorFlow. Defining handy new compositions of operators is as easy as writing a Python function and costs you nothing in performance. And if you don't see the low-level data operator you need, write a bit of C++ to add a new one.
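For example, a handy composition of operators can be just an ordinary Python function that adds nodes to the graph. This is only an illustrative sketch using the same 1.x-era API; the function name and shapes are made up:

import tensorflow as tf

def dense_relu(x, weights, bias):
    # Composing existing operators in a plain Python function simply builds
    # more nodes in the same graph; there is no extra runtime cost.
    return tf.nn.relu(tf.matmul(x, weights) + bias)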

True Portability

TensorFlow runs on CPUs or GPUs, and on desktop, server, or mobile computing platforms. Want to play around with a machine learning idea on your laptop without need of any special hardware? TensorFlow has you covered. Ready to scale-up and train that model faster on GPUs with no code changes? TensorFlow has you covered. Want to deploy that trained model on mobile as part of your product? TensorFlow has you covered. Changed your mind and want to run the model as a service in the cloud? Containerize with Docker and TensorFlow just works.

Connect Research and Production

Gone are the days when moving a machine learning idea from research to product required a major rewrite. At Google, research scientists experiment with new algorithms in TensorFlow, and product teams use TensorFlow to train and serve models live to real customers. Using TensorFlow allows industrial researchers to push ideas to products faster, and allows academic researchers to share code more directly and with greater scientific reproducibility.


Language Options

TensorFlow comes with an easy-to-use Python interface and a no-nonsense C++ interface to build and execute your computational graphs. Write stand-alone TensorFlow Python or C++ programs, or try things out in an interactive IPython notebook where you can keep notes, code, and visualizations logically grouped. This is just the start, though -- we’re hoping to entice you to contribute SWIG interfaces to your favorite language -- be it Go, Java, Lua, JavaScript, or R.

Maximize Performance

Want to use every ounce of muscle in that workstation with 32 CPU cores and 4 GPU cards? With first-class support for threads, queues, and asynchronous computation, TensorFlow allows you to make the most of your available hardware. Freely assign compute elements of your TensorFlow graph to different devices, and let TensorFlow handle the copies.
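A minimal sketch of explicit device assignment, again assuming the 1.x-era API; the device strings and the allow_soft_placement fallback are illustrative:

import tensorflow as tf

# Pin parts of the graph to specific devices; TensorFlow inserts the
# required copies between devices for you.
with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])

with tf.device('/gpu:0'):                # assumes a GPU is available
    b = tf.matmul(a, a)

# allow_soft_placement falls back to the CPU if no GPU is present.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(b))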

Thanks.....



Sources: official media; Google's TensorFlow website.


Thursday, 28 April 2016

Python or R: Which One Is Best?

Python is a versatile programming language that can do everything from data mining to plotting graphs. Its design philosophy is based on the importance of readability and simplicity.
  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
As you can imagine, algorithms in Python are designed to be easy to read and write. Blocks of Python code are separated by indentation. Within each block, you’ll discover a syntax that wouldn’t be out of place in a technical handbook.
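For example, a short, self-contained snippet (the data and function name here are made up purely for illustration) reads almost like pseudocode:

prices = [12.5, 8.0, 15.25, 3.99]

def total_above(items, threshold):
    # Indentation alone marks the blocks, so the structure reads like an outline.
    total = 0
    for price in items:
        if price > threshold:
            total += price
    return total

print total_above(prices, 10)   # 27.75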


Benefits of R
R is a programming environment specifically designed for data analysis, and it is very popular in the data science community. You’ll need to understand R if you want to go far in a data science career.

PYTHON VS R

Usage

Python, as we noted above, is often used by computer programmers since it is the Swiss Army knife of programming languages: versatile enough that you can build websites and do data analysis with the same tool.
R is primarily used by researchers and academics who don’t necessarily have a background in computer science.

Syntax

Python has a clear, “English-like” syntax that makes code easier to read and debug, while R has an unconventional syntax that can be tricky to understand, especially if you have already learned another programming language.

Learning curve

R is slightly harder to pick up, especially since it doesn’t follow the normal conventions other common programming languages have. Python is simple enough that it makes for a really good first programming language to learn.

Popularity

Python has consistently been among the top five most popular programming languages on GitHub, a widely used code-hosting service whose statistics give a reasonable picture of language usage, while R typically hovers outside the top ten.
Python is versatile, simple, easier to learn, and powerful because it is useful in a variety of contexts, some of which have nothing to do with data science. R is a specialized environment optimized for data analysis, but it is harder to learn; on the other hand, sticking it out with R has tended to pay better than working with Python.



Wednesday, 20 April 2016

Crawl Your Ecommerce Site with Python

Ecommerce business owners and managers have many good reasons to crawl their own websites, including monitoring pages, tracking site performance, ensuring the site is accessible to customers with disabilities, and looking for optimization opportunities.
For each of these, there are discrete tools, web crawlers, and services you could purchase to help monitor your site. While these solutions can be effective, with a relatively small amount of development work you can create your own site crawler and site monitoring system.
The first step toward building your own, custom site-crawling and monitoring application is to simply get a list of all of the pages on your site. In this article, I’ll review how to use the Python programming language and a tidy web crawling framework called Scrapy to easily generate a list of those pages.


You’ll Need a Server, Python, and Scrapy

This is a development project. While it is relatively easy to complete, you will still need a server with Python and Scrapy installed. You will also want command line access to that server via a terminal application or an SSH client.
In a July 2015 article, “Monitor Competitor Prices with Python and Scrapy,” I described in some detail how to install Python and Scrapy on a Linux server or OS X machine. You can also get information about installing Python from the documentation section of Python.org. Scrapy also has good installation documentation.
Given all of these available resources, I’ll start with the assumption that you have your server ready to go with both Python and Scrapy installed.


Create a Scrapy Project

Using an SSH client like PuTTY for Windows or the terminal application on a Mac or Linux computer, navigate to the directory where you want to keep your Scrapy projects. Using Scrapy's built-in startproject command, we can quickly generate the basic files we need.
For this article, I am going to be crawling a website called Business Idea Daily, so I am naming the project “bid.”
scrapy startproject bid
Scrapy will generate several files and directories.
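Depending on your Scrapy version, the generated layout will look roughly like this:

bid/
    scrapy.cfg            # deploy/configuration file
    bid/                  # the project's Python module
        __init__.py
        items.py          # Item definitions (we edit this below)
        pipelines.py
        settings.py
        spiders/          # our spider will live here
            __init__.py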

Generate a New Scrapy Web Spider

For your convenience, Scrapy has another command line tool that will generate a new web spider automatically.
scrapy genspider -t crawl getbid businessideadaily.com
Let’s look at this command piece by piece.
The first term, scrapy, references the Scrapy framework. Next, we have the genspider command, which tells Scrapy we want a new web spider or, if you prefer, a new web crawler.
The -t flag tells Scrapy that we want to choose a specific template. The genspider command can generate any one of four generic web spider templates: basic, crawl, csvfeed, and xmlfeed. Directly after the -t, we specify the template we want; in this example, we will be creating what Scrapy calls a CrawlSpider.
The term, getbid, is simply the name of the spider; this could have been any reasonable name.
The final portion of the command tells Scrapy what website we want to crawl. The framework will use this to populate a couple of the new spider’s parameters.


Define Items

In Scrapy, Items are mini models or ways of organizing the things our spider collects when it crawls a specific website. While we could easily complete our aim — getting a list of all of the pages on a specific website — without using Items, not using Items might limit us if we wanted to expand our crawler later.
To define an Item, simply open the items.py file Scrapy created when we generated the project. In it, there will be a class called BidItem. The class name is based on the name we gave our project.
class BidItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass
Replace pass with a definition for a new field called url.
url = scrapy.Field()
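After that edit, the whole items.py file should look roughly like this:

# -*- coding: utf-8 -*-
import scrapy

class BidItem(scrapy.Item):
    # the URL of each page the spider visits
    url = scrapy.Field()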
Save the file and you’re done.

Build the Web Spider

Next, open the spiders directory in your project and look for the new spider Scrapy generated. In this example, the spider is called getbid, so the file is getbid.py.
When you open this file in an editor, you should see something like the following.
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from bid.items import BidItem

class GetbidSpider(CrawlSpider):
    name = 'getbid'
    allowed_domains = ['businessideadaily.com']
    start_urls = ['http://www.businessideadaily.com/']

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = BidItem()
        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = response.xpath('//div[@id="name"]').extract()
        #i['description'] = response.xpath('//div[@id="description"]').extract()
        return i
We need to make a few minor changes to the code Scrapy generated for us. First, we need to modify the arguments for the LinkExtractor under rules. We are simply going to delete everything in the parentheses.
    Rule(LinkExtractor(), callback='parse_item', follow=True),
With this update, our spider will find every link on the start page (the home page), pass each link to the parse_item method, and keep following links throughout the site to ensure we reach every linked page.
Next, we need to update the parse_item method. We will remove all of the commented lines. These lines were just examples that Scrapy included for us.
def parse_item(self, response):
    i = BidItem()
    return i
I like to use variable names that have meaning. So I am going to change the i to href, which is the name of the attribute in an HTML link that holds, if you will, the target link’s address.
def parse_item(self, response):
    href = BidItem()
    return href
Now for the magic. We will capture the page URL as an Item.
def parse_item(self, response):
    href = BidItem()
    href['url'] = response.url
    return href
That is it. The new spider is ready to crawl.
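From the project directory, you can start the crawl with Scrapy's built-in crawl command; the -o option exports the collected Items to a file (the filename here is just an example):

scrapy crawl getbid -o urls.csv

When the spider finishes, urls.csv will contain one row for each page URL the spider found.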
