Posts

Analyzing Software Development using data science and analytic tools

Yesterday I had the pleasure of hosting Markus Harrer, who gave a talk about analyzing software code using data science, which he describes in detail on his blog. There is often a communication gap between software developers and management. Good developers can see the big picture of the code and recognize in time when it needs to be restructured or even rewritten, but they often fail to see the risks of such an operation, in both time and cost. Management, on the other hand, can identify the risks of embarking on a rewrite of a legacy system, but often fails to understand the consequences and risks of a lack of maintenance. The way to overcome this 'gap of ignorance' is to communicate using data science and analytics. Jupyter notebooks, for example, combine textual explanations with analytics and diagrams, which makes it easy to communicate the risks of a software system to management. To use data science on software code, one should decide

Natural Language Processing :: A beginner’s (brief) guide for working on NLP problems

This article was written for the Data Science Society's Datathon held in February 2018, as an answer to questions from participants who were new to NLP. This time, the Data Science Society's Datathon 2018 presented us with many cases related to Natural Language Processing. One of the cases, for example, involves extracting entities' activities from unstructured documents and determining their sentiment. So, how should one begin working on such a problem? First, let’s break the problem down: (1) we need to detect which entities are mentioned in an article, and then (2) we need to detect the sentiment related specifically to those entities. Let’s think about it logically for a moment: entities are (normally) nouns, and to get the sentiment we will probably need to look at the adjectives that describe them, or the verbs that relate to them. So a first step could be (a) parsing the document into sentences, and then int
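The two-step breakdown above (find the entities, then find the sentiment attached to them) can be sketched as a toy, lexicon-based pipeline. This is not the article's method: the tiny `POS` and `POLARITY` dictionaries and the `entitySentiment` function are hypothetical stand-ins for a real POS tagger and sentiment lexicon (for example from NLTK or spaCy). The sketch splits the text into sentences, tokenizes each one, treats known nouns as entities, and credits each adjective's polarity to the nearest noun in the same sentence.

```javascript
// Hypothetical toy lexicons: a real system would use a POS tagger
// and a proper sentiment lexicon instead of these hand-written dictionaries.
const POS = { acme: "NOUN", globex: "NOUN", profits: "NOUN", strong: "ADJ", weak: "ADJ" };
const POLARITY = { strong: 1, weak: -1 };

// Step (a): split a document into sentences on ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}

// Lowercase word tokens; punctuation is dropped.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) || [];
}

// Steps (1) and (2): collect nouns (entities) and credit each adjective's
// polarity to the nearest noun, by token distance, within the same sentence.
function entitySentiment(text) {
  const scores = {};
  for (const sentence of splitSentences(text)) {
    const tokens = tokenize(sentence);
    const nouns = tokens
      .map((tok, i) => ({ tok, i }))
      .filter(({ tok }) => POS[tok] === "NOUN");
    for (const { tok } of nouns) scores[tok] = scores[tok] ?? 0;
    tokens.forEach((tok, i) => {
      if (POS[tok] !== "ADJ" || nouns.length === 0) return;
      // Nearest noun wins; on a tie the earlier noun is kept.
      let best = nouns[0];
      for (const n of nouns) {
        if (Math.abs(n.i - i) < Math.abs(best.i - i)) best = n;
      }
      scores[best.tok] += POLARITY[tok] ?? 0;
    });
  }
  return scores;
}
```

For example, `entitySentiment("Acme is strong. Globex is weak!")` gives `acme` a score of 1 and `globex` a score of -1. On "Acme reported strong profits." the adjective is credited to `profits` rather than `acme`, which shows why word proximity alone is not enough and a real solution needs something like dependency parsing.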

NLP Toolkit (Made for self-use, but feel free to use it too :) )

A list of tools for NLP tasks, compiled mostly for self-reference... hence quite brief. Better lists can be found out there: https://github.com/keon/awesome-nlp

scikit-learn (sklearn.feature_extraction.text)
- CountVectorizer - converts text to a token-count matrix (optionally over n-grams), which is then used by:
- TfidfTransformer - transforms a count matrix (CountVectorizer output) into a normalized term-frequency (TF) or term-frequency/inverse-document-frequency (TF-IDF) representation

NLTK
- Corpus reader
- Word tokenization
- POS tagging
- Chunking
- Stemming
- Creating parse trees out of sentences
- Using knowledge bases (FrameNet, WordNet, PropBank)
- Different tagger implementations (Senna, Stanford)

spaCy
- Corpus reader
- Tokenization
- NER, POS tagging
- Semantic representation (word vectors)
- Labeled dependency parsing

Gensim
- Corpus reader / parser
- Transformations (TF-IDF, LSA, LDA, HDP)
- Similarity queries
- Topic modeling (LDA, LSA)

AllenNLP
- High-level trained models
- Machine comprehension (QA based on given t
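To make the CountVectorizer → TfidfTransformer pipeline concrete, here is a minimal, dependency-free sketch of what the two steps compute. It assumes scikit-learn's documented defaults (smooth_idf=True, so idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each document row). The function names `countMatrix` and `tfidf` are hypothetical, and the regex tokenizer only approximates CountVectorizer's default token pattern (unigrams only).

```javascript
// Count step: build a sorted vocabulary and a document-term count matrix.
// Tokens are lowercase runs of 2+ word characters, approximating
// CountVectorizer's default token_pattern (unigrams only).
function countMatrix(docs) {
  const tokenized = docs.map((d) => d.toLowerCase().match(/\b\w\w+\b/g) || []);
  const vocab = [...new Set(tokenized.flat())].sort();
  const index = new Map(vocab.map((term, j) => [term, j]));
  const counts = tokenized.map((tokens) => {
    const row = new Array(vocab.length).fill(0);
    for (const t of tokens) row[index.get(t)] += 1;
    return row;
  });
  return { vocab, counts };
}

// TF-IDF step: weight counts by the smoothed idf, then L2-normalize each row.
function tfidf(counts) {
  const n = counts.length;
  const m = counts[0]?.length ?? 0;
  const idf = [];
  for (let j = 0; j < m; j++) {
    const df = counts.filter((row) => row[j] > 0).length; // document frequency
    idf.push(Math.log((1 + n) / (1 + df)) + 1);
  }
  return counts.map((row) => {
    const w = row.map((c, j) => c * idf[j]);
    const norm = Math.sqrt(w.reduce((s, x) => s + x * x, 0));
    return norm === 0 ? w : w.map((x) => x / norm);
  });
}
```

For the two documents "the cat sat" and "the dog sat", the vocabulary is `cat, dog, sat, the`; the shared words `sat` and `the` get a lower idf than `cat` or `dog`, so within the first row `cat` carries the largest weight (about 0.705, versus about 0.502 for `sat` and `the`).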

pip install pymssql fails with 'sqlfront.h': No such file or directory

I tried to install pymssql on Windows from the command line with pip install pymssql, but the operation fails with an error: fatal error C1083: Cannot open include file: 'sqlfront.h': No such file or directory. While looking for a solution, the first results are not really helpful, until you arrive at this helpful one. In short: I was using Python 3.6, which pymssql currently doesn't support. Using Python 3.5 instead solves the issue, and pip install pymssql runs well. If you're using Anaconda Navigator, it is even simpler: in the Environments section, you can add a new environment with your choice of Python version. When you choose a supported version (3.5, for example), you will find pymssql in the packages list and can install it directly from there.

AngularJS Directive: Accessing a DOM element with a dynamic ID in an asynchronous directive

Hi, this problem had been bothering me for a few months. In earlier cases I managed to avoid it, but today I could not hide from it any longer. And no post anywhere else solved it... So here it goes. You're writing a directive in AngularJS, and you want to access one of the DOM elements declared within its template. Usually you do it through the linking function (the post section of compile) or in the directive controller, by injecting and using '$element', as explained here. But in my case, I had to access an element by its ID, and this ID was dynamic. So, my template contains:

    <div>
        <div ng-attr-id="{{ 'something_' + dynamicValue }}"></div>
    </div>

And my directive contains:

    {
        templateUrl: 'pathToTemplate',
        scope: {
            dynamicValue: '@'
        },
        link: function (scope, element, attribut

A guide for modeling a graph database - A lunch with Neo4J chief scientist Jim Webber, London

Since the invention of NoSQL databases, they have been getting more and more attention from the developer community. One of the remarkable and unique databases in the NoSQL world is Neo4j, an open-source graph database that stores data as a graph. Modeling a graph database is quite different from modeling a regular RDBMS database, and even from modeling other NoSQL databases such as key-value stores. We got used to breaking the data down into its atomic pieces, saving them as columns and grouping similar data together into tables. But since Neo4j is a flexible graph database, this approach does not work there. To model the data, we first need to identify the queries we are going to run on the database. These queries will form the basic logical sentences, which are the key to modeling the database (more about that in a second). We need to identify where to store each piece of information:
- as a node
- as an edge
- as a property of a node or of
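The node / edge / property decision can be illustrated with a tiny in-memory property graph. This is not Neo4j's API or storage model: the `Graph` class, its methods, and the Alice/Acme data are hypothetical, and the mapping used here (nouns become nodes, verbs become edges, qualifiers become properties) is a common modeling rule of thumb assumed for illustration, since the excerpt stops before explaining the "logical sentences" in full.

```javascript
// A minimal property graph: nodes and directed, typed edges,
// each carrying a plain object of properties.
class Graph {
  constructor() {
    this.nodes = new Map(); // id -> { label, props }
    this.edges = [];        // { from, type, to, props }
  }
  addNode(id, label, props = {}) {
    this.nodes.set(id, { label, props });
    return this;
  }
  addEdge(from, type, to, props = {}) {
    if (!this.nodes.has(from) || !this.nodes.has(to)) {
      throw new Error(`unknown node in edge ${from} -${type}-> ${to}`);
    }
    this.edges.push({ from, type, to, props });
    return this;
  }
  // Follow outgoing edges of one type, answering e.g. "where does X work?"
  neighbors(id, type) {
    return this.edges
      .filter((e) => e.from === id && e.type === type)
      .map((e) => ({ node: e.to, ...e.props }));
  }
}

// Query sentence "Alice WORKS_AT Acme since 2015":
// the nouns become nodes, the verb becomes an edge,
// and the qualifier ("since 2015") becomes a property of that edge.
const g = new Graph()
  .addNode("alice", "Person", { name: "Alice" })
  .addNode("acme", "Company", { name: "Acme" })
  .addEdge("alice", "WORKS_AT", "acme", { since: 2015 });
```

Here `g.neighbors("alice", "WORKS_AT")` returns one entry pointing at `acme` with `since: 2015`. Storing `since` on the edge rather than on either node reflects that it describes the relationship itself, not Alice or Acme.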

JavaScript, animation and easing functions

I recently had to create an animation that obeys the laws of physics, using JavaScript and HTML5. Besides gravity and free fall, one of these animations also had to apply some sort of bouncing and elastic behavior. After first creating it with CSS3 and HTML, we realized that most Android devices do not fully support it yet, especially versions prior to Ice Cream Sandwich, such as Android 2.1, 2.2, 2.3 and lower (Froyo and its siblings). So I had to move everything to an HTML5 Canvas and, instead of applying CSS transitions, implement everything myself. The first website I encountered, and one I highly recommend, is Timothee Groleau's easing generator, which can be found here: http://timotheegroleau.com/Flash/experiments/easing_function_generator.htm Originally made for Flash, it can easily be adapted to JavaScript and HTML5 too. The output of his generator is basically a function describing a certain movement of an object. Such movement may b
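As a rough illustration of what such an easing function looks like, here is a sketch using the widely used Robert Penner-style signature f(t, b, c, d): current time, start value, total change, and duration. Whether Groleau's generator emits exactly this signature is an assumption; `easeInQuad` and `easeOutBounce` are standard Penner-style equations written out by hand, and `positionsAt` is a hypothetical helper that samples them the way a canvas animation loop would on each frame. `easeInQuad` also happens to match free fall from rest, where distance grows with the square of time.

```javascript
// Quadratic ease-in: position grows with t^2, like free fall from rest.
function easeInQuad(t, b, c, d) {
  const p = t / d;
  return b + c * p * p;
}

// Penner-style bounce ease-out: piecewise parabolas that touch the
// end value (b + c) three times before settling there at t = d.
function easeOutBounce(t, b, c, d) {
  const n1 = 7.5625;
  const d1 = 2.75;
  let p = t / d;
  let y;
  if (p < 1 / d1) {
    y = n1 * p * p;
  } else if (p < 2 / d1) {
    p -= 1.5 / d1;
    y = n1 * p * p + 0.75;
  } else if (p < 2.5 / d1) {
    p -= 2.25 / d1;
    y = n1 * p * p + 0.9375;
  } else {
    p -= 2.625 / d1;
    y = n1 * p * p + 0.984375;
  }
  return b + c * y;
}

// Sample an easing function at evenly spaced frames, as a
// setInterval or requestAnimationFrame loop would before drawing on a canvas.
function positionsAt(ease, from, to, durationMs, frames) {
  const out = [];
  for (let i = 0; i <= frames; i++) {
    const t = (durationMs * i) / frames;
    out.push(ease(t, from, to - from, durationMs));
  }
  return out;
}
```

For example, `positionsAt(easeInQuad, 0, 100, 1000, 4)` yields the y-coordinates 0, 6.25, 25, 56.25 and 100, which a canvas loop would use to redraw the falling object frame by frame. Keeping the easing math as pure functions of time, separate from the drawing code, makes it easy to swap one movement for another.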