Kenyt Product Ranking Engine

Kenyt was started with the goal of helping buyers make quick and informed purchase decisions. To achieve this, we envisioned a knowledge platform which can process available information to extract useful knowledge, and a product ranking engine which can rank products using this knowledge. Just as Google gathers all the web pages and gives you the top 10 for any query, we wanted to rank all products to produce the top 10 products for your needs. I am happy to say that we have cracked both of these pieces and now have an awesome knowledge platform and a product ranking/recommendation engine. In this post I am going to talk about the factors we use in our product ranking engine.

Existing solutions provide options to sort by popularity, ratings etc., which does help a bit but still requires you to do a lot of work. Our engine looks at many parameters together to give you a top 10 list tailored just for you. You can get “Top 10 products by brand”, “Top 10 by price”, or “Top 10 by availability on a particular site” easily. Heck, we can even tell you “Top 10 Laptops with i5/i7 processor, 8+ GB RAM and a backlit keyboard” or “Top 10 Mobiles from Samsung, Lenovo, Honor with 8+ MP front camera in the 10-20K price range”. Any top 10 list is possible with Kenyt. Please note that these lists are completely data driven and always up to date, unlike other lists which become outdated quickly.

Let's talk about the factors the Kenyt ranker uses to rank products.

1. Feature strength: We evaluate all the features of a product and assign a spec score. To do this, we look at its specs and compare them with those of other similar products to measure how good its features are relative to the competition. Not all features are considered equal; for example, screen size is much more important than a USB port in TVs. The Kenyt Spec Score for each product reflects our ranking of its specifications.

2. Value for money: If a product offers more features at a lower price, it gets a higher “value for money” score. This enables us to rank products offering more value for money above comparable products.

3. Product ratings: This is the typical star rating and ratings count. We assign weight so that products with more ratings and a better star rating rank higher.

4. Recent popularity: Customers are influenced by what others are buying, and knowing what is popular helps buyers choose more confidently. We guesstimate every product’s recent popularity using various data points like number of search queries, page views, purchase volume etc., and rank products popular in the last week/month higher. Many sites give you the option to sort by popularity; at Kenyt you can even see exactly how popular a product is.

5. User Reviews Sentiment: Customers rely heavily on user reviews to get unbiased insights from existing buyers. To accommodate this, our engine goes through the user reviews and automatically finds the aspects people talk about in a product. It then assigns scores to the various aspects based on the polarity of each sentence (rather than the review-level star rating). These aspect sentiment scores are then used in the product ranking. Not all aspects are equal: each aspect's weight is decided by its importance. For example, battery and camera reviews are given more importance than the weight of a mobile.

6. Brand popularity and quality: Brand is another extremely important aspect in purchase decisions. Customers will choose the product from a superior brand if everything else is the same. To reflect this in our ranking, Kenyt calculates brand popularity and brand strength in different price bands. We use brand search volume, page views etc. to guesstimate popularity and calculate a brand score. We also look at the average rating of all of a brand's products, rewarding brands with multiple good products over brands which have a few good but many bad products. Products from higher ranked brands are ranked higher.
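As a rough illustration of the aspect-level aggregation in factor 5 (the aspect names, weights and polarity values below are hypothetical, and real sentiment extraction is far more involved than this), the per-aspect scoring could be sketched like this:

```python
# Hypothetical sketch: average sentence-level polarity per aspect, then
# combine aspects using importance weights. Polarity is assumed to be a
# value in [-1, 1] produced by some upstream sentiment model.

ASPECT_WEIGHTS = {"battery": 0.4, "camera": 0.4, "weight": 0.2}  # illustrative

def aspect_scores(sentences):
    """sentences: list of (aspect, polarity) pairs mined from reviews."""
    totals, counts = {}, {}
    for aspect, polarity in sentences:
        totals[aspect] = totals.get(aspect, 0.0) + polarity
        counts[aspect] = counts.get(aspect, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}

def review_sentiment_score(sentences):
    """Importance-weighted sum of per-aspect average sentiment."""
    scores = aspect_scores(sentences)
    return sum(ASPECT_WEIGHTS.get(a, 0.0) * s for a, s in scores.items())
```

The point of weighting at the aspect level, rather than using the review star rating, is that a single review can praise the battery and pan the speaker; sentence polarity keeps those signals separate.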

All these factors are used to calculate a final Kenyt Score for each and every product. Our top 10 is the list of products sorted by this score. Yes, there are other factors involved in decision making, but even with these our rankings are fairly good. We will continue adding more signals to make our ranker even better. Further, we are looking at making these recommendations personalized.
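To make the combination step concrete, here is a minimal sketch of turning per-factor scores into one ranking score. The weights and factor names are made up for illustration; the actual Kenyt formula and weights are internal and not described here.

```python
# Hypothetical weighted combination of normalized (0-1) factor scores.
# Neither the weights nor the factor names reflect Kenyt's real values.

FACTOR_WEIGHTS = {
    "spec_score": 0.25,
    "value_for_money": 0.20,
    "ratings": 0.15,
    "recent_popularity": 0.15,
    "review_sentiment": 0.15,
    "brand_score": 0.10,
}

def overall_score(factors):
    """Weighted sum of a product's factor scores; missing factors count as 0."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

def top_n(products, n=10):
    """Return the n products (dicts of factor scores) with the best score."""
    return sorted(products, key=overall_score, reverse=True)[:n]
```

A filtered list like “Top 10 Laptops with 8+ GB RAM” is then just this same sort applied after filtering the candidate set on the requested specs.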

Product rankings can only be as good as the quality of the data, and often the data is incomplete, incorrect and inconsistent. The challenge is to be able to rank despite all the imperfections in the data. I will talk about our data collection/extraction challenges in the next post and detail the problems we solved to reach the point where we can do such deep analysis of hundreds of thousands of products from many sites across 52 categories with just 4 people.

You can download the Kenyt Android app from here. Our iOS app is under development and will be released next month.

-Kanwal
P.S. Interested in working on simplifying decision making with us? Send me a message.

Podcast on Running PHP on Windows

Peter Laudati & Dmitry Lyalin host the edu-training Connected Show developer podcast on cloud computing and interoperability. I met Peter in Chicago during Tek-X and we along with Don and Wade recorded a podcast on evolution of PHP on the Windows Platform. We talked about the improvements made to Windows in order to support PHP, including: Fast CGI, WinCache, PHP 5.3, the Web PI, and the SQL Server driver for PHP. Check out episode #31, “PHP On Windows” on Connected Show. Interview starts at 24:31.

CLICK HERE TO LISTEN!

If you like what you hear, check out previous episodes of the Connected Show at www.connectedshow.com.  You can subscribe on iTunes or Zune.  Peter and Dmitry publish new episodes approximately every two weeks!

Thanks,
Kanwal

Impact of name resolution on mysql_connect performance

I spend a lot of time profiling popular PHP applications to find where exactly processing time is spent during PHP execution. A few months ago, one thing which caught my attention was the performance of the mysql_connect API. On my Windows test bench, xdebug output showed that one call to mysql_connect was taking 0.31 seconds, which is huge. While playing with the API, I noticed that when the IP address of the MySQL machine is used instead of the hostname, performance of mysql_connect is much better. Below is a simple script I ran to test this.
 
<?php
    $hostname = "hostname";
    $ipaddress = "1.1.1.1";

    // Time 10 connections made using the hostname
    $starttime = microtime(true);
    for($cnt = 0; $cnt < 10; $cnt++) {
        $connection = mysql_connect($hostname, 'username', 'password');
        mysql_close($connection);
    }
    $endtime = microtime(true);
    echo ($endtime - $starttime) . "\n";

    // Time 10 connections made using the IP address
    $starttime = microtime(true);
    for($cnt = 0; $cnt < 10; $cnt++) {
        $connection = mysql_connect($ipaddress, 'username', 'password');
        mysql_close($connection);
    }
    $endtime = microtime(true);
    echo ($endtime - $starttime);
?>
Below is the output I got when I ran the above script on my test bench.

3.14003 seconds (time taken for 10 mysql_connect calls when hostname was used)
0.01396 seconds (time taken for 10 mysql_connect calls when IP address was used)

As you can see in the results, hostname resolution slows down mysql_connect significantly, and the performance hit seems to happen on each mysql_connect call. This slowdown only happens when the connection is made to a remote machine; when MySQL and the web server both ran on the same machine, there was no performance difference between the hostname and IP address cases. So if you are running MySQL on a remote machine, use the IP address in the mysql_connect call. Alternatively, you can add an entry to the %windir%\system32\drivers\etc\hosts file to tell your system the IP address of the MySQL machine. The system will find this entry in the hosts file and pick up the IP address of the MySQL box directly, without going through the more expensive name resolution. Change the hosts file only if you have a static IP address. Name resolution performance will depend on your DNS/WINS configuration and on your network topology, so you should run the script above to find out the performance impact of hostname resolution before making any changes.
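For example, a hosts file entry mapping the MySQL machine's name to its address looks like the line below (the address and name here are placeholders; substitute your own):

```
# %windir%\system32\drivers\etc\hosts
1.1.1.1    hostname
```

With this entry in place, the existing mysql_connect($hostname, ...) call keeps working unchanged but skips the DNS/WINS lookup.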

Hope this helps.
Kanwal

Using IIS configuration tools to manage HWC configuration

I have seen many people build innovative solutions on top of the hostable web core (HWC) functionality which was added in IIS7. One of the questions HWC users frequently ask is how to make the IIS configuration tools (like appcmd, the UI etc.) work against the configuration file their HWC instance is using. So far they have been either making manual modifications to the HWC configuration file, or making changes to the IIS configuration and then porting them over to the HWC configuration. There is an easier way to do this, but the caveat is that it requires your HWC configuration file to be named applicationHost.config. If that is the case, you can use the IIS shared configuration feature to set up a redirection to your HWC configuration file. Below are the steps to get this working.
 
1. Open %windir%\system32\inetsrv\config\redirection.config.
2. Change <configurationRedirection /> to <configurationRedirection enabled="true" path="c:\HWCInstance\" />
3. Save and exit.
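After the edit, redirection.config should contain something along these lines (the path is whatever directory holds your HWC applicationHost.config; the surrounding file structure shown here is the usual one, but check your own file):

```
<configuration>
    <configurationRedirection enabled="true" path="c:\HWCInstance\" />
</configuration>
```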
 
With this change, the configuration system will map MACHINE/CONFIG/APPHOST to c:\HWCInstance\applicationHost.config and treat it as the main configuration file containing the applicationPools and sites information. Now appcmd, the UI etc. will work against the HWC configuration file. Please note that setting up this redirection will affect your main instance of IIS as well, so if you are using IIS and HWC in parallel, this won’t be a safe thing to do. But if you are not using the main IIS instance and are only running your own HWC instance, this can be used to make the IIS configuration tools work against your HWC configuration.
 
Hope this helps.
Kanwal

Goodness of dynamic maxInstances in FastCGI

One of the major pain points in FastCGI 1.0 was that users were required to set maxInstances to the number which works best for their application. Many users didn’t tune this value and ran FastCGI with the default maxInstances, which didn’t give them optimal performance. For users who didn’t want to go through the pain of finding their ideal value, we recommended setting maxInstances to (10 * number of processors), which worked much better than the default but still wasn’t optimal. Even when users fine-tuned maxInstances to the ideal value, they didn’t always get the best possible performance at all times, due to variations in traffic, server/application configuration changes and code changes.
 
In IIS 7.5 we fixed this pain point by making the FastCGI module capable of monitoring system state and adjusting this number dynamically to give the best possible performance. This feature is referred to as dynamic maxInstances or auto maxInstances and can be enabled by setting the maxInstances value to 0. As we intend to maintain a consistent platform for PHP developers on WinXP and above, we have added this feature to FastCGI ISAPI 1.5 and made it available on IIS 7.0 as well (read more here). Last week I spent some time comparing dynamic maxInstances results with the suggested maxInstances value (10 * processor count), and below is what I got.
 
Hardware: Quad core machine with 4 GB RAM
Application: Joomla
 

 

 
Metric                                MaxInstances = 40   MaxInstances = 0   %age difference
Path Length (less is good)            154177              142294             7.66%
Response Time (less is good)          844 ms              783 ms             7.22%
Context switches/sec (less is good)   171.76              79.69              -53.60%
Requests/Sec (more is good)           59.06               63.76              7.95%
 
As you can see, all performance parameters improved considerably just by changing maxInstances to 0. Our testing showed similar improvements in the performance of other PHP applications as well, so we changed the default value of maxInstances to 0 in FastCGI ISAPI 1.5 and Win7. Note that if a value is explicitly set in fcgiext.ini or applicationHost.config, that value will override the default. Due to restrictions of the QFE release process, we couldn’t change the default of maxInstances in IIS 7.0, so there you are required to enable dynamic maxInstances yourself after installing this update.
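For reference, on IIS 7.x this setting lives on the application entry in the fastCgi section of applicationHost.config. A sketch of a PHP entry with dynamic maxInstances enabled follows; the fullPath shown is just a common install location and will differ on your machine:

```
<fastCgi>
    <application fullPath="C:\php\php-cgi.exe" maxInstances="0" />
</fastCgi>
```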
 
We typically run many applications on our performance bench machines and had to choose one maxInstances number for all of them. Choosing a different maxInstances number for each application would have been a better way to do performance testing, but that was painful. Now with dynamic maxInstances, we get optimal performance for each application without any pain. Our performance team is loving this feature, and I hope you will love it too.
 
Thanks.
Kanwal