Why and how I created inlinehockeydb.com
September 09, 2019 by Tim van Heugten Website
As a young kid growing up in The Netherlands I was always attracted to 'American' sports, mainly hockey, because of the stats. Europe is all about soccer and soccer stats (especially back then) are all about goals scored. It's slowly changing, but no one back then was talking about assists or goals per game average in soccer. Stats got me interested in hockey and hockey got me playing inline hockey. Now both have come together in inlinehockeydb.com - my recent 'hobby' project. Here is a little background information on why and how I created it. Most of it is nerd stuff, but it is my site so I can publish whatever I want ;)
As to why I started inlinehockeydb.com there are two reasons. The first reason is that every sport that wants to be regarded as a serious sport needs a proper online searchable database with stats. Now, for inline hockey that is not easy because the sport is so fragmented, but I thought I should try. The second reason is more egocentric. I've always loved developing websites and web applications, but since I moved into a management role I barely write any code anymore. I also noticed I never learned how to properly write code, use git, do deploys, etc. so this project is also about learning so I can better understand what my team at work does. For now, this project is a non-profit / hobby project. I do not sell ad space, I do not need donations, and for now there won't be any premium account features like some other stat sites have. What I do need is help. Some data will be incorrect, some data will be missing, etc. Once the account feature is launched (yes, it's coming) people will be able to help make corrections. That will be the ultimate test: is the community willing to help grow the database?
The second question in the title is "How did I create inlinehockeydb.com?". You can skip this section if you don't care about this nerd stuff. I'll conclude this opening blog post with some features coming up and what stats I will harvest next.
The first step was to harvest, or to 'scrape', the stats. That in itself is a challenge for inline hockey. Leagues and tournaments often change stats / league administration systems so historically there is not much available. My goal for a first version was to gather State Wars, NARCh and IIHF 'pro' stats. Obviously I was looking for stats that would 'overlap' in players. Only then is it cool to actually look up a player and see combined stats of multiple tournaments. I saved the stats on a 'tournament' or 'league' level. So Player A played X games that season and had Y ginos (goals) and Z apples (assists). All of these stats I would save in separate .csv files. IIHF stats are published in PDF, so I used PDF table extraction software to convert them into .csv files.
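To make that row layout concrete, here is a minimal sketch of what one such file could look like and how it could be read. The column names and the sample rows are made up for illustration; the real files may well be laid out differently.

```python
import csv
import io

# Hypothetical layout: one row per player per tournament/season.
# Column names and values are invented for this example.
SAMPLE_CSV = """player,tournament,season,gp,goals,assists
Player A,Example Cup,2018,6,4,5
Player B,Example Cup,2018,6,1,2
"""


def read_stat_rows(text):
    """Parse CSV text into dicts with numeric stats and a points total."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        goals = int(row["goals"])
        assists = int(row["assists"])
        rows.append({
            "player": row["player"],
            "tournament": row["tournament"],
            "season": row["season"],
            "gp": int(row["gp"]),
            "goals": goals,
            "assists": assists,
            "points": goals + assists,  # points = goals + assists
        })
    return rows


rows = read_stat_rows(SAMPLE_CSV)
```

Keeping one row per player per tournament means totals across tournaments can be added up later, once it is clear which rows belong to the same person.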
Once ready I wanted to combine these stats. The thing different tournaments share is player names. But, for example, IIHF uses 'LASTNAME Firstname' formatting and State Wars uses 'Firstname Lastname' formatting. A second issue is that players might have the same name, but be different people. For that you need birth years (or even birth dates), which IIHF publishes, but State Wars and NARCh don't. Finally there are spelling mistakes, abbreviations and just different naming. C.J. Yoder, Charles Yoder, C. Yoder. I've seen them all.
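As an illustration of the formatting problem, here is a small sketch of how 'LASTNAME Firstname' could be flipped into 'Firstname Lastname', plus a simple comparison key that ignores case and periods. Both function names are hypothetical, and the flip should only be applied to sources known to use the upper-case last name convention, since it would also flip something like 'CJ Yoder'.

```python
def from_lastname_first(raw):
    """Turn 'LASTNAME Firstname' into 'Firstname Lastname'.

    Leading all-caps tokens are treated as the last name. At least one
    token is always left over to serve as the first name.
    """
    parts = raw.split()
    last = []
    while len(parts) > 1 and parts[0].isupper():
        # title() is lossy for names like 'McDonald', so a human check
        # afterwards is still needed.
        last.append(parts.pop(0).title())
    return " ".join(parts + last)


def match_key(name):
    """Lowercase, drop periods and collapse whitespace for comparison."""
    return " ".join(name.replace(".", "").lower().split())
```

With this, `from_lastname_first("YODER Charles")` gives `"Charles Yoder"`, and `match_key` maps both `"C.J. Yoder"` and `"CJ Yoder"` to the same key. It does nothing for Charles vs C.J., which is where fuzzy matching and human checking come in.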
To perform this step I used software called OpenRefine. It was originally created by Google employees as a tool for researchers. It does all kinds of cool stuff, but I mainly used it for string matching (clustering). I applied different string matching algorithms (Levenshtein, n-gram, phonetic matching), which all do one thing: give a score or chance that two names actually refer to the same person. Then the final step: just human checking. Gerry and Gerald Osterkamp are the same person, we all know that. Steve Oleksy and Steven Olesky are the same person. For us humans this is so easy - for machines it's not.
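To give an idea of what such a score looks like, here is a minimal sketch of two of these measures: Levenshtein edit distance scaled to a 0..1 score, and a character bigram overlap. This is not how OpenRefine implements its clustering; it is just a plain Python illustration of the idea, and the function names are my own.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Scale the edit distance to 0..1, where 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def bigrams(text):
    """Set of all two-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Share of bigrams the two names have in common (Jaccard overlap)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

A pair like 'Steve Oleksy' and 'Steven Olesky' scores high on the edit measure, so it would be flagged as a likely match. A pair like Gerry and Gerald is harder for these measures, which is exactly why the human check stays in the loop.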
When done, I had a list of unique players and their stats. Now it was time to create an actual application.
This application was created using Laravel, which is a PHP framework for creating web applications. It basically gives you a set of ready-made features that are common for web apps, so you don't have to reinvent the wheel. Think easy database operations, user account functionality, routing (which URL should do what), etc. Now, you still have to code the actual features, but Laravel lets you skip the boring stuff.
I developed the application in roughly two weeks and then needed to add search. Proper search functionality is really difficult. We are all used to using Google, but if you ever used search on someone's hobby blog or even some fairly large e-commerce website you know good search is not easy. The person who searches might also not know exactly how a name is spelled. That person might use a first name, a last name or both. That's why I chose to use Algolia - again to avoid having to do the heavy lifting. Algolia uses all kinds of smart features that enable you to find who you are looking for. So now you can search for 'Charles Yoder', 'CJ Yoder' or 'Joder' (in case you heard the legend's name, but don't know how it's spelled). They will all give you results.
The final part I developed was a reconciliation API. Basically a backdoor to the application I could use in OpenRefine so when loading new stats, the system can tell whether a player already exists (and the stats should be added) or not (so the player should be created and then have its stats added).
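The decision behind that can be sketched in a few lines. This is only an illustration of the 'exists or create' logic, not the actual API: OpenRefine talks to reconciliation services over HTTP with JSON, and that part is left out here. The function names, the in-memory dicts and the 0.85 threshold are all assumptions, and the score comes from Python's standard difflib rather than whatever the real site uses.

```python
from difflib import SequenceMatcher


def name_score(a, b):
    """Similarity between two names as a 0..1 ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reconcile(name, players, threshold=0.85):
    """Return (player_id, score) for the best match at or above the
    threshold, or (None, best_score) if nobody is close enough."""
    best_id, best_score = None, 0.0
    for player_id, known_name in players.items():
        score = name_score(name, known_name)
        if score > best_score:
            best_id, best_score = player_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def add_stats(name, stat_line, players, stats, threshold=0.85):
    """Attach a stat line to an existing player, or create the player first."""
    player_id, _ = reconcile(name, players, threshold)
    if player_id is None:
        player_id = max(players, default=0) + 1  # hypothetical id scheme
        players[player_id] = name
    stats.setdefault(player_id, []).append(stat_line)
    return player_id
```

In practice a match on name alone is not enough, since two different players can share a name. A birth year, where available, would be a sensible extra check before stats are merged.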
Shortly after I launched the site I wanted to link player profiles to their profiles on eliteprospects.com so people can easily check their ice hockey stats. EP.com has a ton of player profiles so often a name brings up a lot of results. I manually mapped dozens of profiles and will continue doing some in the near future. The players that are currently in the 'PPG' overview (players with 30 or more games played) are all linked now.
Obviously a lot more stats need to be added. On the homepage there is a discussion about which stats should be added. Some are not up for debate, but simply not published online. Old NARCh stats, even older IIHF stats... I wish they were available, but they are not.
Then there is this blog. RDN (Roller Dad News) will help create content for it and I will write some pieces every now and then. Its main focus should be the tournaments / leagues and their players (signings, trades, etc.). If people like it I will expand it, else it might be removed again.
Then teams. Right now they are not linked, but that's something that's definitely on the list. Sometimes teams can't be matched by name, but we all know it's the same team (e.g. Alkali RPD vs Alkali Revel or Revision Vanquish and Revision Recoil). We have to come up with a set of rules to define when a team is the same team and then group them. After that we can assign medals (gold, silver and bronze) for each tournament/league to a team. We know what players were on the team so we can show medals per player but also show medals per team and maybe create an RDN style leaderboard for it.
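One way this could work, as a rough sketch: a hand-maintained alias table that maps name variants to one team, and a medal count per team and per player built from it. The grouped team names 'Alkali' and 'Revision', the tournament names, the rosters and the results below are all invented for the example, and the real rules for when two teams count as the same team still have to be defined.

```python
from collections import Counter

# Hypothetical alias table: name variant -> grouped team name.
TEAM_ALIASES = {
    "Alkali RPD": "Alkali",
    "Alkali Revel": "Alkali",
    "Revision Vanquish": "Revision",
    "Revision Recoil": "Revision",
}


def canonical_team(name):
    """Return the grouped team name, or the name itself if it has no alias."""
    return TEAM_ALIASES.get(name, name)


def medal_tables(results, rosters):
    """Count medals per grouped team and per player.

    results: list of (tournament, team_name, medal)
    rosters: dict of (tournament, team_name) -> list of player names
    """
    per_team = Counter()
    per_player = Counter()
    for tournament, team, medal in results:
        per_team[(canonical_team(team), medal)] += 1
        for player in rosters.get((tournament, team), []):
            per_player[(player, medal)] += 1
    return per_team, per_player


# Made-up data, only to show the shape of the input.
results = [
    ("Example Cup 2017", "Alkali RPD", "gold"),
    ("Example Cup 2018", "Alkali Revel", "gold"),
    ("Example Cup 2018", "Revision Recoil", "silver"),
]
rosters = {
    ("Example Cup 2017", "Alkali RPD"): ["Player A"],
    ("Example Cup 2018", "Alkali Revel"): ["Player A", "Player B"],
}
per_team, per_player = medal_tables(results, rosters)
```

Rosters are looked up by the name the team used in that tournament, so a player only gets a medal for the events he was actually on the roster for, even when the team counts are grouped.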
Finally about the main future feature: user profiles. People will be able to register an account. Either by 'claiming' a player profile, or by registering as a new player. Then a set of features will be offered: to submit corrections, change your own bio data or add stats. Still working on the details, but you get the picture.
Guess that's enough for a first post. If you have any stats, ideas, suggestions, questions or whatever you think is valuable to me, please send it to firstname.lastname@example.org.
If you made it all the way to this point: thanks for reading and hope you enjoy browsing the site.