
KBO English Data Project

Discussion in the KBO News forum
I've gotten the impression from a number of recent posts that there's a growing need for Korean data in English, but that there is no organized effort to collect it and/or make it available. I'd really like to change that.

I can't read Korean, so I'm of very little help in regard to translation. However, I will make what resources I have available to anyone who would like to participate in a project with the following goal:
To produce a multi-lingual database of current and past KBO player statistics.
I've taken the first step to this goal by producing two spreadsheets:
  1. 2006 KBO Batting
  2. 2006 KBO Pitching
Both files are in Open Document Format. You can use NeoOffice on Macs, OpenOffice on *NIX and Windows, KSpread on *NIX, or several other ODF-supporting applications in many environments. (All of the above are free, Open Source applications. There are also commercial alternatives and plugins for MS Office.)

If you're looking for English data right now, I'm afraid that the above two spreadsheets won't help you much at the moment. In fact, if you just want to consume the data, please don't reply to this post. That's not what this project is about.

If you have some time and can read/transliterate Korean, I ask that you consider helping out the world-wide KBO fan community and join this project. To do so, just reply to this message stating what you can do to help and (if you're not registered with a valid e-mail address) your e-mail address. I'll delete e-mail addresses before approving any posts (unless otherwise stated).

As for the project structure, if it's just a couple of people, we can CC everything in e-mail and I'll post the progress to this site. If there are more volunteers, I'll create a mailing list specific for the group.

If I get no volunteers, well, I tried.
Comments
Re: KBO English Data Project
[ Author: NelsonChung | Posted: Feb 17, 2007 7:38 AM | NIP Fan ]

I can help translate some of the stats. As for transliterating names, I'm not familiar with Korean romanization systems. The problem discussed earlier is that the KBO data is incomplete, with no CS, SF, or SH stats. SF can be calculated, and SH deduced, but CS? Nowhere to be found.
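One way the missing SF and SH columns might be recovered, as a rough sketch only. It assumes the source publishes a rounded on-base percentage computed with the standard formula OBP = (H + BB + HBP) / (AB + BB + HBP + SF), and that plate appearances (PA) are listed. The function names and the sample numbers are made up for illustration.

```python
def estimate_sf(h, bb, hbp, ab, obp):
    """Estimate sacrifice flies by solving the OBP formula for SF.

    OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    =>  SF = (H + BB + HBP) / OBP - (AB + BB + HBP)
    """
    if obp <= 0:
        raise ValueError("OBP must be positive to solve for SF")
    sf = (h + bb + hbp) / obp - (ab + bb + hbp)
    # Published OBP is rounded, so round to the nearest whole number
    # and never report a negative count.
    return max(0, round(sf))


def estimate_sh(pa, ab, bb, hbp, sf):
    """Estimate sacrifice hits as the plate appearances left unexplained.

    Ignores rare events such as catcher's interference.
    """
    return max(0, pa - ab - bb - hbp - sf)


# Invented batting line, not real KBO data.
sf = estimate_sf(h=150, bb=50, hbp=5, ab=500, obp=0.366)
sh = estimate_sh(pa=568, ab=500, bb=50, hbp=5, sf=sf)
```

Because OBP is usually published to three decimal places, the estimate gets shaky for players with few plate appearances, so anything derived this way should be flagged as estimated rather than official.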
Re: KBO English Data Project
[ Author: Guest: Michael Eng | Posted: Feb 19, 2007 1:52 AM ]

Here are the career stats for all players listed as players at the KBO website (at Google Docs): Master, Batting, Pitching

The stats of retired players can be gotten from the site baseball2i.com. For instance, former BayStar and current Dragon Tyrone Woods has his KBO stats at this link and there are similar pages for other players as well. However, when I tried to download all files with a pcode of 70000 to 99999, I ended up crashing my computer. So, I was hoping that there is some sort of way that this data could be retrieved. Perhaps, a script that could go through that range (I believe all pcodes are above 70000) and download any of the pages that exist.
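A minimal sketch of the kind of script described above, written to fetch one page at a time with a pause between requests rather than opening the whole range at once. The URL pattern is a placeholder, since the real address format is not given here, and `crawl_pcodes`, `fetch_page`, and the `save` callback are hypothetical names. It also assumes a missing player comes back as a failed request or an empty body, which may not match how the site actually behaves.

```python
import time
import urllib.error
import urllib.request

# Hypothetical URL pattern: substitute the site's real player-page address.
URL_TEMPLATE = "http://example.invalid/retired?pcode={pcode}"


def fetch_page(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return None


def crawl_pcodes(start, stop, fetch=fetch_page, delay=1.0, save=None):
    """Request each pcode in turn and return the pcodes that have a page.

    Pages are fetched sequentially, so only one download is in flight
    at any moment, and `delay` seconds pass between requests.
    """
    found = []
    for pcode in range(start, stop + 1):
        body = fetch(URL_TEMPLATE.format(pcode=pcode))
        if body:  # skip failed requests and empty responses
            found.append(pcode)
            if save is not None:
                save(pcode, body)
        time.sleep(delay)
    return found
```

Calling `crawl_pcodes(70000, 99999)` would cover the range mentioned above. At one request per second that is a bit over eight hours, so it may be worth running in smaller chunks.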

Listed as well are Franchises, Award Winners, Standings, Half Standings, Managers (incomplete), Season Leaders, Career Leaders, and Korean Series results at the BR Bullpen.

Hope this helps.

Michael Eng
Re: KBO English Data Project
[ Author: westbaystars | Posted: Feb 19, 2007 10:08 AM | YBS Fan ]

Wow, Eng-san, you did it again! Are these your sheets? Or did you come across them? These will make a much better base than the one I started with. Time to reset the KBO project from scratch.

Typing in the PIDs from the Google Docs into the Baseball2i site (pcodes) doesn't seem to bring up a match. The numbers are in the same range, leading me to think that they're related. Is it current vs. retired players maybe?

Oh, and off the Master table is a "Translit" tab with a Korean-Romaji transliteration map. This is an incredibly useful resource just for that alone.

Baseball2i doesn't have a robots.txt file, so I'll crawl it and see what I can do in the way of scraping data. But not today.

Thanks again for the great resource. Sasuga Eng-san.
Re: KBO English Data Project
[ Author: Guest: Dan | Posted: Feb 19, 2007 2:32 PM ]

Thomas St. John of the Korea Times (no longer writing for them) was planning on having a KBO database. I don't think it ever got set up though.

When I have time, I will translate and try and update the top batting and pitching leaders at my forum. But it can be time consuming (especially now that I'm back in school).

For the 2007 season, I hope to keep up a little better. All I do is copy and paste the stats into an Excel spreadsheet and translate the names etc.

-dan

http://kbo.hyperboards.com
Re: KBO English Data Project
[ Author: westbaystars | Posted: Feb 19, 2007 3:07 PM | YBS Fan ]

- Thomas St. John of the Korea Times (no longer writing for them) was planning on having a KBO database. I don't think it ever got set up though.

I've discussed that with him on a number of occasions, most recently just before I proposed this project. I had even offered to help him with programming and resources here for his project a few years ago, but he wanted to hire a local person where he could have more control. As you said, though, things never got off the ground. (In fact, the KBO itself put up barriers to his effort, so he finally gave up.)

- When I have time, I will translate and try and update the top batting and pitching leaders at my forum. But it can be time consuming (especially now that I'm back in school).

Well, what can I do to help automate that for you? If you've got URLs and a transliteration table, I can automate the cutting and pasting part, and put everything into a relational database and/or spreadsheet. It's that name transliteration table (people's, teams', etc. names) that's the key.
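To give an idea of what automating the find/replace step could look like, here is a small sketch. It assumes the transliteration table is kept as two-column CSV text (Korean name, English name) and that the stats have already been split into rows. The function names, the column layout, and the numbers in the sample rows are all made up for illustration.

```python
import csv
import io


def load_name_table(csv_text):
    """Read 'korean,english' rows into a lookup dictionary."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def translate_rows(rows, name_table, name_column=0):
    """Swap the Korean name in each row for its English form.

    Names missing from the table are left untouched and collected in
    `missing`, so a human can add them to the table later.
    """
    translated, missing = [], []
    for row in rows:
        row = list(row)
        korean = row[name_column]
        if korean in name_table:
            row[name_column] = name_table[korean]
        else:
            missing.append(korean)
        translated.append(row)
    return translated, missing


# Sample table and rows; the second name is a placeholder, and the
# numbers are arbitrary rather than real stats.
table = load_name_table("류현진,Hyeon-jin Ryu\n")
rows, missing = translate_rows([["류현진", "1"], ["홍길동", "2"]], table)
```

Reporting the unmatched names instead of guessing at them keeps the table as the single place where spelling decisions get made.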

Finally, I was thinking of making the KBO (and CPBL) into separate, more primary forums here. However, I don't mean to encroach on your territory. Should I start sending people over to the KBO Hyperboards who have questions about Korean baseball? What can I do to help?
Re: KBO English Data Project
[ Author: Guest: Dan | Posted: Feb 19, 2007 3:19 PM ]

Well, since more people visit this site, I have no problem with you setting up a KBO forum. My site was originally made for me and a few friends while we were living in Korea. Now that we are back home, it's a bit harder to keep up with the KBO. So my site doesn't have quite the frequent updates that I wish it could have.

Well, when I translate the names and stuff, I just usually go one by one - that's what takes so much time. I'm currently trying some Excel program that will go through and find/replace.

But yeah, I'd be more than happy to help you guys out. I also contacted a few agencies in Korea about obtaining the KBO statistics, so just waiting to see what they say.
Re: KBO English Data Project
[ Author: Brimsek | Posted: Feb 19, 2007 10:56 PM | HNHF Fan ]

- Finally, I was thinking of making the KBO (and CPBL) into separate, more primary forums here. However, I don't mean to encroach on your territory.

I don't want to post for the sake of making a post - or to get this off topic. But I think separate forums for the KBO and CPBL would be great. I totally understand not wanting to encroach on anyone's territory, but if there are no objections then I hope this comes to fruition.

I lived in South Korea for 18 months (in the military) in 1985-86 and I remember the KBO was on TV sometimes. I used to watch the occasional game despite speaking only passable Korean. However, I don't have the language skills to add to this project unless you just need a gofer to track down some info.
Re: KBO English Data Project
[ Author: Guest: Michael Eng | Posted: Feb 22, 2007 1:01 AM ]

Westbay-san,

Yes, I created the tables from information from Naver.com and the KBO site.

I should have stated that the playerid/pcodes in the master table will not work with the baseball2i site's retired page, as these players are not retired. For instance, reigning Most Valuable Player/Rookie of the Year Hyeon-jin Ryu's pcode is 76715. Since he is not retired, placing that number as the pcode turns up nothing - his page is here.

So a pcode from 70000 to 99999 that is not listed in the master table may be that of a retired player, and all pcodes listed as playerids in the master table are those of active players and will turn up blank or error pages.
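Put as a short sketch, the rule above is a set difference: take the whole range and drop every pcode that appears in the master table. The function name is hypothetical, and what comes out is only a list of candidates, since a pcode missing from the master table may simply be unused.

```python
def retired_candidates(active_pcodes, low=70000, high=99999):
    """Return the pcodes in [low, high] that are absent from the master table."""
    active = set(active_pcodes)
    return [pcode for pcode in range(low, high + 1) if pcode not in active]


# 76715 is an active player's pcode, so it is excluded from the candidates.
nearby = retired_candidates([76715], low=76714, high=76716)
```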

I hope that makes sense.

Michael Eng
Re: KBO English Data Project
[ Author: Guest: Dan | Posted: Feb 28, 2007 5:38 AM ]

Well, just talked with the official KBO scoring agency, and to obtain the full statistics it would cost at least $20,000!

So, I guess I will have to continue translating names and using stats from the KBO site and the sports newspapers.
Re: KBO English Data Project
[ Author: NelsonChung | Posted: Mar 1, 2007 3:06 PM | NIP Fan ]

Such hogs. MLB data is free. Are the $20K stats any better than the ones available from the site and newspapers?
Re: KBO English Data Project
[ Author: Brimsek | Posted: Mar 2, 2007 11:49 PM | HNHF Fan ]

20K? What on earth for?

I'm just wondering if they gave a reason why it would cost so much money to obtain the statistics.
Re: KBO English Data Project
[ Author: Guest: Mischa | Posted: Apr 17, 2007 3:12 AM ]

I have a question about the identity of a pitcher. According to the Korean Wikipedia entry for Seung-ho Lee (b. 9/9/81), he pitched from 2000 to 2005 for SK. This would fit the information available for the Olympic pitcher Seung-ho Lee and the 2000 Rookie of the Year Seung-ho Lee. The pitching line matches player ID 99137 in the database posted by Mr. Eng.

But - the pitcher leader list linked to here indicates that a Seung-ho Lee led the 2003 KBO in strikeouts while with LG. This matches player ID 70820 in the database, who pitched only for LG from 1999-2006.

So - were there two Seung-ho Lees in the KBO in this period?
Re: KBO English Data Project
[ Author: Mischa | Posted: Apr 19, 2007 6:39 AM | TYS Fan ]

I got confirmation from a couple sources that two strikeout pitchers named Seung-ho Lee were contemporaries in the KBO for the six years.
Re: KBO English Data Project
[ Author: Guest: Dan | Posted: Apr 28, 2007 6:45 AM ]

westbaystars,

How's the project coming along?

Also, how does one go about making such spreadsheets as you and Mr. Eng have?

I have bought a new domain name for my forum, so it can now be found at www.MyKBO.net.

I'm not really good at Excel and the like, so how would one go about making such sheets as you have?
Re: KBO English Data Project
[ Author: westbaystars | Posted: Jun 4, 2007 10:53 PM | YBS Fan ]

The KBO Project has been a bit stagnant, partially because I've been busy with other things. With the recent influx of KBO questions, I'd like to start to get this project going. But I need a lot of help.

I think I've gotten all of the teams translated correctly for the KBO Standings table on the top page. But I wasn't sure what all of the percentage columns were. If somebody could please compare that table with the official KBO standings, I'd greatly appreciate it.

Once the standings have been translated, I'll move on to the batters and pitchers.
Re: KBO English Data Project
[ Author: Guest: Haitai | Posted: Jun 5, 2007 2:46 AM ]

KBO published "The Official Baseball Encyclopedia 2007" in April. It is an e-book. You can read it free.

It has year by year team standings, league leader boards, all of the players' career stats, etc.

Even though it is written in Korean, you don't have to pay $20K.
Re: KBO English Data Project
[ Author: Guest: baekgom84 | Posted: Nov 7, 2007 9:23 PM ]

Hi westbaystars,

I've been thinking about the KBO data project and would like to contribute. I can translate and romanize a lot of the Korean involved in the stats, etc. But since I'm actually relatively new to baseball, I'm not entirely sure what sort of stats people might want to access. I've already compiled a database of some of the leading pitching and batting stats for the 2007 season, and I'm working on updated player rosters for each team. But so far it's all a bit aimless, so it would be great if you could give me some direction.

Let me know your thoughts.
Re: KBO English Data Project
[ Author: Guest: Michael Eng | Posted: Aug 7, 2008 10:55 PM ]

[Old thread revival]

A site called inning.co.kr has/had player stats for what seemed to be every KBO player (example), presumably from the Encyclopedia mentioned above. I don't know why the stats might have been taken down, but it seems that would be a good place to start. It has a BBS, but I think you need a resident registration number, which I don't have. At least the data is out there, digitized.

Hope this helps.
Michael Eng

More info: The site operator might be someone named Ki Hotae who also runs a New York Mets fan site.
Re: KBO English Data Project
[ Author: mondejoe | Posted: Aug 10, 2008 6:59 AM | YOK Fan ]

The statistics have been moved to here.
Re: KBO English Data Project
[ Author: Beemer | Posted: Feb 1, 2009 3:40 PM ]

I realize that this is an old thread, but I just need to add this. Michael Eng had requested that I post this here.

After aggregating all the data available from this thread, it didn't seem to cover the full scope of the KBO. A few years back, I remember that most of the KBO stats were locked up by the league, which was charging large fees to get the stats.

Since then, I discovered (through links on these very threads) that some sites have put up much of the historical data. One site seemed to have everything from 1982 to present, so I was fortunate enough to capture all the data before the site's untimely demise.

I'm quite certain it isn't perfect. There are probably many errors, and I don't speak or read Korean. But I am tenacious. And now, I can pass on my work to the community at large, who can hopefully strengthen the database.

Here is the first version of the KBO database:
Access 2003 version
Excel 2003 version
Re: KBO English Data Project
[ Author: westbaystars | Posted: Feb 3, 2009 11:02 PM | YBS Fan ]

Beemer,

I'd like to thank you for the work you've put into collecting all of this data. I've merged your names with mine in my 2008 KBO data record set. However, I have a few questions about the data.

First of all, what was the original source for the data? I hadn't found fielding data anywhere (or if I had, I didn't know it). I gathered data from Naver and the official KBO site.

Secondly, how did you transliterate the names? I've been using Michael Eng's ROK Passport table as a base, but know that many players don't write their names that way. Furthermore, some of the ways that you transliterated names are different than what I found at the Samsung team web site (which don't follow the ROK Passport transliteration scheme either).

Third, how is Eng-san doing? I haven't heard from him in a while. Is your work an extension of what he had started a year or two back?

Anyway, thank you again for the contribution. It's been very helpful for me, and I'm sure others interested in the KBO as well.
Re: KBO English Data Project
[ Author: Guest | Posted: Feb 4, 2009 2:08 AM ]

I found the data at http://inning.cafe24.com/bbs/inn_season.php [no longer working], which seemed to just have everything. There was a way I could get the site to just give me huge data dumps with hundreds of records and thousands of data points at once. Alas, the site went down, but not before I captured every last bit of data they had, including that fielding data.

I used several sources to transliterate names, and mainly I used the transliteration to identify non-Korean players. (I took Japanese, mostly lost by now, in college, so I know of the concept of loanwords and can easily pick them out.) So my Korean names, I'm certain, could have very dubious transliterations.

One tool was Hong's Hangul Conversion Tools (http://sori.org/hangul/conv2kr.cgi).

I also used a Wikipedia article on Korean Family Names to build something similar to Eng's ROK Passport table. (I had it, but had forgotten about it.) Whenever I came across a Hangul character that wasn't in the Wiki page, I looked it up on Hong's Hangul page and added it to my transliteration list.
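For anyone building a similar list, much of the per-character lookup can be done with arithmetic, because precomposed Hangul syllables sit in one Unicode block (starting at U+AC00) ordered by initial consonant, vowel, and final consonant. The sketch below is not the method used for this database. The letter tables are a simplified take on Revised Romanization that treats every syllable in isolation, so it ignores sound changes between syllables and any spelling a player personally prefers.

```python
# Romanized letters in Unicode order: 19 initials, 21 vowels, 28 finals.
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

HANGUL_BASE = 0xAC00      # first precomposed syllable
HANGUL_COUNT = 19 * 21 * 28


def romanize_syllable(ch):
    """Romanize one Hangul syllable; other characters pass through unchanged."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < HANGUL_COUNT:
        return ch
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead] + VOWELS[vowel] + TAILS[tail]


def romanize(text):
    """Romanize a string syllable by syllable, joined with hyphens."""
    return "-".join(romanize_syllable(ch) for ch in text)


print(romanize("류현진"))  # ryu-hyeon-jin
```

A result like this is best treated as a first draft for a lookup table, with hand-entered overrides for players and teams who spell their names differently.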

So take any Korean transliterated names with a huge grain of salt. I know as much Korean as I know Mandarin, which is effectively none. Although I do remember from my time studying Japanese that converting from Hangul/Kanji/Hiragana/Katakana to a romanization is more an art than a science. People have their preferences based on which romanization they like better. So you'll see OO or OU or O depending on who is translating, and as an English speaker, you just deal with it. (Feel free to correct me if I'm off base here.)

I only suspect that the masked man is Michael Eng. Someone on the MyKBO board came to my rescue with a lot of foreign players. It's his only post, and his user name is meng. And he asked me to cross-post here when I was ready to post my DB.

I believe my work incorporates a little of your work, a little of his and a little of mine (primarily all on the name data). The numeric data itself came wholly from inning.cafe24.com.

I've been a fan of non-US leagues for more than a decade now (mainly a fan of available data, and knowing which US players are where), and been frequenting this site for over ten years as well, so I'm glad to contribute.

Obviously with the rise of the Internet, current stats are easier to come by than the 1980s stats for the KBO, so when I saw the opportunity to get a good feed of that old data for the community at large, I made sure I nabbed it (and fortunate since the site is now down). So hopefully, that data proves to be sound, and it can be a good foundation to build on top of for all the future years of the KBO.

I'm a huge open-source guy, so it always feels good to give back to a community that you love, which in this case is baseball in general, and international (i.e. non-US) baseball more specifically. And it's my hope that anyone who wants to can help build on it.

Hopefully with more eyes on the database, especially with people more familiar with Korean, we can all make it a strong DB, like Lahman's MLB database.
Re: KBO English Data Project
[ Author: Beemer | Posted: Jan 8, 2010 6:42 AM ]

Most of the sites that I used last year for KBO stats are down, seemingly for good.

The official KBO site has some data: Hitting and Pitching. It's missing Fielding (with the exception of errors) and it's also missing Games Started for Pitchers.

Is there someone who can read Korean that might be able to track down a new repository of data for 2009?
Re: KBO English Data Project
[ Author: mondejoe | Posted: Jan 9, 2010 5:55 AM | YOK Fan ]

You can use this website:
http://www.statiz.co.kr/player

They have fielding statistics but for batters I don't see any 2B or 3B stats.
Re: KBO English Data Project
[ Author: westbaystars | Posted: Feb 9, 2010 10:49 AM | YBS Fan ]

This project is still alive and kicking. Bryan Walko has compiled a set of 2009 KBO data here: The two ZIP files are Microsoft-only formats, so use them at your own risk.

The first file is a collection of 6 spreadsheets in a single OpenDocument Spreadsheet file:
  • master
  • franchises
  • teams
  • pitching
  • batting
  • fielding
Columns have been nicely formatted in the ODS version, and can be viewed/edited with NeoOffice on Macs, OpenOffice on *NIX, Intel Macs, and Windows, KSpread on *NIX, or several other ODF-supporting applications in many environments. (All of the above are free, Open Source applications. There are also commercial alternatives and plugins for MS Office if you're so inclined.) As the data in OpenDocument is all XML based, it can also be accessed using native XML databases such as eXist. (Please feel free to drop me a line if you're interested in consulting along those lines.)
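As a small illustration of the XML point: an ODS file is a ZIP archive whose spreadsheet content lives in a member named content.xml, so the sheet names can be listed with nothing but a standard library. This is only a sketch. `sheet_names` is a hypothetical helper, and the file name in the usage note is a placeholder, not the name of the actual download.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by OpenDocument for table (sheet) elements.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def sheet_names(ods_file):
    """Return the sheet names in an ODS file (a path or a file-like object)."""
    with zipfile.ZipFile(ods_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # Each sheet is a <table:table> element with a table:name attribute.
    return [
        table.get(f"{{{TABLE_NS}}}name")
        for table in root.iter(f"{{{TABLE_NS}}}table")
    ]
```

Calling `sheet_names("kbo.ods")` on the file described above should list the six sheets (master, franchises, teams, pitching, batting, fielding), assuming they are stored as ordinary top-level sheets.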
Re: KBO English Data Project
[ Author: Guest: Michael Street, Asian-American Sports Examiner | Posted: Feb 10, 2010 4:52 PM ]

Thank you, thank you for all your hard work on this valuable resource. I write on Asian-American baseball players for several websites, and can't find KBO data anywhere! Last season, I used Google Translate on the KBO site to pull recent data (to help me write articles, but not to create a database).

This is truly an amazing resource. I've posted a question in a new post about a player I can't find, and may have other questions as I dig deeper into this.

Thanks again!!
Re: KBO English Data Project Walk Conversion
[ Author: MichaelStreet | Posted: Feb 11, 2010 7:42 AM ]

Comparing this data to the info found on the KBO site, I notice a discrepancy.

The statistic "4사구" (translated as "449" by the sometimes-helpful Google translator) seems to refer to walks, but the numbers in the database are consistently lower than the ones on the KBO site.

Does the KBO calculate walks differently, or does "4사구" include things like hit batsmen or IBBs? In other words, did you calculate the walks differently in the database?

Thanks!

Mike
Re: KBO English Data Project Walk Conversion
[ Author: Beemer | Posted: Feb 13, 2010 2:34 AM ]

All my walk data for 2009 came from http://www.statiz.co.kr/player. I generally used the KBO data for doubles and triples and a few other things that statiz left out. (The main pre-2008 data came from the now defunct "innings" site.)

My translation of 사구 is HBP. It's possible that 4사구 is something that combines HBP+BB, because I have no idea why "4" would be concatenated with HBP. I come with absolutely no knowledge of the Korean language though, so any help is appreciated.
Re: KBO English Data Project Walk Conversion
[ Author: Beemer | Posted: Feb 13, 2010 2:53 AM ]

Yep, I just checked a few pitchers. If you look at their career data, you can see that 4사구 equals BB + HBP.

Which is nice, because there was no HBP data for me this season for pitchers. This may allow me to get that data, since I have 4사구 and I have BB.
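The arithmetic is just HBP = 4사구 - BB. A tiny sketch, with a made-up function name and made-up numbers, that also flags rows where the two sources disagree:

```python
def hbp_from_combined(combined, bb):
    """Derive HBP from the combined 4사구 column (BB + HBP) and a BB count."""
    hbp = combined - bb
    if hbp < 0:
        # BB and 4사구 come from different sites, so a mismatch is possible.
        raise ValueError("4사구 is smaller than BB; the two sources disagree")
    return hbp


# Invented numbers: 60 combined walks and hit batsmen, 52 of them walks.
hbp = hbp_from_combined(60, 52)
```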
Re: KBO English Data Project Walk Conversion
[ Author: Beemer | Posted: Feb 13, 2010 3:07 AM ]

On the downside, the first person I used to check was Yang Hyeon-Jong. I found out that I was missing his 2008 season data. So bear in mind that this is still a big work in progress.
Re: KBO English Data Project Walk Conversion
[ Author: westbaystars | Posted: Feb 16, 2010 9:32 AM | YBS Fan ]

In Japanese, walks and HBP are often combined (as 四死球, literally "4 and dead balls", since 四球 "4 balls" is a walk and 死球 "dead ball" is a hit by pitch). As Korean is in the same language family as Japanese (the same way Spanish, French, and Italian are all Latin languages) and Korea has much of their baseball heritage from the Japanese occupation in the early 20th century, it wouldn't surprise me if 4사구 literally meant something similar.
Re: KBO English Data Project
[ Author: KJOKBASEBALL | Posted: Jan 19, 2011 6:38 AM | HT Fan ]

Any chance that someone was able to get 2010 KBO stats?

THANKS,
KJOK
About

This is a site about Pro Yakyu (Japanese Baseball), not about who the next player to go over to MLB is. It's a community of Pro Yakyu fans who have come together to share their knowledge and opinions with the world. It's a place to follow teams and individuals playing baseball in Japan (and Asia), and to learn about Japanese (and Asian) culture through baseball.

It is my sincere hope that once you learn a bit about what we're about here that you will join the community of contributors.

Michael Westbay
(aka westbaystars)
Founder

Copyright (c) 1995-2024 JapaneseBaseball.com.
This work is licensed under a Creative Commons License.
Some rights reserved.