Journal

AI, Journal

Importing a Wikipedia dump into MySQL

Reading Time: < 1 minute

So I was thinking about using Wikipedia data to make a knowledge base and practice some NLP techniques on it. The first step is to import the English portion of Wikipedia into a MySQL database so I can query it as needed.

The obvious first stop is the Wikipedia download page.

I first tried to download the ready-made SQL dumps, but those SQL scripts don’t actually include the text we see on Wikipedia. So I had to go to the XML files and follow the instructions provided by this source.

Basically, we need to use a tool called MWDumper, which converts the XML into SQL scripts. We can download the compiled Java here, with the instructions here.

The SQL provided by the blog is mostly correct, except that the page table has one more column. All we need to do is add the column like this:

ALTER TABLE page
ADD COLUMN page_counter INT AFTER page_restrictions;

Another change is that one of the columns in revision is too small, so we need to change the field’s type.

ALTER TABLE `revision`
CHANGE `rev_comment` `rev_comment` blob NOT NULL AFTER `rev_text_id`;

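There are also duplicate page_title values in page, so make sure the indexes on that column are not set to UNIQUE. The statement below re-creates both indexes as ordinary, non-unique ones: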

ALTER TABLE `page`
ADD INDEX `page_name_title` (`page_namespace`, `page_title`),
ADD INDEX `name_title` (`page_namespace`, `page_title`),
DROP INDEX `page_name_title`,
DROP INDEX `name_title`;

After that it should just be a waiting game until everything is done. My slow server took about 2 days. The final size of the database is about 126 GB. Happy NLPing!
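
Before committing to a two-day import, it can help to peek inside the XML dump first. The following is a minimal Python sketch, not part of MWDumper or of the instructions above: it streams a MediaWiki export file with the standard library and yields each page’s title and text. The function names and the tiny sample document are made up for illustration, and the element names (page, title, text) are assumed to match the export schema of the dump you downloaded.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for every <page> element in a MediaWiki XML export.

    `source` is a file name or a binary file object. The file is parsed
    incrementally, so the whole dump is never held as one string.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, None
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the children of a page we are done with


# Hypothetical two-page sample standing in for the real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Alpha</title><ns>0</ns><id>1</id>
    <revision><id>10</id><text>First article.</text></revision></page>
  <page><title>Beta</title><ns>0</ns><id>2</id>
    <revision><id>20</id><text>Second article.</text></revision></page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(SAMPLE)):
    print(title, len(text))
```

For the real file, pass the path of the decompressed XML dump to iter_pages instead of the sample.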

Journal

Install CUDA driver on a new Ubuntu system

Reading Time: < 1 minute

After recycling the old PC to install Ubuntu, I wanted to install the CUDA drivers, and of course ran into the same old errors. Here are some notes to make sure I don’t run into the same errors again.

  1. Download and install the CUDA driver. The installer needs make and gcc: sudo apt install gcc make
  2. There were still some errors from the CUDA install. As directed by the NVIDIA forum, https://forums.developer.nvidia.com/t/info-finished-with-code-256-error-install-of-driver-component-failed/107661, look at the log file at /var/log/nvidia-installer.log. The cause is the Nouveau kernel driver. Follow the NVIDIA instructions at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-nouveau to blacklist Nouveau.
  3. How come it still doesn’t work? Well, you need to reboot the system, of course.
  4. Now it installs correctly. Have fun!
  5. Make sure to use the entire hard drive if you have a fresh Ubuntu install. Here -> https://www.panzoto.com/extend-the-free-space-on-lvm/
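
After the reboot in step 3, it is worth confirming that Nouveau is really gone before rerunning the installer. Here is a small Python sketch for that check. It assumes the usual /proc/modules layout, where each line starts with the module name; the helper name module_loaded is hypothetical and not part of any NVIDIA tool.

```python
from pathlib import Path


def module_loaded(name, proc_modules_text):
    """Return True if a kernel module called `name` is listed in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        loaded = module_loaded("nouveau", path.read_text())
        print("nouveau is still loaded" if loaded else "nouveau is not loaded")
```

If it reports that nouveau is still loaded, the blacklist has not taken effect yet.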

Journal

Recycle an old PC to be a server

Reading Time: < 1 minute

After more than 10 years of using my old PC, I finally decided to get a new one. It’s black, and powerful. But enough of that. I decided to recycle my old PC as a Linux server. After installing Ubuntu 20.04, I decided to partition the old hard drive. Here is the guide I followed. It even shows you how to fuse multiple hard drives together so they behave like a single drive. Enjoy!


https://techguides.yt/guides/how-to-partition-format-and-auto-mount-disk-on-ubuntu-20-04/

Journal

Acadia National Park Trip Recap

Reading Time: 3 minutes

Recently I had a family trip to Acadia National Park. It was an interesting journey, especially in the midst of the Covid-19 pandemic.

We first made a stop at Portland, ME since many friends and co-workers have stopped there for the summer. It’s a small city with a Key West vibe. We stopped at the old port as people recommended, and took an afternoon walk along the shops near the bay. There were several lovely pottery shops, but we aren’t such big fans of pottery. It was an especially hot day, so it was interesting to see shops with blowers running. If you can’t tell by now, I have lived in Florida for a long time. Air conditioning is standard in Florida, so it’s strange in my mind to not have A/C in the summer. My son loves rocks and there was a gem shop with colorful rocks. We bought a few bottles of small gems, and he was quite happy. Some of the restaurants were busy, and we couldn’t have gotten a table until late at night, so we stopped at a smaller one just to get a bite. I would say the highlight of the afternoon was the ice cream shop with mocha and tiramisu flavors.

The official trip to Acadia began the next day. The only notable thing on the highway was that twice I had people drive off the road almost immediately in front of me. I had just never seen that in my many years of driving in Florida. People hit each other’s cars, but they don’t just drive off the road. Maybe Maine has different driving regulations, or many people in Maine have special training to avoid cars by driving off the highway.

Once we got to the hotel, the staff were extra nice. They recommended sites at the national park and offered to print a vehicle pass since I didn’t have one. It was tough to get the non-operational wifi to work, but the helpful staff made it bearable.

Acadia National Park was as expected for any national park in peak season. There were a lot of people. I rarely see that many people in the U.S. outside of large cities. Sand Beach was especially bad since there were miles of one-way road, and the only way to find parking was to loop back after driving 30 minutes. Otherwise it takes about as long to walk back after finding parking. Because the daytime entrance window is only 30 minutes from the booking time, I had to try really hard to get through the area. I don’t remember much since I just dropped off my family and never stayed at the beach.

Cadillac Summit Road was interesting to drive on. The only comparison I have is driving on Smoky Mountain roads in the Carolinas. Unless you are used to mountain roads, the speed limit is your friend. It’s not that the road is extra narrow, but the feeling of imminent cliff diving made it more challenging for my shaking legs. Anyway, the glacial rocks were extra fun to walk and hike on once we got to the peak of the mountain. You can walk almost anywhere on the peak. Around half of the rocks were covered by trees, so most of the peak is surprisingly accessible by walking. The large swarms of flies can be sort of annoying if you have kids. And although it’s rarely sunny near the peak, extra sunscreen is recommended. I got sunburned with a regular amount of sunblock.

One extra note is that most people around our hotel wore masks, but the visitors to the national park mostly chose not to wear one. My feeling was that this is because the local population was older than the visitors. But it could also be that the local culture is much more accepting of wearing masks.

Overall, it was a fun, relaxing trip. We probably had way too much lobster, even though we are not such fans of that much protein.

AI, Journal

Inspired by GitHub Copilot and What Makes a Good Programmer

Reading Time: < 1 minute

Recently GitHub started sending out invites for Copilot. It’s an AI-assisted code generator for several different languages. For Python, it generates efficient code according to the docstring the programmer wrote. For other languages, it infers from the function declaration. I tested it on LeetCode, and the time and space complexity of the results is quite good. Although it struggles with some of the hardest tasks, it fulfills what it promises to do.
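
To make the docstring-driven workflow concrete, here is the kind of prompt involved, using a classic LeetCode-style task. The signature and docstring are what the programmer would write. The body below was written by hand for illustration and is not actual Copilot output; it only shows the sort of linear-time answer one would hope to get back.

```python
def two_sum(nums, target):
    """Return indices (i, j) with i < j of two numbers in nums that add up to target.

    Return None if no such pair exists. Should run in O(n) time.
    """
    seen = {}  # value -> index of the first time we saw it
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return (i, j)
        seen.setdefault(value, j)
    return None


print(two_sum([2, 7, 11, 15], 9))
```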

Should you use it though? The model is trained on docstrings and publicly available code, so there are obvious licensing issues. Can you use someone’s code if they did not explicitly state it’s open source, even if it’s in a public repository? Cases have already been discovered where the output includes personal info from code comments or embedded HTML. That makes people think twice about using it if they might get sued later.

Another question is whether you should use it even if it’s legal. For now, it only generates a single function. I haven’t seen it write a complete class or generate scripts with a folder structure. When the program gets more complicated, higher-level CS concepts like cohesion, coupling, and the use of design patterns are more important than writing an efficient function. Therefore, I would call this a tool for beginners to learn programming rather than a tool for advanced programmers to deploy. I have been learning and debating about when to use object-oriented programming and when to use functional programming. I found the following resources to be helpful. For now, I’m still in the camp of learning better structure rather than blindly using Copilot to generate programs.

#ArjanCodes channel on Youtube: https://www.youtube.com/watch?v=ZsvftkbbrR0&list=PLC0nd42SBTaNuP4iB4L6SJlMaHE71FG6N&index=7

Python 3 Object-Oriented Programming (book): https://www.packtpub.com/product/python-3-object-oriented-programming-third-edition/9781789615852

Journal

Solving problems in practice

Reading Time: < 1 minute

Several weeks ago, I was watching a Vox video on Covid-19 vaccine distribution. The story was about how rich countries were first in line to get vaccines because they made individual deals with the pharmaceutical companies to invest in research, which guaranteed they would receive the first batch of vaccines. There was an organization formed for many countries to chip in, so the poor countries could also get vaccines. But the way it worked out is that rich countries both contributed to the organization and made deals with pharmaceutical companies, since they had the money, so they still ended up being first in line for vaccines.

The video ended by wondering why it didn’t work, but fell short of pointing out that human behavior is the reason the original plan didn’t work. When working with any human-generated data, we need to look at how that data was generated. Sometimes people approach the data from a purely objective angle and ignore the human factor. But humans have always had greed and desires. And there often isn’t a deeper explanation than, “I want that because I can”. People like to seek reasons and wish there were a logical explanation, but more often we should accept that sometimes people do things for no reason.

Journal

Extend the free space on LVM

Reading Time: < 1 minute

I had to reinstall my Ubuntu 20.04 server because a boot issue became impossible to fix. After installing the OS, I found that df -lh only gave me about 200 GB of space, although my hard drive is much larger. After a little searching, I found that the disk is mounted, but the volume is meant to be extended later, to be more flexible. I just want all the space there.

This post gave a pretty good description of the problem and what to do, but here are the commands I used specifically. This post showed how to extend into all the free space, without specifying the exact amount of space.

sudo vgs

  VG        #PV #LV #SN Attr   VSize  VFree
  ubuntu-vg   1   1   0 wz--n- <2.73t 2.53t

sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv

  Size of logical volume ubuntu-vg/ubuntu-lv changed from <2.15 TiB (563200 extents) to <2.73 TiB (714879 extents).
  Logical volume ubuntu-vg/ubuntu-lv successfully resized.

sudo resize2fs /dev/ubuntu-vg/ubuntu-lv

resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/ubuntu-vg/ubuntu-lv is mounted on /; on-line resizing required
old_desc_blocks = 275, new_desc_blocks = 350
The filesystem on /dev/ubuntu-vg/ubuntu-lv is now 732036096 (4k) blocks long.

df -lh
/dev/mapper/ubuntu--vg-ubuntu--lv  2.7T   11G  2.6T   1% /
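
The numbers in the output above can be cross-checked against each other. Here is a short Python sketch of the arithmetic. It assumes the LVM default physical extent size of 4 MiB, which the output does not state (sudo vgdisplay shows the real value). The constant and function names are only illustrative.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

EXTENT_BYTES = 4 * MIB  # assumption: LVM default extent size
BLOCK_BYTES = 4096      # resize2fs reports "(4k) blocks"


def extents_to_tib(extents, extent_bytes=EXTENT_BYTES):
    """Convert a number of LVM extents to TiB."""
    return extents * extent_bytes / TIB


new_extents = 714879   # from the lvextend output
fs_blocks = 732036096  # from the resize2fs output

# Size of the logical volume after lvextend, in TiB.
print(round(extents_to_tib(new_extents), 3))

# The filesystem should fill the logical volume exactly.
print(fs_blocks * BLOCK_BYTES == new_extents * EXTENT_BYTES)
```

Under that assumption, the first value comes out at about 2.727, which matches the "<2.73 TiB" that lvextend reported, and the second check confirms that the filesystem now covers the whole logical volume.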

Journal, Misc

Server failed to boot after power loss

Reading Time: < 1 minute

So my Ubuntu server failed to boot after a power outage. It just went into the boot menu. But no matter what I pressed, it gave an error related to “need to load kernel first”. I searched around with the error message, but none of the suggestions worked, even using all the tools in boot-repair. Finally I went back to this post. https://askubuntu.com/questions/397485/what-to-do-when-i-get-an-attempt-to-read-or-write-outside-of-disk-hd0-error

Most of the suggestions are correct, except this part ->

Grub> initrd /initrd.img

It turned out that I didn’t have /initrd.img. Maybe it got removed somehow. But when I used initrd /boot/initrd.img* instead, I was able to boot. I will see if I have any more errors related to this.

AI, Journal

What to do if Ubuntu doesn’t wake up after sleep

Reading Time: < 1 minute

So I recently updated my Linux machine to Ubuntu 20.04 and it didn’t wake up from sleep. The behavior is a black screen and a restart after about 10 minutes of waiting. The solution I found that worked is this -> https://askubuntu.com/questions/1298198/ubuntu-20-04-doesnt-wake-up-after-suspend

Hopefully that solves your problems! Anniversary Edition!

Journal, Misc

Chinese translation of western countries makes the Chinese love them more

Reading Time: 2 minutes

Back in the 1980s, there was a huge rise in the number of educated Chinese who wanted to immigrate to western countries. I think it’s partially due to the Chinese translations of the names of some western countries. Even though the names have little to do with the countries or their cultures, they created a fantasized version of each country in the hearts of many Chinese people.

For example, the United States is translated as the “beautiful country”. France is translated as the “lawful country”. Germany is translated as the “honor country”. England is translated as the “handsome country”. Italy is translated as the “meaningful country”. Canada is translated as the “plus country”. Sweden is translated as the “intelligent country”. Ireland is translated as the “love country”. Czech is translated as the “fast country”. Of course, I romanticized a little bit here and added some of my personal touches, but the idea is not far from what people thought.

There are some that are more on the quirky side. For example, Spain is translated as the “west country”, and Portugal is translated as the “grape country”.

These translations are due to the fact that Chinese characters have different tones for each pronunciation. Although distinguishable by a native Chinese speaker, the tones do sound rather similar. And for each tone, there are many characters that sound exactly the same. For example, in English, the word “beat” can mean both a rhythm in music and to hit something. But you can expect upwards of tens of words that sound exactly the same in Chinese. So if you see two Chinese people with the same Romanized first or last name, there may be tens of possible character variations behind it in Chinese.

Because translation is mostly done by sound and each sound in Chinese could mean many different things, people often choose the character with the best meaning for the translation. So the resulting country names often carry well-intentioned meanings. Because these western countries were the first to come into contact with Chinese culture, their names often took the best translations. Compared to these, the names of African or Latin American countries rarely have well-meaning translations. They often just try to stay on the non-offensive side.
