There is no cell service on underground TTC Subway lines.

Offline App Usability Checklist

I started paying attention to offline usability because I was fed up with apps that fall apart the moment the connection drops.

Living in Toronto, I often want to read something on my tablet or cellphone in a place with no connectivity. For example, I commute daily on the TTC Subway, and there is no connectivity underground. None of the food courts I go to for lunch have cell service, either. As a result, I get to experience all kinds of good and bad offline support in apps.

When I say ‘offline usability’, it does not have to be about an app that supports offline mode. Apps with intrinsic online requirements can also have bad offline usability issues (look at the section “Bad Refresh Behaviour”). I would be happy if people read this post and think a bit more about this problem.

Determine the Requirement

You need to determine what needs to work offline versus what does not. There is no point supporting offline mode for something that intrinsically requires connectivity. The requirement highly depends on the kind of application you are building. Here are a few examples:

Email App

Email apps are perhaps the oldest and most mature of all offline-capable apps. They have been around since the dial-up days, when always-on connectivity wasn’t common.

What data is available offline

Recent emails in Inbox and other specified folders.

What can User do when offline
  • User can view offline emails, and attachments depending on the settings.
  • User can send an email. The email is saved in Outbox and sent once the device is back online.
  • User can view/delete items in synchronized folders. User can edit items in the Drafts and Outbox folders. These actions are propagated to the server once the device is back online.
What settings can User change
  • How much to synchronize: 3 days, 5 days, 2 weeks, a month, and all
  • How often to synchronize: push, every 5 min, …, once a day, and never
  • What folders to synchronize: Inbox, Outbox, Sent, and other folders

News Reader

What data is available offline

Recent news (e.g., news in the last 24 hours) which is synchronized at some interval.

What can User do when offline
  • User can read recent news. Users can look at images attached to news articles.
  • User can share the link to social networks, which will be sent when online.
What settings can User change

  • How often to synchronize: every hour, 3 hours, 6 hours, every day, or never

What not to do

Here is a list of common offline usability issues. This list is not exhaustive.

Incomplete Data

The data available offline must be complete and in a useful form, regardless of how it is structured internally.

For example, consider a Contact app that caches only the list of contact names and not their phone numbers, because the two live in different database tables (or for whatever reason). Not only are the contact names useless to the user, the app also frustrates the user by giving the false impression that contacts are available offline. Most importantly, the app has probably ruined the user’s business for the day.

In this case, it would have been better to cache all of the user’s contacts and their related information. It is very unlikely that the data won’t fit on a modern smartphone. In the unlikely case of too many contacts, we could consider a concept of ‘favourites’ to synchronize.

Unpredictable Caching

Algorithms like MRU (Most Recently Used) may work for many things, but they are largely inappropriate for caching offline user data (unless your user research tells you otherwise). People just don’t remember which things they accessed recently, so they cannot predict what will be available offline. There are simply too many factors in play to guess what the user needs offline.

It is best to have caching rules that are reliable and that humans can easily understand (e.g., none, last n days, all). Obviously, the content should be complete (see the section “Incomplete Data” above).
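As a rough illustration of a rule that a human can predict, here is a minimal sketch in C. The names sync_rule, sync_policy and should_cache are made up for this example and do not come from any real app; the point is only that the decision depends on nothing but the user’s setting and the item’s age.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical sync rules, mirroring the "none, last n days, all" wording. */
typedef enum { SYNC_NONE, SYNC_LAST_N_DAYS, SYNC_ALL } sync_rule;

typedef struct {
    sync_rule rule;
    int days;                /* only used by SYNC_LAST_N_DAYS */
} sync_policy;

/* Returns true if an item created at item_time should be kept offline. */
static bool should_cache(sync_policy policy, time_t item_time, time_t now)
{
    switch (policy.rule) {
    case SYNC_NONE:
        return false;
    case SYNC_ALL:
        return true;
    case SYNC_LAST_N_DAYS:
        assert(policy.days > 0);
        /* difftime returns the age of the item in seconds */
        return difftime(now, item_time) <= (double)policy.days * 24.0 * 60.0 * 60.0;
    }
    return false;
}
```

With a policy of “last 3 days”, a user can tell in advance that yesterday’s item will be there and last week’s will not, which is exactly what MRU cannot offer.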

Bad Refresh Behaviour

This one applies to always-online applications as well. Refresh should not invalidate the cache until a successful response has come back.

Here is an example of bad refresh behaviour in Facebook for Android: One morning, I opened a friend’s long status that I wanted to read, but I had to leave for work. So I just turned off the screen of my tablet and got on the subway. The moment I turned the screen back on, Facebook’s refresh-on-unlock logic kicked in and blew away the status I wanted to read.

Another bad example is the Korean news portal Naver (news.naver.com). It does a periodic page refresh through location.reload(). If you lose internet connection for 30 minutes, you will find that all Naver news tabs turn into error pages.
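The rule “do not invalidate the cache until a successful response has come back” can be sketched in C as follows. This is only an illustration under my own assumptions: cache_entry, fetch_fn, refresh and the two stand-in fetchers are hypothetical names, not code from any of the apps mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CONTENT_MAX 256

typedef struct {
    char content[CONTENT_MAX];
    bool stale;              /* true when the last refresh attempt failed */
} cache_entry;

/* Hypothetical fetch callback: fills buf and returns true on success. */
typedef bool (*fetch_fn)(char *buf, size_t buf_size);

/* Fetch into a scratch buffer first; touch the cache only on success. */
static bool refresh(cache_entry *entry, fetch_fn fetch)
{
    char fresh[CONTENT_MAX] = { 0 };

    assert(entry != NULL && fetch != NULL);
    if (!fetch(fresh, sizeof fresh)) {
        entry->stale = true;             /* keep showing the old content */
        return false;
    }
    fresh[CONTENT_MAX - 1] = '\0';
    memcpy(entry->content, fresh, sizeof fresh);
    entry->stale = false;
    return true;
}

/* Stand-in fetchers for the example: one simulates no connectivity. */
static bool fetch_offline(char *buf, size_t buf_size)
{
    (void)buf; (void)buf_size;
    return false;
}

static bool fetch_online(char *buf, size_t buf_size)
{
    snprintf(buf, buf_size, "%s", "new status");
    return true;
}
```

A failed refresh leaves the old content on screen and only marks it stale, so the app can show a small “could not refresh” hint instead of an error page.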

Defying Users’ Expectation on Online Requirement

Apps should respect the User’s expectations and mental model of whether a particular function should be available offline.

For example, Users will not accept an email app that requires connectivity for writing a new email. Users of an email app expect the app to save the new email and send it whenever connectivity is restored.

Shazam is a good example of an app that respects this principle. Shazam listens to a song and finds out what the song is for the user. When there is no connectivity, Shazam listens to a song and saves the recorded sound locally. When the connectivity is restored, Shazam sends and tags the saved sound.
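Both the email Outbox and Shazam’s saved recordings follow the same pattern: accept the user’s action locally, then deliver it when connectivity returns. Here is a minimal sketch of that pattern in C. The outbox type, its functions and the stand-in deliver callbacks are hypothetical names chosen for this example; a real app would also persist the queue to storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OUTBOX_CAPACITY 16

typedef struct {
    const char *items[OUTBOX_CAPACITY];
    size_t count;
} outbox;

/* Hypothetical transport callback: returns true if the item was delivered. */
typedef bool (*deliver_fn)(const char *item);

/* Always accept the user's action locally, whether or not we are online. */
static bool outbox_add(outbox *box, const char *item)
{
    assert(box != NULL && item != NULL);
    if (box->count == OUTBOX_CAPACITY)
        return false;
    box->items[box->count++] = item;
    return true;
}

/* Call when connectivity returns. Delivers in order and stops at the first
 * failure, so nothing is dropped. Returns the number of items delivered. */
static size_t outbox_flush(outbox *box, deliver_fn deliver)
{
    size_t sent = 0;

    assert(box != NULL && deliver != NULL);
    while (sent < box->count && deliver(box->items[sent]))
        sent++;
    /* Move the undelivered items to the front of the queue. */
    for (size_t i = sent; i < box->count; i++)
        box->items[i - sent] = box->items[i];
    box->count -= sent;
    return sent;
}

/* Stand-in transports for the example. */
static bool deliver_offline(const char *item) { (void)item; return false; }
static bool deliver_online(const char *item) { (void)item; return true; }
```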

Data Loss from Conflict Handling

Losing or overwriting data without the user’s consent is probably one of the worst things you can do in an information system.

Suppose two users edited a contact, offline. When they come back online, the second save will cause a conflict. It is best if there is a contact merge tool. If there is not one, or if it is too hard to write one, the server could save the second saved contact as a new contact and let the user merge it manually. Just never overwrite the first edit, or lose the second edit.
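The fallback described above (keep the first edit, store the second as a new contact) can be sketched in C like this. It assumes each contact carries a version number that the client sends back with its edit; that versioning scheme, and the names contact_store and save_contact, are my own assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CONTACTS 8
#define FIELD_MAX 32

typedef struct {
    int id;
    int version;             /* bumped on every accepted save */
    char phone[FIELD_MAX];
} contact;

typedef struct {
    contact rows[MAX_CONTACTS];
    size_t count;
    int next_id;
} contact_store;

/* Saves an edit that was made against base_version of contact id.
 * Returns the id of the row that now holds the edit, or -1 if the id is
 * unknown or the store is full. */
static int save_contact(contact_store *s, int id, int base_version, const char *phone)
{
    assert(s != NULL && phone != NULL);
    for (size_t i = 0; i < s->count; i++) {
        contact *c = &s->rows[i];
        if (c->id != id)
            continue;
        if (c->version == base_version) {
            /* No one else saved in between: accept the edit in place. */
            snprintf(c->phone, FIELD_MAX, "%s", phone);
            c->version++;
            return c->id;
        }
        /* Conflict: keep the first edit and store this one as a new
         * contact so that the user can merge the two manually. */
        if (s->count == MAX_CONTACTS)
            return -1;
        contact *copy = &s->rows[s->count++];
        copy->id = s->next_id++;
        copy->version = 1;
        snprintf(copy->phone, FIELD_MAX, "%s", phone);
        return copy->id;
    }
    return -1;
}
```

Neither edit is lost: the first stays on the original contact and the second shows up as a separate entry that the user can merge by hand.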

Conclusion

Designing usable apps for offline use is not easy. You have to decide what needs to be available offline, and how and when to synchronize it. Offline requirements often have huge architectural implications, as well. Therefore, it is important to keep these items in mind at all stages of software development, from design to implementation and maintenance.

Debugging ARM without a Debugger 3: Printing Stack Trace

This is the last post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.


A stack trace is the ultimate tool that can help you tell exactly where a problem is occurring when used in conjunction with asserts (e.g. in my code, an assert failure triggers the stack trace dump. I also wired the ESC key to an assert failure).

It is particularly useful when you have a complex application with deep call stacks. For example, if an assert has failed in a utility function such as stack_push in a complex application, it is practically impossible to figure out what happened where without putting print statements everywhere.

With a stack trace, we can re-construct the run-time call hierarchy and find out what is happening. At the end of this article, I will present an example of sophisticated stack trace output that can help us diagnose complex concurrency issues.

Stack Frame Structure

We can deduce the exact stack frame structure from the assembly code generated by the compiler (GCC-arm in my case). Here is an example of a typical function prologue:

func:
    mov     ip, sp
    stmfd   sp!, {(other optional registers), sl, fp, ip, lr, pc}
    sub     fp, ip, #4
    @ function body continues...

The compiler will save the registers pc, lr, ip, fp, sl into the stack in that order. Additionally, the compiler may save any other scratch register used in the function. Important registers for printing a stack trace are pc, lr and fp.

Note that if any compiler optimization is turned on (e.g. -O flag), you need to pass the extra argument -fno-omit-frame-pointer. Otherwise, GCC will optimize out the code that saves the frame pointer.

pc (program counter)

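Reading the saved pc gives us the address of the entry point of the function plus 16 bytes. The stmfd that saves it is the second instruction of the function (4 bytes in), and ARM reads pc ahead of the executing instruction because of its pipeline. How far ahead a stored pc reads is implementation defined (8 or 12 bytes); the total of 16 here corresponds to 12.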

lr (link register)

The saved lr is the address that execution returns to when the current function returns. The instruction just before it (lr - 4) is the exact place in the caller where the current function was called.

fp (frame pointer)

This is the frame pointer of the calling function (the previous frame). We need to read it in order to “crawl up” the call stack.
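To make the layout concrete, here is a small sketch in C that walks frames over a simulated stack, so it can be tried on a PC instead of the target board. The offsets (saved pc at fp, lr at fp - 4, caller’s fp at fp - 12, entry point at saved pc minus 16) are the ones described above; the names read_word and walk_frames and the simulated memory array are only assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Simulated memory: address a (a multiple of 4) maps to stack[a / 4]. */
static uint32_t read_word(const uint32_t *stack, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return stack[addr / 4];
}

/* Frame layout relative to fp, as set up by the prologue:
 *   [fp]      saved pc (function entry point + 16)
 *   [fp - 4]  saved lr (return address into the caller)
 *   [fp - 8]  saved ip
 *   [fp - 12] the caller's fp
 * Fills entries[] and call_sites[] from the innermost frame outwards and
 * returns the number of frames found. */
static size_t walk_frames(const uint32_t *stack, uint32_t fp,
                          uint32_t *entries, uint32_t *call_sites,
                          size_t max_depth)
{
    size_t depth = 0;

    /* Stop on a frame pointer that is null, misaligned or out of range. */
    while (depth < max_depth && fp >= 12 && fp % 4 == 0 && fp / 4 < STACK_WORDS) {
        entries[depth] = read_word(stack, fp) - 16;
        call_sites[depth] = read_word(stack, fp - 4) - 4;
        depth++;
        fp = read_word(stack, fp - 12);  /* crawl up to the caller's frame */
    }
    return depth;
}
```

The bounds checks in the loop condition stand in for the “is this a valid address” tests that the real crawler needs on the target.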

Stack Trace Crawler

Here is the pseudocode (or the actual code) for printing the stack trace:

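// a crude macro for reading an unsigned int at the specified memory address x.
#define VMEM(x) (*(volatile unsigned int *)(x))

// fp starts out as the current frame pointer (read from the fp register).
// lr == 0 marks the innermost frame: nothing has been called from it yet.
lr = 0; depth = 0;
do {
   pc = VMEM(fp) - 16;   // saved pc - 16 = entry point of this frame's function

   // print here: the calling code is at lr, the current function addr is pc

   if (depth > 0 && lr is invalid) break;

   lr = VMEM(fp - 4);    // return address into the caller, printed on the next pass
   fp = VMEM(fp - 12);   // the caller's frame pointer

   if (fp is not a valid memory or depth too big) break;

   depth++;

} while (the current function is not a top-level function && depth is < some threshold);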

Here is example code for reading the frame pointer, which is required to start printing the stack trace:

#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
// reads the register reg to the variable var
#define READ_REGISTER(var,reg) __asm volatile("mov %[result], " TOSTRING(reg) "\n\t" : [result] "=r" (var))

int fp; READ_REGISTER(fp, fp);

The most important thing here is that this code must not fail. Here are common failures that you want to avoid:

  • Abort inside another abort (or, an abort inception; install a good abort handler to find out why)
  • Invalid pointer dereference (e.g. outside the physical memory, or outside .text region)
  • Stack overflow which will lead to another abort (by getting stuck in an infinite loop of crazy corrupt frame pointers)

Finding out the corresponding C code

Use the command objdump -SD executable | less to figure out what the C code is at a given address. Passing the compiler flag -ggdb enables objdump to print the C source code next to the disassembled code. It may not always work at higher optimization levels.

Printing the function name

The debugging process can be much faster if you can see the function names in a stack trace right away when the program crashes, instead of running objdump manually every time.

The proper way to do it is to read the debugging information from the .debug section of the memory. I did not have time to do that, so instead I built my own symbol table array using a shell script hooked up to the makefile.

This symbol table does not need to be sophisticated. A simple array of a function address and its name is good enough. This is because the performance is not a concern when you are printing the stack trace of a crashed system. Another reason is that we want this code to work all the time. It is pretty hard to mess up a linear search.

The symbol array is built from the exported symbols. The method I have used is simple. After compiling all the source code into assembly files, I run a shell script that searches for the string “.global” in all the assembly files to generate the exported symbol table. Then I compile the generated code of exported symbols as well, and link it all together at the end. The following sample code shows how to do it:

funcmap.h (funcmap provides the interface to find function names given an address)

typedef struct _tag_funcinfo {
   unsigned int fn;    /* address of the function's entry point */
   const char *name;   /* function name (a string literal) */
} funcinfo;

/* call this before calling find_function_name */
void __init_funclist(void);
funcinfo *__getfunclist(void);

/* call this function to find the name of the function whose entry point is pc */
static inline const char *find_function_name(unsigned int pc) {
 funcinfo *fl = __getfunclist();
 int i = 0;

 /* linear search; the list ends with an entry whose fn is 0 */
 while (fl[i].fn != 0) {
    if (fl[i].fn == pc) return fl[i].name;
    i++;
 }

 return "[unknown function]";
}

funcmap.c (generated by a shell script)

#include "funcmap.h"
#include "./task.h" // include ALL the header files

// the length of this array is also generated: number of functions + 1 for the terminator
static funcinfo __funclist[2];

void __init_funclist() {
   int i = 0;
   __funclist[i].fn=(unsigned int)some_func;
   __funclist[i++].name="some_func";
   // .. more
   __funclist[i].fn=0; // zero-terminated: fn == 0 marks the end of the list
}

funcinfo* __getfunclist() { return __funclist; }

Lastly, this is how I read all the function names from assembly files in the shell script (the actual script):

FUNCTION_COUNT=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | grep -v '^$' | grep -v '^__' | sort | uniq | wc -l`
FUNCTIONS=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | egrep -v '(^$|^__|PLT|GOT|,)' | sort | uniq`

Putting it all together (Example)

Combining the stack trace with task information can be even more powerful than what basic C debuggers offer.

The following is an example of a stack trace output for multiple tasks. It prints two lines per task.

Task 0 {noname} (p:31, pc:0x2180b8, sp0x1edfe34, lr:0x2506d8, WAITING4SEND):
nameserver @ 0x2505e8+0,
Task 1 {noname} (p:0, pc:0x24c55c, sp0x1eafff0, lr:0x21809c, READY):
kernel_idleserver @ 0x24c550+0,
Task 3 TIMESERVER (p:31, pc:0x2180b8, sp0x1e4ff80, lr:0x21d1cc, WAITING4SEND):
timeserver @ 0x21d074+0,
Task 4 {noname} (p:31, pc:0x2180d0, sp0x1e1ffe0, lr:0x21f818, WAITING4EVENT):
eventnotifier @ 0x21f7c4+0,
Task 5 IOSERVER_COM1 (p:31, pc:0x2180b8, sp0x1deff04, lr:0x21eab8, WAITING4SEND):
ioserver @ 0x21e82c+0,
Task 6 {noname} (p:30, pc:0x2180d0, sp0x1dbffe0, lr:0x21f818, WAITING4EVENT):
eventnotifier @ 0x21f7c4+0,
Task 7 IOSERVER_COM2 (p:31, pc:0x2180e0, sp0x1d8fe6c, lr:0x21e104, RUNNING):
[unknown function] @ 0x21df94+0, ioserver @ 0x21e82c+253,
Task 8 {noname} (p:30, pc:0x2180b0, sp0x1d5ffe0, lr:0x21f830, WAITING4REPLY, last_receiver: 7):
eventnotifier @ 0x21f7c4+0,
Task 9 {noname} (p:2, pc:0x2180b0, sp0x1d2f878, lr:0x22006c, WAITING4RECEIVE):
uiserver_move @ 0x220018+0, timedisplay_update @ 0x23bed4+49, dumbbus_dispatch @ 0x21a5a8+15, a0 @ 0x234c88+646,

Task status code:

  • WAITING4SEND means the task is waiting for another task to send a message.
  • WAITING4RECEIVE means the task has sent a message but the receiver has not received the message yet.
  • WAITING4REPLY means the task has sent a message and someone received it but has not replied yet.
  • last_receiver tells us the last task that received the message from this task.
  • WAITING4EVENT means the task is waiting for a kernel event (e.g. IO).
  • READY means the task is ready to run next as soon as this task becomes the top priority task.
  • RUNNING means the task is currently running.

The first line displays the task number, name, priority, registers, task status and the task synchronization information. The second line displays the stack trace with the offsets from the address of the function.

Why is this powerful? We can use this to solve really complex synchronization issues involving wait chains and priorities, which would otherwise be nearly impossible to solve. By the end, we had more than 40 tasks interacting with each other, and my life would have been much harder without this information.

Limitations

The major limitation of this method is that it can’t print the names of static functions. This is because the symbols for static functions are not exported globally. This is not a huge problem because you can still see the names from the output of objdump.

Comparative Technical Review of Windows Phone 7.5

I bought a used HTC Radar 4G (“Radar”) on eBay this Monday for $250 (+ shipping). This is my first Windows Phone purchase. Before this, I used a SonyEricsson Xperia X10 (2.1), Samsung Captivate (2.3), Google Nexus S (4.0), and LG G2x (2.3) over the last 15 months. I bought the Radar because I got really sick of the G2x and of Android’s problems, and I wanted a breath of fresh air. I spent months trying to hack my phone into something better, but it was overall a really frustrating experience. (Originally, I did not get an iPhone because I have an excellent (real) unlimited plan on Wind Mobile and I did not want to switch carriers. Apple probably got this right a long time ago.)

I mainly decided to write this review because I am very impressed with the platform overall and I want more people to know about it. I was also absolutely fed up with Android’s problems across 4 phones. I will try my best to be informative in this review, choose my words carefully, and point out the important pros and cons of this phone and platform. I have some programming knowledge and fairly minimal knowledge of embedded programming. A large part of the review is a comparison with Android because that’s what I know best (as a former “power” Android user). Also, I have never used a version of Windows Phone before 7.5, so I can’t comment on that.

Here’s what the phone looks like (sorry, the camera on the G2x is absolute garbage):

[Two photos of the phone]

The Good Parts

Battery Life

Battery life is probably the number one thing that impresses me about the Radar. Its real-life battery life is superb compared to any Android phone I have used. It lasts all day with push email and messaging enabled, while playing music for hours. There are a few things that I think Windows Phone does radically better than Android.

Playing Music in the background does not kill your battery

I like the fact that I can use the Radar as a music player all day without draining the battery. This was not possible with any other phone I have used. The Radar typically lasts the entire workday or more with moderate usage, push email, and push messages.

What’s wrong with Android Power Management?

In my opinion, one of the biggest messes in Android right now is the existence of the wakelock. The wakelock has caused quite a scene in Linux kernel development before. It is a simple power management mechanism: as long as the lock is held, the phone is prevented from entering a lower power state. In my opinion, the wakelock is one of the worst ideas in Android, and it rightfully gives Android a bad name with regard to battery life. There are a couple of reasons why it is a bad idea.

First, locks (whether a mutex for concurrent programming or a means of resource management) are a bad abstraction to use in general. A lock surely is simple, but unlike other primitives, it has no good built-in recovery mechanism in case somebody fails to release it properly. In the context of power management, when a program fails to release the wakelock by mistake, the program will silently consume all of your battery in less than 8~10 hours. For normal users, it is incredibly hard to find out where the battery problem even comes from. But even if you find it, what if the problem lies in one of Android’s core apps, such as Dialer? You are really screwed. The fact that this happens in an Android core app shows how difficult it is to get this right. That any other developer can cause the same problem is even scarier.

Second, the current Android wakelock programming pattern makes it very hard to write an energy-efficient program, including an energy-efficient music player. Before explaining this, I want to point out an important fact: when you see a smooth continuous animation or hear music, the processor does not need to run continuously. Here’s why:

Generally, our visual system perceives a series of images as smooth continuous motion if the images are replaced 60 or more times per second (we often use the term frames per second to denote this frequency). This means we only need to draw the cellphone screen roughly every 0.016 seconds (16ms) to show a smooth animation. This might sound like a short time, but it is a long time for a computer. If you have a 1GHz processor, it can run 16 million CPU cycles in that time, and that’s quite a lot. Now, suppose scrolling your web browser window takes 10% of the CPU power. In this case, in every 16ms your CPU works for 1.6ms and does nothing for the remaining 14.4ms (this is obviously a super-simplified version of what happens, but my point is still valid). With most modern ARM processors, we can put the CPU into deep sleep for that idle period. A similar principle applies to music playback.

On Android, all music players keep the CPU alive 100% of the time while music is playing. That includes when your phone is in your pocket with the screen off. I am not making this up! I first discovered this by inspecting the CPU sleep time in the Spare Parts app, and I confirmed my suspicion by reading the source code of MediaPlayer. That’s more than 90% of the CPU time and battery wasted for nothing.
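The arithmetic in the paragraph above can be checked with a few lines of C. This is just the same super-simplified model written down; the function names are made up for this example.

```c
#include <assert.h>

/* CPU cycles available in one frame, for a clock in Hz and a frame time
 * in milliseconds. */
static long cycles_per_frame(long clock_hz, long frame_ms)
{
    assert(clock_hz > 0 && frame_ms > 0);
    return clock_hz / 1000 * frame_ms;
}

/* Microseconds per frame during which the CPU could sleep, if the work
 * takes busy_percent of each frame. */
static long idle_us_per_frame(long frame_ms, long busy_percent)
{
    assert(frame_ms > 0 && busy_percent >= 0 && busy_percent <= 100);
    return frame_ms * 1000 * (100 - busy_percent) / 100;
}
```

For a 1GHz clock and a 16ms frame, cycles_per_frame gives 16,000,000 cycles, and at 10% load idle_us_per_frame gives 14,400 microseconds, that is, the 14.4ms of possible sleep per frame.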

Now, how do I know that WP does not use an equally bad mechanism? There are a few reasons. First, it’s just impossible for the CPU to be running for 24 hours on one battery charge. Second, you can never run a continuous long running task yourself on WP (see the section Strict Background Management). Lastly, if Microsoft used a similar “wakelock” mechanism internally to play music, it’s very unlikely that Microsoft managed to not screw up (sorry Microsoft).

Battery Saving Mode

I like that WP has a battery saving mode by default. This is different from disabling auto-sync on Android for a few reasons. First, WP’s battery saving mode prevents any background task from running when you want to save battery, so I can be confident about how long the phone will stay alive. On Android, nothing prevents a poorly programmed app from running in the background when auto-sync is disabled. Second, it has sensible defaults: it turns on automatically when the battery is below 20%.

Perceived Smoothness

WP feels very smooth. Try scrolling down fast on a WP. You may see partially rendered items, but you won’t see any of the “jerks.” Also, if you are trained to see the difference, WP runs its animations at a much higher frame rate than Android. If anything, it tells non-programmers that software matters a lot more than hardware (since my G2x had a dual-core processor and was still jerky). There’s not much more of a technical nature I can say about this. Those jerky movements drove me crazy on Android. They happened when I typed, when I started scrolling in the browser, or sometimes just randomly.

It also kind of changed my perception of managed languages running on mobile environment considering how WP7 apps are mostly Silverlight apps.

Strict Background Task Management

WP’s Background Task Model

This connects strongly back to battery life and, to some degree, to perceived smoothness. WP dictates what can run when I am not using the phone, and I can be assured that a rogue app won’t drain my battery by running the GPS in my pocket and warming up my leg. If you want the full details, you can check the document explaining WP Background Agents.

On WP, there are two kinds of background tasks: PeriodicTask and ResourceIntensiveTask.

Among a lot of rules, my favourite part is that WP schedules the tasks together so that the phone does not wake up more often than needed. For example, suppose you have two tasks that each run every 30 min. Surprisingly, this can result in the phone waking up as many as 4 times in an hour. This happens when the first task wakes up at minutes (0, 30) and the other task wakes up at minutes (15, 45). With n apps synchronizing every 30 minutes, this can blow up to 2n wake-ups per hour. Presumably, this has a pretty bad impact on battery life. WP batches them together. On Android, as far as I know, there’s no obvious way in the API to achieve this (from what I have studied). You can tack on to Google Apps synchronization, but that’s kind of hacky. Also, these tasks cannot run for more than 25 seconds, which is a good way to police battery-draining applications. This is why I can safely add many Live Tiles to my home screen without worrying about accidentally destroying my battery.

ResourceIntensiveTasks can run for up to 10 minutes. They are meant to synchronize a big chunk of data or do other long-running work. They have even stricter activation rules: power plugged in, minimum 90% battery charge, user not using the phone, and more. This ensures that you won’t ever experience stuttering from a massive background synchronization.

[Screenshot: Don’t use your phone while the Market is updating...]

Other than that, WP does not allow any non-system task to run at the same time as the one foreground task. An important difference from the Android way is that when you switch tasks, WP suspends the entire foreground process and won’t grant it any CPU cycles until it is either tombstoned or running again (refer to the WP application lifecycle overview).

This goes back to the smoothness part, and here’s a really good example. Go to Market (er, Play) and try updating your apps. I have never seen an Android phone whose UI does not stutter or hang significantly while doing this (well, I don’t know about those quad-core phones, but my dual-core phone certainly stutters big time). The Radar does not do that.

Of course, one might find this model extremely restrictive. But as a user, I find this model a lot more predictable and reliable with respect to responsiveness and battery life (and It Works for Me!™).
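To see why unaligned periodic tasks cost more wake-ups, here is a small counting sketch in C. It only models the example above (every task shares one period and the phone wakes once per distinct minute); it says nothing about how WP actually schedules its agents, and wakeups_per_hour is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MINUTES_PER_HOUR 60

/* Counts the distinct minutes within one hour at which the phone must
 * wake up, given each task's first wake-up minute and a shared period. */
static int wakeups_per_hour(const int *offsets, size_t task_count, int period_min)
{
    bool wakes[MINUTES_PER_HOUR] = { false };
    int total = 0;

    assert(offsets != NULL && period_min > 0);
    for (size_t i = 0; i < task_count; i++) {
        assert(offsets[i] >= 0);
        for (int m = offsets[i] % period_min; m < MINUTES_PER_HOUR; m += period_min)
            wakes[m] = true;
    }
    for (int m = 0; m < MINUTES_PER_HOUR; m++)
        total += wakes[m];
    return total;
}
```

With offsets {0, 15} and a 30-minute period this counts 4 wake-ups, and with both tasks aligned at offset 0 it counts 2, no matter how many tasks share that slot.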

Music Player

Zune is a really nice music player. It is beautiful, simple to use and still fast. It provides good two-way synchronization (music, ratings and more) with your Windows PC through the Zune software. Zune Pass is a pretty good deal, too. Unlike Google Music, it lets me see the current playlist (well, I am sure they will add it back, right Google?). Integration with other media players is really nice to see, as well.

I really like the automatic synchronization scheme I am using now: I have a playlist of songs that I always want on my phone, a list of 100 shuffled favourite songs and a list of 50 shuffled unrated songs. Then I just synchronize those three lists with the phone. This way, whenever I connect the phone, I get freshly shuffled music. 4GB is plenty of space to do something like this and enjoy different music every day.

Also, wireless synchronization over WLAN is really cool, albeit terribly implemented. This is much better than using up my expensive and slow (that’s right, I am in Canada) Internet connection. The downside is that there is no way to start the wireless synchronization forcibly, and when things go wrong, you have to try to debug UPnP issues. It’s not fun.

Camera

I like that the phone comes with a camera button. It’s a two-stage shutter button so I can focus and shoot easily. X10 had this, and quite frankly I missed it a lot.

Also, it’s really easy to activate the camera even compared to ICS. Just holding the camera button will do. On ICS devices, you have to turn on the screen first to touch the lockscreen.

Lockscreen

The lockscreen is very informative. It’s just one of the sensible defaults I like (You can choose to disable it if you want to).

Integrated Messaging

You can talk to your friend through SMS, Facebook, or MSN using the built-in messenger app. I think it’s cool but I think Palm fans like this a lot more than I do. Unfortunately, gTalk does not work with this. Hopefully Microsoft adds a generic XMPP support in the next version.

[Messaging app screenshots]

People Hub

SonyEricsson tried something similar with Timescape before… except theirs was terribly executed. It was sluggish and took more than a second to peek through each item in the entry. Nice try though, Sony.

People Hub shows the updates from Twitter, Facebook, LinkedIn and other social networks in one place.

You can also aggregate the information by person (Sony tried this too, but it was terribly slow to use). Clearly, this is the ultimate stalking tool.

You can sync the contacts with Hotmail, Exchange, Google (yes, it fetches contact pictures), Facebook, Twitter and others. Also, WP lets me choose one of a contact’s many profile pictures. It really bugged the hell out of me on Android when I suddenly started seeing Twitter eggs instead of people’s faces on my phone.

Application Bar Menu

I prefer WP’s Application Bar over ICS’ Action Bar. They both carry out a similar role. But I like Application Bar better because it tells me what the icons mean when I expand it (It’s funny because Application Bar actually predates Action Bar).

[Screenshots: the Application Bar, collapsed and expanded]

Find My Phone

I know Android has an application called exactly this, but this counts as a plus one in the sensible default category to me.

HTC’s Attentive Phone Feature

I usually don’t like the additions that OEMs make but this one is an exception. It’s a nice attention to detail. (On the other hand, HTC Sense App looks absolutely out of place… I just uninstalled it. Yes, you can uninstall them).

Office (Word, Powerpoint, Excel and Onenote)

WP comes with it (unsurprisingly). This is about 100 times better than trying to use Google Docs on the phone.

Apps

Apps on the market are really polished and follow the platform guidelines much better.

I really like the app trial system of the Marketplace. Not only do I get to try the apps (doh), there are also far fewer ugly lite/premium versions cluttering the market. I heard this is a pretty new feature.

Stability

Stability is probably what I hate the most about the LG G2x, along with the fact that LG is getting away with the problem with no publicity.

Every day, the G2x shuts off randomly and will not turn on again until you pull out the battery, going into what’s referred to as the “sleep of death.” This is incredibly problematic because I will miss alarms, calls, emails, and other stuff. It’s just plain infuriating. No custom ROM really fixed it. What I have heard from my friends who have worked with Tegra 2 is that it’s really easy to get Tegra 2 into a sleep state where no interrupt will wake up the phone. I will totally buy that. The phone also reboots randomly. I will never buy any LG phone again in my life.

The GPS on G2x almost never worked, either. It only locks after like 10 minutes. I have tried all the voodoo magic suggested by xda hackers but they didn’t work for me. These problems are well-known and supposedly resulted in a lawsuit but I have never heard of any progress on this.

Radar is extremely stable, and it never crashed on me. This is good. If someone had a heart attack next to me, the last thing I want is my phone crashing or freezing (this isn’t that uncommon).

Keyboard

Since WP does not allow replacement keyboards, I had very high expectations of the built-in keyboard, and it delivered. Two things: First, the keyboard just works. It is much better than SwiftKey in that pressing the shift key doesn’t always correct what you type. It takes your typing speed and other factors into account. Correcting a misspelled word is very easy, too.

Second, the default keyboard provides first-class support for non-Latin languages out of the box. For example, it was easy and fast to switch between English and Korean. Both languages are fully supported in terms of prediction quality.

My experience with using two keyboards in Android wasn’t that bad but it wasn’t as good. First, the good English keyboards either lacked Korean entirely or had no prediction ability. Similarly, all the Korean input methods had pretty much abysmal English prediction ability. Second, while ICS made input method changing easier, all of the input methods took a few seconds to initialize and popup. I entirely gave up mixing two languages in a text because switching input methods made such a big interruption in my texting process.

Emulator

Anyone who has tried running Android emulator knows how much it sucks. WP7 emulator is better. But frankly, I think anything is better than the Android emulator.

Development Language

It’s not Java. I get to program in C# and F# (yes, I am an F# fanboy) and make cheesy animations with Silverlight.

Looks nice and informative

Lastly, I find WP quite clean, nice and informative. Especially, it was new to see something that’s not just a grid of things on the home (“start”) screen. WP theme has a concept of an “accent” colour which is a nice touch (but you can only choose between 11 of them, which sucks a bit, but even CM only gave me 3 choices).

HTC Theme and Mango Theme colours:

[Screenshots: start screen with the HTC theme; start screen with the Mango theme]

Some of the other live tiles (Baconit and Carbon) and pinned bookmarks (Hackernews and Google Reader):

[Screenshot: live tiles and pinned bookmarks]

The Bad Parts

Opaque Storage Management

I wasn’t very sure where to put this but I decided to make it a con. I was quite sick of Android apps littering files all over my SD card, and I was hoping that complete storage management by the OS might be a good thing. But WP 7.5 fails to do it nicely. For instance, there is no app at all that tells me how much space is used by each app. This makes it quite hard to find out which apps are using my space when I need it. Games usually take a lot of space but… I don’t want to have to use trial and error to find out.

The lack of UMS option might bother a number of people as well. While I have eventually moved to the “managed” synchronizing music players, a lot of people still prefer organizing their music and pictures directly on their own.

[Screenshot: reserved space]

Lack of GApps support

It’s probably Google’s business decision but it sucks to have near-zero Google apps support. Specifically irreplaceable is Google Maps. Don’t get me wrong, Bing Maps is nice. But the transit support is almost nonexistent in Canada. I can get away with things like NextBus.com in big cities like Toronto but in small cities like Waterloo, it’s nice to be able to schedule ahead.

There are some heroic efforts by apps like gMaps (nice try) but they fall short of the native Google Maps.

[Screenshots: gMaps]

Gmail synchronization is good, but it’s missing Gmail-specific features, so if you make use of them, you won’t be very happy (the web version still works).

Lack of Some Popular Apps

There are a lot of popular apps on WP that function perfectly fine (this may be because I started using WP years after the initial launch) but that’s not good enough. I am missing apps like Mint, my banking app. They are not a deal breaker since I can use the mobile web version, but the difference between native apps and web apps on WP is huge. That’s not because the browser sucks but because the native experience of WP delivers more than other platforms.

Fortunately, it seems like Microsoft is acknowledging this according to this leak. According to the leak, they are planning to catch up on the top 25 apps on other platforms by the end of H2 FY12. I think they have started doing a good job of this, by running hackathons and investing a lot of money. I know the app numbers aren’t everything and the trends look okay, but it needs to be better.

Notification Area

The Android notification area is definitely better than the WP toast notifications. WP toasts don’t stack. You can only tap them to go to the app, or dismiss them by swiping. Once a toast is dismissed, you can’t find it again. It’s not the end of the world since you can see the Live Tiles on the start screen, but coming from Android, this is a nuisance.

Sideloading Restriction

This is kind of annoying as I still have the habit of running random programs that are not from the Market. If you are a student, you get to sideload 3 apps. If you are a registered developer ($99/year) or buy one of those unlock tokens ($10), then you get to load 10 apps. This is not really an issue if you just develop apps on your phone, but if you run homebrew apps (like screenshot apps) then the 3-app limit might be a little bit limiting.

Can’t replace Bing as a search provider

Bing is not bad but I want my choice. The only reason I can think of them not allowing this is because the native search app can’t organize the Google search result the way they want.

This is how Bing presents the search result:

[Screenshots: Bing search screen; Bing search result; Bing local result]

Hardware Search button not flexible

No matter where you are, the search button will take you to the Bing search screen. To search in the current context, you have to use the magnifier icon in the application bar.

Some Task Contracts make no sense to me

Maybe this is one of my Android biases, but according to Microsoft, launching a dormant (suspended) app from the start screen counts as a new application launch, and your application should lose its transient data. It looks inconsistent to me that an app preserves its state when you reach it from the task switcher but not from the start screen. Funnily, Microsoft’s own Messaging app violates this. If you start the Messaging app from the start screen when a new message has arrived, it will take you to the thread the new message belongs to, not to the home screen of the Messaging app. The Execution Model Overview document provides a very good summary of how these things work. I think Microsoft should make a change to make things consistent.

8GB of space is too small

I used my G2x with a 32GB microSD card attached. The Radar, on the other hand, comes with a meagre 6GB of actual usable space when formatted, and about 5GB after app installations and synchronization. Considering that this is a relatively low-end phone, I can’t complain too much, but it could have been better.

Conclusion

Despite some of the shortcomings, I like WP7.5 because of the sensible defaults that don’t make me stressed. It’s as if the people who made the phone tried using the phone! (oh, my). The user interface is delightful to use. I don’t think I will go back to Android for a while.

I feel like some of the open options Android provides to developers are a bit misguided. Microsoft made this exact same mistake more than ten years ago. The result was that Windows is now forever mocked for its BSOD (even if it originated from non-Microsoft drivers) and sluggishness, regardless of how the platform is doing now.

WP7.5 deserves way more attention than it gets. Hopefully, Microsoft succeeds in convincing people this year with cool new phones like the Lumia 900.

Transferring installed programs on OpenSUSE 12.1 (or, trying to enlarge the root partition)

I am using OpenSUSE 12.1 in VirtualBox. After I installed a bunch of programs I needed, it started complaining that there was not enough space in the root partition. After a while, it got to the point where I couldn’t even install security updates. I had installed OpenSUSE with the default hard disk size of 8GB and the default root partition size of 5GB, but clearly that was not enough.

Unfortunately the root partition was laid out between the boot partition and the home partition. So I decided to try to take a shortcut by mounting a new hard disk, using dd to copy the content of the partitions, and using resize2fs to enlarge the new root partition on the new hard drive. HUGE mistake! I completely forgot modern Linux installations use fstab with UUID entries. The bottom line is that I couldn’t get OpenSUSE to boot up at all.

Hence, I took the longer way of reinstalling and transferring my configuration. I used the following command to extract the names of all the packages installed on my OpenSUSE installation.

zypper se -i | tail -n +6 | awk '{ print $3 }' > packages.txt

Installing a new OpenSUSE instance took about 10 minutes on my desktop (i5-2500k). I just copied over the file to the new instance and issued the following command to restore all the installed packages.

zypper install `cat packages.txt`

Another note on using OpenSUSE on VirtualBox is that you should uninstall the built-in VirtualBox Guest Additions and install a newer version of the VirtualBox Guest Additions.

zypper remove virtualbox-guest-x11 virtualbox-guest-tools virtualbox-guest-kmp-default


Debugging ARM without a Debugger 2: Abort Handlers

This is my second post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.


It is useful to have a simple abort handler early on, before working on anything complex like context switching. The default abort handlers that come with the bootloader spew out minimal information for gdb if you are lucky, or often they just hang with no message (in fact, I am now very grateful that I am able to see kernel panic messages at all when things are gravely wrong with my computer). By installing an abort handler, you will be able to see what went wrong in case the asserts were not good enough to catch problems earlier.

Installation

There are three interrupt vectors that need to be intercepted: undefined instruction (0x4), prefetch abort (0xc) and data abort (0x10). We can re-use one abort handler because the abort type can be read from the cpsr. One exception is that instruction fetch aborts and data fetch aborts share the same processor mode. We can work around this by passing a flag to the C abort handler. The following is sample code:

// c prototype of the abort handler
void handle_abort(int fp, int dataabort); 

// the abort handler in assembly that calls the C handler
.global asm_handle_dabort
asm_handle_dabort:
	mov r1, #1
	b abort

.global asm_handle_abort
asm_handle_abort:
	mov r1, #0
	abort:
	ldr sp, =0x2000000
	mov r0, fp
	bl handle_abort
	dead:
	b dead

Because ARM has a separate set of banked registers for the abort modes, the stack pointer is uninitialized when the handler is entered. Since I wanted to use a C handler to print out messages, I needed to set up a stack. In this code, I manually set the stack pointer to the end of the physical memory (our board had 32MB of RAM in total, so 0x2000000 is the end of the memory). For convenience, I also pass the current frame pointer in case I want to examine the stack of the abort-causing code.

When dealing with register values directly in C, it is convenient to have the following macro to read register values:

#define READ_REGISTER(var) \
__asm volatile("mov %[" #var "], " #var "\n\t" : [var] "=r" (var))
// usage: int lr; READ_REGISTER(lr);
#define READ_CPSR(var) \
__asm volatile("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" \
: [mode] "=r" (var))
// usage: int cpsr; READ_CPSR(cpsr);

In the C abort handler, by reading the cpsr, you should be able to figure out the current mode. Refer to ARM Reference Manual section A2.2.

The following is a brief summary of the abort environment and its interpretation. The precise information can be found in chapter A2 of the reference manual. You should read the manual to understand the process better.

An important thing to remember is that you should do your best to ensure that your abort handler does not cause another abort inside. Again, be very conservative when dereferencing pointers.

Interpretation

Read all the values from the registers first, and then print. Otherwise, there is a chance some registers might get overwritten.

cpsr

dataabort refers to the second parameter passed into the C abort handler. The lower 5 bits of cpsr are interpreted as follows:
  • 0x13: You are in svc mode. It probably means your abort handler caused another abort inside. Fix it.
  • 0x17 (dataabort = 0): Instruction fetch abort
  • 0x17 (dataabort = 1): Data fetch abort
  • 0x1B: Undefined instruction

lr

The link register normally contains the address of the instruction after the one that called the current function. In an abort handler, its interpretation depends on the current mode (assuming ARM state, i.e. 4-byte instructions):
  • Data fetch abort: the abort was caused by the instruction at lr - 8
  • Instruction fetch abort: the abort was caused by the instruction at lr - 4
  • Undefined instruction: lr holds the address of the instruction after the undefined one, so the undefined instruction is at lr - 4
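The lr interpretation can be folded into one small helper. This is only a sketch: faulting_instruction and the MODE_* names are made up for illustration and are not part of our kernel. It assumes ARM (not Thumb) state, and for undefined instructions it uses lr - 4, following the ARM Architecture Reference Manual's statement that lr in undefined mode holds the address of the instruction after the undefined one.

```c
#include <stdint.h>

/* Mode field values (lower 5 bits of cpsr) for the modes of interest. */
#define MODE_ABORT 0x17u
#define MODE_UNDEF 0x1Bu

/* Hypothetical helper: returns the address of the instruction that caused
 * the abort, or 0 if cpsr does not indicate abort or undefined mode.
 * dataabort is the flag passed by the assembly stub (1 = data abort).
 * Assumes ARM state, where every instruction is 4 bytes long. */
uint32_t faulting_instruction(uint32_t cpsr, uint32_t lr, int dataabort)
{
    switch (cpsr & 0x1fu) {
    case MODE_ABORT:
        /* data abort: lr - 8, instruction fetch (prefetch) abort: lr - 4 */
        return dataabort ? lr - 8u : lr - 4u;
    case MODE_UNDEF:
        /* lr points just past the undefined instruction */
        return lr - 4u;
    default:
        return 0u;
    }
}
```

In the C abort handler, the result could be printed next to the raw lr value so that the address can be looked up in the disassembly of the kernel image.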

Fault type (in case of data/instr. fetch abort)

Read the fault type using the following code:

volatile unsigned int faulttype;
__asm volatile ("mrc p15, 0, %[ft], c5, c0, 0\n\t" : [ft] "=r" (faulttype));
faulttype &= 0xf;

The fault type value is interpreted as follows:
  • (faulttype >> 2) == 0: misaligned memory access
  • 0x5: translation
  • 0x8: external abort on noncacheable
  • 0x9: domain
  • 0xD: permission

To see the big picture of how the fault checking works (other than misaligned memory access), you are advised to read section 3.7 of the ARM920T Technical Reference Manual. In short, unless you are making use of memory protection, you will never get domain and permission faults.
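For printing, the fault type values listed above can be turned into a message with a small lookup. This is a sketch only: fault_type_name is a hypothetical name, and it covers just the values listed in this post. Anything else falls through to a generic string, so the manual is still needed for the remaining encodings.

```c
#include <stdint.h>

/* Hypothetical helper: maps the raw value read from the fault status
 * register (mrc p15, 0, <reg>, c5, c0, 0) to a short description.
 * Only the lower 4 bits are used, as in the snippet in this post. */
const char *fault_type_name(uint32_t fsr)
{
    uint32_t faulttype = fsr & 0xfu;

    if ((faulttype >> 2) == 0)
        return "misaligned memory access";

    switch (faulttype) {
    case 0x5: return "translation";
    case 0x8: return "external abort on noncacheable";
    case 0x9: return "domain";
    case 0xD: return "permission";
    default:  return "other (see the ARM920T manual)";
    }
}
```

The returned string is a constant, so it is safe to pass to a printf-like function from inside the abort handler without dereferencing anything that might itself be invalid.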

Data fault address (only applicable to a data abort)

This is the address the code tried to access, which caused the data fetch abort. Read it using the following code:

volatile unsigned int datafaultaddr;
__asm volatile ("mrc p15, 0, %[dfa], c6, c0, 0\n\t" : [dfa] "=r" (datafaultaddr));

Our actual abort handling code is located here.

Summary

It is very convenient to have a bullet-proof abort handler. It really gives you a lot more information about the problem than a hang. As well, don’t forget that most DRAM content is not erased after a hard reset, so you can use RedBoot’s dump (x) command to examine the memory, if really needed. With some effort, one can also set up the MMU to implement a very simple write-protection of the code region. Such protection could be useful to prevent the most insidious kind of bugs from occurring (luckily, we did not have to deal with such bugs).

Debugging ARM without a Debugger 1: Use of Asserts

This is my first post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.


Failing fast is an extremely useful property when programming in C. For example, problems with pointers are much easier to debug if you know exactly when an invalid pointer value is passed into a function. Here are a few tips for asserting effectively:

There is no such thing as too many asserts.

CPU power used for asserts will almost never cause a critical performance issue [in this course]. You can disable them when you know your code is perfect. Verify pointers at every dereference.

Assert pointers more aggressively.

Do not just check for NULLs. We know more about the pointer addresses. We know that a pointer address is limited by the size of the memory. As well, from the linker script, we can deduce even more information. For example, we know that normally, we would not want to dereference anything below the address 0x218000 because that is where the kernel is loaded. Similarly, we can figure out which memory regions are text and data.

Remove all uncertainties.

Turn off interrupts as soon as possible in the assert macro. When things go wrong, you want to stop the program execution (and the trains) right away. If you do not turn off interrupts, a context switch to another task might occur and you might never be able to come back to stop and display what went wrong.

Print as much information as possible.

Make an assert macro that resembles printf and print as much contextual information as possible. When you have no debugger, rebooting and reproducing can be really time-consuming. 1.5 months is a very short time to build an operating system from scratch so use it wisely.

e.g. ASSERT(condition, "oops! var1:%d, var2:%x, var3:%s", var1, var2, var3);
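The tip about asserting pointers more aggressively can be sketched as a range check. The function name and constants below are illustrative and not taken from our source: the bounds come from the numbers in this post (kernel loaded at 0x218000, 32MB of RAM ending at 0x2000000), and the alignment test assumes the pointer is used for word accesses.

```c
#include <stdint.h>

/* Assumed memory layout, taken from the text: nothing below the kernel
 * load address should be dereferenced, and RAM ends at 32MB. */
#define KERNEL_LOAD_ADDR 0x218000u
#define RAM_END          0x2000000u

/* Hypothetical check: returns nonzero if addr could be a valid pointer
 * to a 32-bit word on this board. */
int is_plausible_word_ptr(uintptr_t addr)
{
    return addr >= KERNEL_LOAD_ADDR   /* also rejects NULL */
        && addr < RAM_END             /* must lie inside physical memory */
        && (addr & 0x3u) == 0;        /* word accesses must be 4-byte aligned */
}
```

A call such as ASSERT(is_plausible_word_ptr((uintptr_t)p), "bad pointer p:%x", p); then catches a wild pointer at the function boundary instead of at some later, unrelated crash. Tighter bounds for the text and data regions could be taken from symbols defined in the linker script.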

Example

Here’s a short snippet of the ASSERT macro. It has evolved over 3 months and it looks really dirty, but it works. (source)

typedef uint volatile * volatile vmemptr;

#define VMEM(x) (*(vmemptr)(x))
void bwprintf(int channel, char *fmt, ...);
#define READ_REGISTER(var) __asm volatile("mov %[" TOSTRING(var) "], " TOSTRING(var) "\n\t" : [var] "=r" (var))
#define READ_CPSR(var) __asm("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" : [mode] "=r" (var))
void print_stack_trace(uint fp, int clearscreen);
void td_print_crash_dump();
int MyTid();

#if ASSERT_ENABLED
#define ASSERT(X, ...) { \
        if (!(X)) { \
                VMEM(VIC1 + INTENCLR_OFFSET) = ~0; /* turn off the vectored interrupt controllers */ \
                VMEM(VIC2 + INTENCLR_OFFSET) = ~0; \
                int cpsr; READ_CPSR(cpsr); \
                int inusermode = ((cpsr & 0x1f) == 0x10); int tid = inusermode ? MyTid() : -1; \
                bwprintf(0, "%c", 0x61); /* emergency shutdown of the train */ \
                int fp, lr, pc; READ_REGISTER(fp); READ_REGISTER(lr); READ_REGISTER(pc); \
                bwprintf(1, "\x1B[1;1H" "\x1B[1K"); \
                bwprintf(1, "assertion failed in file " __FILE__ " line:" TOSTRING(__LINE__) " lr: %x pc: %x, tid: %d" CRLF, lr, pc, tid); \
                bwprintf(1, "[%s] ", __func__); \
                bwprintf(1, __VA_ARGS__); \
                bwprintf(1, "\n"); /* if in usermode ask kernel for crashdump, otherwise print it directly */ \
                if (inusermode) { __asm("swi 12\n\t");} else { td_print_crash_dump(); } \
                bwprintf(1, "\x1B[1K"); \
                print_stack_trace(fp, 0); \
                die(); \
        } \
}
#else
#define ASSERT(X, ...)
#endif

That’s it for today.

Working wpa_supplicant.conf configuration for the network uw-secure at UWaterloo for Xperia X10 (1.6)

While Sony Ericsson has promised us that they will update the X10 to a moderately recent version (2.1) of the Android operating system by Q4 2010, those of us who are stuck with Android 1.6 cannot normally connect to most wireless networks using WPA-EAP, including uw-secure at the University of Waterloo. Apparently, the reason is that while Android 1.6 does support WPA-EAP, there is no user interface (!) for editing these network configurations.

Fortunately, the X10 (including the X10a sold in Canada by Rogers) has been rooted very recently by the people at xda-developers.com. You can follow the guidelines here (for X10a users, it is important to install the stuff in post #5 as well).

After rooting the phone, you can edit the file wpa_supplicant.conf in the /data/misc/wifi directory. I made a copy before making changes just in case. It is important that the owner and the permissions of the file remain exactly the same (owner: system, group: wifi and permission: 660).

Using your favourite method, append the following to the file:

network={
        ssid="uw-secure"
        scan_ssid=1
        proto=WPA
        key_mgmt=WPA-EAP
        eap=PEAP
        identity="UWDirID"
        password="UWDirPASSWORD"
        phase1="peaplabel=0"
        phase2="auth=MSCHAPV2"
}

I’ve assembled the configuration from this post at the Arch Linux Forum by vogt. The two modifications I made are that I removed the line specifying ca_cert and added the line proto=WPA. For whatever reason, the phone will ignore the configuration if there is no proto=WPA line.