Quarantine, better than Cloverfield.

I got a chance to watch Quarantine. Let me tell you, I felt tense even 2 hours after watching it. The movie is as good as any horror movie; I can’t imagine being in a situation like that.
I thought the same thing about Cloverfield, but I was disappointed when I got to watch it. Fortunately [...]

what’s wrong with this regex?

I’ve been working on some pretty boring e-mail stuff. I have to find email addresses in a database export. The addresses are all mixed in with the code and other parts of the dump. I’m sure there’s a more elegant way of doing this, since the information is in a database to begin with. But then I thought, what if it wasn’t? What if I just had a 100 MB file of text garbage (for lack of a better word) with e-mail addresses mixed in?

Text is great to do searches on, and what better tool for finding stuff in text than grep?

So I began searching for regular expressions I could use with grep to get all the e-mail addresses from a chunk of text. I think I found this one expression about a dozen times.

egrep -o "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}"

Anytime I saw a thread or a post with someone asking for an email-finding regex, this was the answer. It seems to be the be-all and end-all. The thing is, it doesn’t work. At the very least, it doesn’t pass my test.

Here are the email formats that I know work with all the e-mail systems I could try.

one1@example.com
one@example.com
name.last@example.com
name_last@example.com
name.last.misc@example.com
any.last-combo@example.com
name.last@sales.example.com
a.n.y.t.h.i.n.g.@example.com

Eight e-mail addresses, all valid. Well, when I run the regex above against this text, here’s what I get:

$ egrep -o "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}" clean.regex.sample.txt
one1@example.com
one@example.com
last@example.com
name_last@example.com
misc@example.com
combo@example.com
last@sales.exam

Notice the differences? First, it only found seven addresses, and some of the ones it did find were mangled. So this regex does not work. I don’t understand why it is so popular.
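
My best guess at why it fails: each ([._-]\w)* group allows only a single word character after a separator, so name.last can never match as a whole and grep falls back to the part that starts at last. The same thing cuts sales.example.com down to sales.exam. Here is a minimal sketch that reproduces the result and tries a patched pattern with \w+ inside the groups. The patch is my own assumption, not something from the threads I found, and I use grep -E here, which is equivalent to egrep.

```shell
# Recreate the sample file from above (same eight addresses).
cat > clean.regex.sample.txt <<'EOF'
one1@example.com
one@example.com
name.last@example.com
name_last@example.com
name.last.misc@example.com
any.last-combo@example.com
name.last@sales.example.com
a.n.y.t.h.i.n.g.@example.com
EOF

# The popular pattern: ([._-]\w)* permits only ONE word character
# after each separator, so multi-part names and domains get cut.
popular=$(grep -E -o '\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}' clean.regex.sample.txt)

# Assumed fix (my guess): \w+ inside the groups, so a whole word
# may follow each separator.
patched=$(grep -E -o '\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}' clean.regex.sample.txt)

echo "popular pattern:"
printf '%s\n' "$popular"
echo "patched pattern:"
printf '%s\n' "$patched"
```

If I’m reading it right, the patched pattern returns the first seven addresses intact but still skips a.n.y.t.h.i.n.g.@example.com, because it never allows a dot directly before the @.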

So, what is the right regex for this task? I need to find all the e-mail addresses in the formats I listed, and try to anticipate any other formats that I missed.

After some trial and error and consulting the experts, I found this one, which works pretty well. Maybe my test is flawed, but at the moment it works because it catches all the examples I posted earlier.

egrep -i -o "[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

Here’s what I mean:

$ cat test.email.txt
one1@example.com
one@example.com
name.last@example.com
name_last@example.com
name.last.misc@example.com
any.last-combo@example.com
name.last@sales.example.com
a.n.y.t.h.i.n.g.@example.com

[04:40 AM]:[oscarg@oscargimac]
[/Users/oscarg]
$ egrep -i -o "[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}" test.email.txt > good.txt

[04:41 AM]:[oscarg@oscargimac]
[/Users/oscarg]
$ egrep -o "\w+([._-]\w)*@\w+([._-]\w)*\.\w{2,4}" test.email.txt > bad.txt

[04:41 AM]:[oscarg@oscargimac]
[/Users/oscarg]
$ diff good.txt bad.txt
3c3
< name.last@example.com
---
> last@example.com
5,8c5,7
< name.last.misc@example.com
< any.last-combo@example.com
< name.last@sales.example.com
< a.n.y.t.h.i.n.g.@example.com
---
> misc@example.com
> combo@example.com
> last@sales.exam
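
Going back to the original problem, a big text dump with addresses scattered through it, the same pattern can be piped through sort to get a list of unique addresses. This is only a sketch: the file name dump.sample.txt and its contents are made up for illustration, and lowercasing the matches before de-duplicating is my own choice.

```shell
# Hypothetical stand-in for a messy database export.
cat > dump.sample.txt <<'EOF'
INSERT INTO users VALUES (1,'One','one@example.com','2008-10-01');
<a href="mailto:Name.Last@Example.com">contact</a> junk one@example.com
(3,'x','name.last@sales.example.com'),(4,'y','any.last-combo@example.com');
EOF

# Pull out every match, lowercase it, then keep one copy of each.
# LC_ALL=C keeps the sort order the same on every system.
unique=$(grep -E -i -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' dump.sample.txt \
  | tr 'A-Z' 'a-z' \
  | LC_ALL=C sort -u)

printf '%s\n' "$unique"
```

One thing to watch: {2,4} caps the last part of the domain at four letters, so a longer top-level domain such as .museum would be cut short.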

I hope someone else finds this helpful; it took me a while to find this and make it work. And since I’m no expert at regex, please feel free to let me know of a better way to do this.

Also see - http://www.regular-expressions.info/email.html

Solenoid concert

My buddy sent this in this morning and I had to post it. I know it’s been a while since I posted anything. Hopefully you like this video. It also seems that the guy is running Linux.

Thanks D.

Cool new Gmail feature - canned responses

Do you find yourself replying to the same messages repeatedly? The days of typing or copying and pasting similar responses by hand are over.

Gmail recently introduced “canned responses” in Gmail Labs. This new feature lets you keep pre-defined e-mail responses ready to go. Instead of spending time replying to all the inquiries you get on your business e-mail, for example, you can select from predefined responses.

Check out more information on the Gmail blog: http://gmailblog.blogspot.com/2008/10/new-in-labs-canned-responses.html

Don’t forget to Vote!

The cutoff date for voter registration is approaching fast. If you live in California, the deadline to get your registration form in is Oct 20th! That’s less than 2 days away if you account for mailing time.

If you haven’t done so yet, please register to vote. Don’t let any more time go by, and please get out there and vote.

Get out and vote!
