How to parse a certain column from a CSV string using Regular Expressions

I need to parse the string below using regular expressions. Yes, it must be a regular expression. 🙂

"123","06/16/2008","","123456","1","1234","This is a title string","4.99","USD","","","","kozen@kozen.de","HK","Hong Kong","210000D1","Individual String","Site string","Moep","","Not required","","","","","","kozen","the bozen","da kozl street 23","","kozmode","12345","23232323232323"

I need a regular expression matching the content of a specific field number. E.g.

  • 13 » kozen@kozen.de
  • 27 » kozen
  • 28 » the bozen
  • 29 » da kozl street 23

That’s how far I got:

"([^"]*)",

which gives me the content of a field, and results in 123 for the first field. This is just the first one, I need number 13.

Another expression is:

("([^"]*)",){13}

which matches 13 fields in a row, but the captured group then holds "kozen@kozen.de", including the quotation marks and the trailing comma, which should not be there 🙁
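A quick way to see where the quotes and the comma come from is to look at both capture groups of that expression. The sketch below uses Python's `re` module as a stand-in, since the post does not say which regex engine the tool uses; the sample is the first 14 fields of the line above. Group 1 is the outer parenthesis (the whole field with its comma), group 2 is the inner one (content only), so this only helps if the tool lets you pick a group other than the first.

```python
import re

# First 14 fields of the line from the post (shortened for readability).
line = ('"123","06/16/2008","","123456","1","1234","This is a title string",'
        '"4.99","USD","","","","kozen@kozen.de","HK"')

m = re.match(r'("([^"]*)",){13}', line)

# A repeated group keeps only what its last repetition matched.
outer = m.group(1)  # whole 13th field, quotes and comma included
inner = m.group(2)  # content of the 13th field only

print(outer)  # "kozen@kozen.de",
print(inner)  # kozen@kozen.de
```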

Actually, I thought the following should match the expression 13 times, but for some weird reason it does not work:

"([^"]*)",\13
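One likely reason this attempt fails: in most regex flavors `\13` is a backreference to whatever capture group 13 matched, not a repetition count. Repetition is written with a quantifier such as `{13}`. The following is a small sketch in Python's `re`, which is only an assumption about the engine; other engines may instead read `\13` as an octal character escape when there are fewer than 13 groups.

```python
import re

# \1 refers back to the text group 1 captured; it does not repeat the group.
backref = re.compile(r'"([^"]*)",\1')
# {2} repeats the whole group, which is what counting fields needs.
repeat = re.compile(r'(?:"[^"]*",){2}')

sample = '"a","b","c"'
backref_hit = backref.match(sample)  # would need '"a",a' at the start
repeat_hit = repeat.match(sample)    # matches the first two fields

# The pattern from the post has a single group, so Python rejects \13.
try:
    re.compile(r'"([^"]*)",\13')
    compiles = True
except re.error:
    compiles = False

print(backref_hit, repeat_hit.group(0), compiles)
```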

If anyone has an idea, I would appreciate your thoughts. I have been messing around with this for 6 hours now and googled all over the net, but didn’t find a solution. Something ‘universal’ like the expression above would be nice, so I can just replace the number (‘13’) with another column number to grab another column’s content.



3 Replies to “How to parse a certain column from a CSV string using Regular Expressions”

  1. Thanks a lot to x who solved this issue in a second!

    Here is the example for getting field number 13 without the quotation marks:

    (?:\"[^"]*\",){12}\"([^"]*)\",.*

    I forgot to mention that the CSV line is part of a bigger file which means there can be lines before and after this CSV line that do not contain any relevant data 🙂

  2. Off the top of my head, a plain ^"[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","[^"]*","([^"]*)", would have done the job too. You were already on that track, although it is not as elegant. 🙂

    Both solutions only work, however, if the "," separator never contains whitespace and empty fields are always written as ,"", and never as ,,

  3. Unfortunately, the field where I have to enter the regular expression is only about 30 characters long. That gets tight quickly if I want position 27 🙂
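To tie the replies together, here is a minimal sketch in Python (again only a stand-in for whatever engine the tool really uses). `strict_source`, `loose_source` and `field` are hypothetical helper names, not part of any library. `strict_source` builds the quantifier pattern from reply 1 for any field number, with one deliberate change: it ends at the closing quote instead of requiring a trailing comma, so the last field of a line can be captured too. `loose_source` is one possible answer to the caveat in reply 2: it tolerates spaces or tabs around the commas and unquoted empty fields (,,), but it still assumes there are no escaped quotes inside a field and no unquoted non-empty fields.

```python
import re

# The CSV line from the post, split across literals for readability.
LINE = (
    '"123","06/16/2008","","123456","1","1234","This is a title string","4.99","USD",'
    '"","","","kozen@kozen.de","HK","Hong Kong","210000D1","Individual String","Site string",'
    '"Moep","","Not required","","","","","","kozen","the bozen","da kozl street 23",'
    '"","kozmode","12345","23232323232323"'
)

def strict_source(n: int) -> str:
    """Pattern text for field n (1-based); every field quoted, separator exactly '","'."""
    # Skip n-1 complete fields, then capture the inside of field n.
    return r'(?:"[^"]*",){%d}"([^"]*)"' % (n - 1)

def loose_source(n: int) -> str:
    """Like strict_source, but allows blanks around commas and unquoted empty fields."""
    skip = r'(?:[ \t]*(?:"[^"]*"[ \t]*)?,){%d}' % (n - 1)
    return skip + r'[ \t]*(?:"([^"]*)")?'

def field(text: str, n: int, source=strict_source):
    """Return field n from the first line of text that has enough fields, else None."""
    # MULTILINE lets ^ match at every line start, so unrelated lines
    # before and after the CSV line are simply skipped.
    m = re.search("^" + source(n), text, re.MULTILINE)
    if m is None:
        return None
    return m.group(1) or ""  # an unquoted empty field leaves group 1 unset

text = "report header\n" + LINE + "\nend of report\n"
print(field(text, 13))            # kozen@kozen.de
print(field(text, 29))            # da kozl street 23
print(len(strict_source(27)))     # length of the pattern for position 27
print(repr(field('"a" , "b",,"" ,"e"', 3, loose_source)))
```

Because only the number inside the braces changes, the strict pattern stays the same length for any two-digit count, which matters for the short input field mentioned in reply 3.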
