blog-2024-01-08-1brc-kotlin

Monday, January 8, 2024, 2:49:42 PM Coordinated Universal Time by stefs

i tried the 1brc (one billion row challenge) in kotlin. the naive implementation was simple enough, though not exactly like the java version: kotlin's groupingBy works differently, in that it returns a Grouping instance, which then provides the aggregate, fold and reduce functions.
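to illustrate the Grouping part, here is a minimal sketch of a per-station aggregation with groupingBy and fold. it is not the code from my repo: the Stats class and the summarize function are names made up for this example, and the input is assumed to be "station;temperature" lines as in the 1brc data.

```kotlin
// Hypothetical result type for illustration; not taken from the original implementation.
data class Stats(val min: Double, val max: Double, val sum: Double, val count: Long) {
    val mean: Double get() = sum / count
}

// Parses "station;temperature" lines and aggregates them per station.
fun summarize(lines: Sequence<String>): Map<String, Stats> =
    lines
        .map { line ->
            val sep = line.indexOf(';')
            line.substring(0, sep) to line.substring(sep + 1).toDouble()
        }
        // groupingBy is lazy: it only returns a Grouping, nothing is computed yet.
        .groupingBy { it.first }
        // fold is the terminal step. The initial value is built from the first
        // element of each group, and the operation then runs for every element
        // (including that first one), so sum and count start at zero.
        .fold(
            initialValueSelector = { _, first -> Stats(first.second, first.second, 0.0, 0) },
            operation = { _, acc, (_, temp) ->
                Stats(minOf(acc.min, temp), maxOf(acc.max, temp), acc.sum + temp, acc.count + 1)
            }
        )
```

calling summarize(sequenceOf("a;1.0", "b;2.0", "a;3.0")) yields one Stats entry per station. allocating a new Stats per row is wasteful; a mutable accumulator with aggregate would likely be cheaper, but fold keeps the sketch short.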

the use of a Sequence (akin to lazily evaluated java streams) is necessary, as a list of all the lines wouldn't fit in memory. interestingly, kotlin's standard library doesn't provide parallel sequence processing (unlike java's List.stream().parallel()).
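a small sketch of what streaming the file as a Sequence can look like, using File.useLines from the standard library. countPerStation is a hypothetical helper for this example and only counts rows per station; the point is that the lines are consumed one at a time and never collected into a list.

```kotlin
import java.io.File

// Counts measurements per station while streaming the file line by line.
// useLines opens the file, hands a lazy Sequence<String> to the lambda and
// closes the reader afterwards, so the terminal operation (eachCount) has to
// run inside the lambda.
fun countPerStation(file: File): Map<String, Int> =
    file.useLines { lines ->
        lines
            .groupingBy { it.substringBefore(';') }
            .eachCount()
    }
```

returning the Sequence itself out of useLines would not work, since the underlying reader is already closed by then.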

chunking the input into sub-lists for parallel processing isn't possible, as this would lead to the whole file having to be loaded into RAM before chunking (which produces Lists, not Sequences). my next step, if i were to follow up on this, would be to divide the file into n regions and let threads build maps for the regions and finally merge the maps.
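the region idea could look roughly like the sketch below. i haven't benchmarked it, and everything in it (Acc, regionOffsets, processRegion, processInParallel) is a hypothetical name for this example. it assumes the file is UTF-8 with plain "\n" line endings, which is what the byte accounting in processRegion relies on.

```kotlin
import java.io.File
import java.io.FileInputStream
import java.io.RandomAccessFile
import java.util.TreeMap
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical mutable accumulator for one station.
class Acc(
    var min: Double = Double.POSITIVE_INFINITY,
    var max: Double = Double.NEGATIVE_INFINITY,
    var sum: Double = 0.0,
    var count: Long = 0
) {
    fun add(t: Double) { min = minOf(min, t); max = maxOf(max, t); sum += t; count++ }
    fun merge(o: Acc) { min = minOf(min, o.min); max = maxOf(max, o.max); sum += o.sum; count += o.count }
}

// Splits the file into up to n regions whose boundaries sit right after a newline.
fun regionOffsets(file: File, n: Int): List<Long> {
    val size = file.length()
    val offsets = sortedSetOf(0L, size) // sorted and de-duplicated
    RandomAccessFile(file, "r").use { raf ->
        for (i in 1 until n) {
            raf.seek(size * i / n)
            // Move forward to the byte after the next newline (or to end of file).
            while (true) {
                val b = raf.read()
                if (b == -1 || b == '\n'.code) break
            }
            offsets.add(raf.filePointer)
        }
    }
    return offsets.toList()
}

// Builds the map for the byte range [start, end); start must be a line start.
fun processRegion(file: File, start: Long, end: Long): Map<String, Acc> {
    val result = HashMap<String, Acc>()
    FileInputStream(file).use { input ->
        input.channel.position(start) // also moves the stream's read position
        val reader = input.bufferedReader()
        var remaining = end - start
        while (remaining > 0) {
            val line = reader.readLine() ?: break
            // Assumes "\n" line endings: encoded line length plus one newline byte.
            remaining -= line.toByteArray().size + 1
            if (line.isEmpty()) continue
            val sep = line.indexOf(';')
            result.getOrPut(line.substring(0, sep)) { Acc() }
                .add(line.substring(sep + 1).toDouble())
        }
    }
    return result
}

// Runs one task per region on a fixed thread pool, then merges the partial maps.
fun processInParallel(file: File, n: Int): Map<String, Acc> {
    val pool = Executors.newFixedThreadPool(n)
    try {
        val futures = regionOffsets(file, n).zipWithNext().map { (start, end) ->
            pool.submit(Callable { processRegion(file, start, end) })
        }
        val merged = TreeMap<String, Acc>()
        for (f in futures) {
            for ((station, acc) in f.get()) merged.getOrPut(station) { Acc() }.merge(acc)
        }
        return merged
    } finally {
        pool.shutdown()
    }
}
```

each thread only ever holds its own map plus a read buffer, so memory use stays independent of file size. re-encoding every line just to count bytes is crude; reading raw bytes per region would avoid it, at the cost of a longer example.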

interestingly, with 100k lines (for testing), on my machine HotSpot takes ~160ms, while a native image built with GraalVM community edition completes in ~30ms (!) (excluding vm startup, but without JIT warmup).

even more interestingly, with 1b lines, it's the other way round: 200sec for the GraalVM native image vs. 115sec for HotSpot.

source

