Discussion:
Splitting in shell (bash)
Kenny McCormack
2024-11-09 16:19:17 UTC
There is a feature that is prominently missing from the shell language (I
am speaking primarily of bash here) - which is the ability to split a
string on a delimiter. This is a common operation in most other
text-processing oriented languages (AWK, Perl, etc).

First note/caveat: I'm not interested in any solution involving IFS, for
two reasons:
1) IFS-based solutions never work for me.
2) Changing IFS is inherently dangerous, because, well, IFS itself is
inherently dangerous. Yes, I know it has been somewhat de-fanged
recently - but it is still dangerous.

Anyway, the point of this thread is that I have recently developed a good
solution for this, using bash's "mapfile" command. Suppose we have a
string in a variable (foo) like: foo;bar;bletch

I.e., with ; as the delimiter.

This works well, with a couple of caveats:

mapfile -td ';' <<< "$foo"

Caveats:
1) You can only have one, single character delimiter. It'd be nice if
you could have a reg-exp, like in GAWK.
2) If the output you're processing comes from a process, as is usually
the case, special care must be taken:

mapfile -td ';' < <(someprocess | awk 1 ORS=)

The point is that since ';' is now the delimiter, the newline at the end of
the line coming from someprocess is not treated as a delimiter, so it must
be disposed of with the AWK script. It'd be nice if you could make both
';' and '\n' be recognized as delimiters.

But other than those two caveats, it works well.
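
A minimal sketch of the mapfile approach described above, assuming bash 4.4 or later (the first release where mapfile accepts -d). The array name "parts" and the sample value are illustrative only. One detail worth knowing: a here-string (<<<) appends a newline to its input, so with ';' as the delimiter the last element would end in that newline. This sketch feeds the string through printf instead.

```bash
foo='foo;bar;bletch'

# printf '%s' writes the string with no trailing newline, so the last
# element comes out clean. -t strips the ';' delimiter from each element.
mapfile -t -d ';' parts < <(printf '%s' "$foo")

printf '%s\n' "${#parts[@]}"    # number of elements
printf '[%s]\n' "${parts[@]}"   # one bracketed element per line
```

Without an array name, mapfile stores the result in the default array MAPFILE.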
--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/Cancer
Lem Novantotto
2024-11-10 00:51:58 UTC
Post by Kenny McCormack
First note/caveat: I'm not interested in any solution involving IFS, for
1) IFS-based solutions never work for me.
2) Changing IFS is inherently dangerous, because, well, IFS itself is
inherently dangerous. Yes, I know it has been somewhat de-fanged
recently - but it is still dangerous.
Sorry to bother you, but... could you please give me some hints? Why
dangerous? Thanks. :-)
Post by Kenny McCormack
Anyway, the point of this thread is that I have recently developed a
good solution for this, using bash's "mapfile" command.
[...]
mapfile -td ';' <<< "$foo"
Yes, mapfile is a good solution. Of course, when you use <<<, no word
splitting is performed, so IFS is out of the question.
Post by Kenny McCormack
1) You can only have one, single character delimiter.
But you could have more using IFS. Something like:

| $ IFS=";:[ENTER]
| "
| $ declare myarray
| $ for i in $(echo "one:two;three 3[ENTER]
| four"); do myarray+=( $i ); done
| $ echo ${myarray[@]}
| one two three 3 four
| $ echo ${myarray[2]}
| three 3

What would be wrong with it?
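
One possible variation on the session above, offered as a sketch rather than a definitive answer: prefixing the assignment to a single read command limits the IFS change to that command, and read never performs pathname expansion, unlike an unquoted $(...). The variable names are illustrative.

```bash
input=$'one:two;three 3\nfour'

# The IFS assignment applies only to this one read command.
# -d '' makes read consume the whole input instead of stopping at the
# first newline; read then returns non-zero at end of input, hence "|| true".
IFS=$';:\n' read -r -d '' -a myarray < <(printf '%s' "$input") || true

printf '[%s]\n' "${myarray[@]}"
```

Because ';' and ':' are not whitespace, two adjacent delimiters produce an empty element.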
--
Bye, Lem
Lawrence D'Oliveiro
2024-11-10 01:25:49 UTC
Post by Lem Novantotto
But you could have more using IFS.
IFS is the way to go.

If you want your change to IFS to be only temporary, you can restrict it
to a subshell by putting the code sequence in “( ... )”. But then you
cannot pass variables back to the parent shell.

Another option is to use a coproc command.
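
A small sketch of the subshell approach, assuming the same sample string used earlier in the thread; the array name "parts" is illustrative. It shows both halves of the trade-off: the IFS change stays contained, and so does the array.

```bash
foo='foo;bar;bletch'

# Everything inside ( ... ) runs in a subshell: the IFS and set -f
# changes, and the parts array, all vanish when it exits.
(
    IFS=';'
    set -f                  # no pathname expansion on the unquoted $foo
    parts=( $foo )
    printf 'inside: %s elements\n' "${#parts[@]}"
)

# Back in the parent shell, parts was never set.
declare -p parts 2>/dev/null || echo 'outside: parts is not set'
```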
Lem Novantotto
2024-11-10 13:28:32 UTC
Post by Lawrence D'Oliveiro
IFS is the way to go.
I agree.

I've thought of it again, but besides the fact that IFS can handle only
one-char delimiters (which wasn't the matter in the OP's example)... I
actually cannot fancy any major issues with it (that furthermore would
need a fix). My bad, maybe.
--
Bye, Lem
Jerry Peters
2024-11-17 00:43:31 UTC
Post by Lawrence D'Oliveiro
Post by Lem Novantotto
But you could have more using IFS.
IFS is the way to go.
If you want your change to IFS to be only temporary, you can restrict it
to a subshell by putting the code sequence in “( ... )”. But then you
cannot pass variables back to the parent shell.
Another option is to use a coproc command.
Or put it in a function and declare IFS local.
~$ xxx () { local IFS="$IFS|" ; echo "$IFS" ; }
~$ xxx

|
~$ echo $IFS
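
The function-local idea can be stretched into a reusable helper. This is only a sketch under stated assumptions: split_on is a hypothetical name, "local -" needs bash 4.4 or later, namerefs need bash 4.3 or later, and the caller's array must not itself be called _out.

```bash
# split_on DELIM STRING ARRAYNAME (hypothetical helper)
split_on() {
    local IFS="$1"       # local: the caller's IFS is restored on return
    local -              # bash 4.4+: make the set -f below local as well
    set -f               # no pathname expansion on the unquoted $2
    local -n _out="$3"   # nameref to the caller's array (bash 4.3+)
    _out=( $2 )
}

split_on ';' 'foo;bar;*' parts
printf '[%s]\n' "${parts[@]}"
```

Unlike the subshell variant, the result lands in the caller's shell, and the '*' element is kept literally rather than expanded to file names.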
Kenny McCormack
2024-11-10 05:55:31 UTC
In article <vgo225$91aq$***@news.xmission.com>,
Kenny McCormack <***@shell.xmission.com> wrote:
...
Post by Kenny McCormack
First note/caveat: I'm not interested in any solution involving IFS, for
1) IFS-based solutions never work for me.
2) Changing IFS is inherently dangerous, because, well, IFS itself is
inherently dangerous. Yes, I know it has been somewhat de-fanged
recently - but it is still dangerous.
...
Post by Kenny McCormack
mapfile -td ';' <<< "$foo"
1) You can only have one, single character delimiter. It'd be nice if
you could have a reg-exp, like in GAWK.
2) If the output you're processing comes from a process, as is usually
mapfile -td ';' < <(someprocess | awk 1 ORS=)
It occurred to me after posting this that a somewhat simpler approach would
be to just convert all the delimiters to newlines, like this:

mapfile -t < <(someprocess | sed 's/;/\n/g')
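
A sketch of the same delimiter-to-newline idea using tr, for systems whose sed does not treat \n in the replacement text as a newline (that behaviour is a GNU sed extension). The someprocess function below is only a stand-in for the real command.

```bash
# Stand-in for any command that prints one ';'-separated line.
someprocess() { printf '%s\n' 'foo;bar;bletch'; }

# tr turns every ';' into a newline; mapfile -t then reads one element
# per line and strips the newlines, including the one ending the input.
mapfile -t parts < <(someprocess | tr ';' '\n')

printf '[%s]\n' "${parts[@]}"
```

As with the sed version, any newline already present in the data also ends an element.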
--
1) The only professionals who refer to their customers as "users" are
computer guys and drug dealers.
2) The only professionals who refer to their customers as "clients" are
lawyers and prostitutes.
Axel Reichert
2024-11-10 08:14:45 UTC
Post by Kenny McCormack
Post by Kenny McCormack
mapfile -td ';' < <(someprocess | awk 1 ORS=)
[...]
Post by Kenny McCormack
mapfile -t < <(someprocess | sed 's/;/\n/g')
And in your original post you wrote:

There is a feature that is prominently missing from the shell language
(I am speaking primarily of bash here) - which is the ability to split
a string on a delimiter. This is a common operation in most other
text-processing oriented languages (AWK, Perl, etc).

So why bother with a shell solution and why bother with avoiding IFS,
when in the end you need to resort to AWK/sed anyway?

Do not get me wrong, I am learning a lot in this thread here, much of
the stuff is far beyond my level of expertise in shell programming, and
it would be great to have a shell-only solution for your inquiry, even
if only for "academic reasons" (because, say, the solution (still to
come) may turn out to be too clumsy for daily use). I will applaud such
a result, but for the time being I would be happy if you could elaborate
somewhat more about your motivation for this exercise.

Best regards

Axel
Janis Papanagnou
2024-11-10 09:21:44 UTC
Post by Kenny McCormack
Post by Kenny McCormack
Post by Kenny McCormack
mapfile -td ';' < <(someprocess | awk 1 ORS=)
[...]
Post by Kenny McCormack
mapfile -t < <(someprocess | sed 's/;/\n/g')
There is a feature that is prominently missing from the shell language
(I am speaking primarily of bash here) - which is the ability to split
a string on a delimiter. This is a common operation in most other
text-processing oriented languages (AWK, Perl, etc).
So why bother with a shell solution and why bother with avoiding IFS,
when in the end you need to resort to AWK/sed anyway?
If we're on a targeted platform, looking for a solution for a simple
function, but having had bad experiences with some technical detail
on the platform, then we're trying to peek in various directions to
find any (sufficiently acceptable) workaround. - Sort of; I guess.

(Personally I'd say that 'IFS' is not that bad to completely avoid
it. But I'm not the OP.)
Post by Axel Reichert
Do not get me wrong, I am learning a lot in this thread here, much of
the stuff is far beyond my level of expertise in shell programming, and
it would be great to have a shell-only solution for your inquiry, even
if only for "academic reasons" (because, say, the solution (still to
come) may turn out to be too clumsy for daily use). I will applaud such
a result, but for the time being I would be happy if you could elaborate
somewhat more about your motivation for this exercise.
Since the OP is (usually) very clear about which solutions he will not
accept, I haven't bothered replying. - But if you're
asking I'd probably try for a shell-only solution along the way of
doing a substitution like arr=( ${var//[$'\n';]/|} ) to fill an
array, whereby defining 'IFS' appropriately (using the '|' here).
(This is just the outline of an idea not a solution, so be careful
with that fragment!)
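
Fleshing out that outline a little, as a sketch only: it assumes '|' never occurs in the data, that IFS was set beforehand, and that globbing was enabled beforehand (set +f switches it back on unconditionally). The sample value is illustrative.

```bash
var=$'foo;bar\nbaz qux'

oldIFS=$IFS
IFS='|'
set -f                          # keep the unquoted expansion from globbing
# ';' and newline are both rewritten to '|', then the result is split on '|'.
arr=( ${var//[$'\n';]/|} )
set +f
IFS=$oldIFS

printf '[%s]\n' "${arr[@]}"
```

The space in the last element survives because only '|' is in IFS while the array is filled.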

To add another thought to the original question, and since I know
that the OP already has the relevant experience for that; for such
a basic function writing a shell built-in could be appropriate.
(That's not portable, but I think the OP doesn't care about that.)

Janis
Post by Axel Reichert
Best regards
Axel
Kenny McCormack
2024-11-10 09:32:10 UTC
In article <vgptv9$ahrg$***@dont-email.me>,
Janis Papanagnou <janis_papanagnou+***@hotmail.com> wrote:
...
Post by Janis Papanagnou
To add another thought to the original question, and since I know
that the OP already has the relevant experience for that; for such
a basic function writing a shell built-in could be appropriate.
(That's not portable, but I think the OP doesn't care about that.)
Not a bad idea, actually.

In fact, one of my motivations for posting is the hope that a bash dev
might see this and be motivated to add the functionality to the shell.
Then I wouldn't have to write it myself.

That and/or be motivated to fix the IFS handling in the shell.
--
Marshall: 10/22/51
Jessica: 4/4/79
Janis Papanagnou
2024-11-10 13:05:29 UTC
Post by Kenny McCormack
In fact, one of my motivations for posting is the hope that a bash dev
might see this and be motivated to add the functionality to the shell.
Then I wouldn't have to write it myself.
:-)
Post by Kenny McCormack
That and/or be motivated to fix the IFS handling in the shell.
What specifically are you referring to? - I can't imagine that there's
a bash specific bug with IFS since IFS handling is a very old concept
in Unix shells. - OTOH, I cannot imagine changing long-standing
behavior without breaking tons of code. This would at best [in bash]
lead to yet another 'shopt' setting to be explicitly activated. - But
how should it then behave?

Janis
Kenny McCormack
2024-12-09 12:32:41 UTC
Post by Kenny McCormack
Post by Kenny McCormack
Post by Kenny McCormack
mapfile -td ';' < <(someprocess | awk 1 ORS=)
[...]
Post by Kenny McCormack
mapfile -t < <(someprocess | sed 's/;/\n/g')
There is a feature that is prominently missing from the shell language
(I am speaking primarily of bash here) - which is the ability to split
a string on a delimiter. This is a common operation in most other
text-processing oriented languages (AWK, Perl, etc).
So why bother with a shell solution and why bother with avoiding IFS,
when in the end you need to resort to AWK/sed anyway?
Because I need the result in the shell script.

(Obviously) Re-writing the whole app in AWK is not an option. Certainly
not at this point in time.
Post by Axel Reichert
Do not get me wrong, I am learning a lot in this thread here, much of
the stuff is far beyond my level of expertise in shell programming, and
it would be great to have a shell-only solution for your inquiry, even
if only for "academic reasons" (because, say, the solution (still to
come) may turn out to be too clumsy for daily use). I will applaud such
a result, but for the time being I would be happy if you could elaborate
somewhat more about your motivation for this exercise.
Primarily in the hopes that someone on the dev side of my chosen shell
(bash) will see it and say to themselves "Hey, that's a good idea."

I prefer to post about these sorts of things here (by "here", I mean
Usenet). I'm too lazy to participate in the "official" channels for the
FOSS software that I use.
--
Debating creationists on the topic of evolution is rather like trying to
play chess with a pigeon --- it knocks the pieces over, craps on the
board, and flies back to its flock to claim victory.