Numerically sorted arguments (in shell)

Discussion:


Janis Papanagnou

2024-06-14 07:31:18 UTC

I'm using ksh here...

I can set the shell parameters in numerical order

$ set {1..100}

then sort them _lexicographically_

$ set -s

Or do both in one

$ set -s {1..100}

I haven't found anything to sort them _numerically_ in shell.

What I'm trying to do is iterate over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.

Setting the files as shell arguments with P*.HTM will also produce
lexicographical order.

The preceding files are just samples. It should work also if the
numbers are non-consecutive (say, 2, 10, 10000, 3333333) so that
iterating using a for-loop and building the list is not an option.

(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)

But the primary question is: how to organize/iterate the arguments
*numerically* _in shell_? (If that's possible in some simple way.)

N.B.: I prefer not to use external commands like 'sort' because of
the negative side effects and bulky code to handle newlines and
blanks in filenames, and messing around with quotes.
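
A pure-shell numeric ordering of the arguments can be sketched as an insertion sort over the positional parameters. This is only a minimal sketch under stated assumptions: `numkey` and `run_numsorted` are hypothetical names (not ksh built-ins), only the first run of digits in each name is used as the key, and the quadratic cost makes it suitable for short lists only. It sticks to POSIX sh constructs and never sends names through a pipe, so blanks and newlines in names should not matter.

```shell
# numkey NAME: set $key to the first run of digits in NAME (0 if none).
# Hypothetical helper; all variables are global, as POSIX sh has no 'local'.
numkey() {
    key=${1#"${1%%[0-9]*}"}      # drop everything before the first digit
    key=${key%%[!0-9]*}          # keep only the leading run of digits
    while [ "${#key}" -gt 1 ] && [ "${key#0}" != "$key" ]; do
        key=${key#0}             # strip leading zeros (avoid octal readings)
    done
    key=${key:-0}
}

# run_numsorted COMMAND FILE...: run COMMAND with FILEs in numeric order.
run_numsorted() {
    cmd=$1; shift
    unsorted=$#; sorted=0
    # Invariant: "$@" holds the unsorted names first, then the sorted ones.
    while [ "$unsorted" -gt 0 ]; do
        item=$1; shift; unsorted=$((unsorted - 1))
        numkey "$item"; item_key=$key
        # Rotate the remaining unsorted names to the back.
        j=0
        while [ "$j" -lt "$unsorted" ]; do
            set -- "$@" "$1"; shift; j=$((j + 1))
        done
        # Copy the sorted block to the back, inserting the item in place.
        placed=no; j=0
        while [ "$j" -lt "$sorted" ]; do
            numkey "$1"
            if [ "$placed" = no ] && [ "$key" -gt "$item_key" ]; then
                set -- "$@" "$item"; placed=yes
            fi
            set -- "$@" "$1"; shift; j=$((j + 1))
        done
        [ "$placed" = yes ] || set -- "$@" "$item"
        sorted=$((sorted + 1))
    done
    "$cmd" "$@"
}
```

With these definitions, `run_numsorted viewer P*.HTM` would call the viewer with the names in numeric order. Names with equal keys keep their incoming (glob) order.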

Janis

Axel Reichert

2024-06-15 13:56:22 UTC

Post by Janis Papanagnou
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.

Could you employ printf to add leading zeros, sort lexicographically and
then remove the zeros?
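
The zero-padding idea can be sketched as a decorate/sort/undecorate pipeline. This is a minimal sketch, not anyone's actual script: `pad_sorted` is a hypothetical name, the key is the first run of digits only, and the pad width of 10 is an arbitrary assumption. The padded key is written in front of the untouched name, so nothing has to be "un-padded" afterwards. Because the names travel through a pipe one per line, names containing newlines are not supported.

```shell
# pad_sorted FILE...: print the FILEs one per line, ordered by their first number.
# Hypothetical helper; assumes no newlines in the names.
pad_sorted() {
    for name in "$@"; do
        num=${name#"${name%%[0-9]*}"}   # drop the text before the first digit
        num=${num%%[!0-9]*}             # keep the leading run of digits
        while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
            num=${num#0}                # printf %d may read 08 as invalid octal
        done
        # Decorate: zero-padded key, a tab, then the original name.
        printf '%010d\t%s\n' "${num:-0}" "$name"
    done |
    LC_ALL=C sort |                     # plain lexicographic sort on the padded key
    cut -f 2-                           # undecorate: drop the key column
}
```

For example, `pad_sorted P*.HTM` lists P1.HTM, P2.HTM, ... before P10.HTM and P11.HTM.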

Post by Janis Papanagnou
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)

You did not yet mention what your final goal is with the numerically
sorted list.

In case this is in the end a renaming task, for this level of complexity
I would use the "wdired" mode of Emacs ("write directory edits") and use
regexes for search and replace. Or some other "multi-rename" tools from
the command line.

Best regards

Axel

Janis Papanagnou

2024-06-15 14:43:50 UTC

Post by Axel Reichert

Post by Janis Papanagnou
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.

Could you employ printf to add leading zeros, sort lexicographically and
then remove the zeros?

Something like this (not using printf but a more elementary method) is
actually what I'm currently doing. (It's not really complex but quite
some data fiddling I wanted to avoid. I have that in a script and it's
not a general solution but handles just simple cases like the sample
data above, which is 95% of my usages so it's okay for that but not
really satisfying for other or more general cases.)

To get the original name back I think I'd have to store the original
names along with the new names. (Which is something I've not yet done.)

Post by Axel Reichert

Post by Janis Papanagnou
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)

You did not yet mention what your final goal is with the numerically
sorted list.

The original application was that I simply wanted to sequentially skim
through a number of files.[*] In the past (where possible) I've just
renamed the files to let them have numbers of equal length (as noted
above). But the general task I envision is that I don't want to change
any name of data files but just be able to iterate over these files,
or list them numerically sorted (and without the known issues of \n
and blank handling).

I thought that a contemporary shell would probably support that but I
was astonished that (at least in ksh) it wasn't supported (as far as
I saw).[**]

Post by Axel Reichert
In case this is in the end a renaming task, for this level of complexity
I would use the "wdired" mode of Emacs ("write directory edits") and use
regexes for search and replace. Or some other "multi-rename" tools from
the command line.

I've my own script to adjust numbers in files. But as said, I'd rather
want to iterate or sort, like the lexicographic ordering in ksh

set -s page*.gif

(which in that example is anyway the default for wildcard patterns)
something similar for numeric argument setting (or pattern expansion)

set --numerical page*.gif

To use that feature more widely it would be nice if the wild-card
expansion could be controlled, say by

set -o numerical

Well, maybe that all makes no sense and should be tackled differently?
But it's how it appears to me at the moment. (Feel free to enlighten
me. :-)

Janis

[*] I occasionally have this task; the last time was when I wanted to
read old typewriter documents that had been scanned page-wise as GIF
files.

[**] Yet I haven't checked Zsh; that shell supports some non-standard
modifiers in certain zsh-specific constructs, so it might be possible that it
has support for this requirement as well. (But Zsh is not the shell
I'm using so I'm primarily seeking for a Ksh solution or POSIX shell
workaround.)

Lawrence D'Oliveiro

2024-06-15 22:49:05 UTC

Post by Janis Papanagnou
I'm using ksh here...

At some point, you have to accept that trying to do everything in a shell
language is not the best way to go, and that it is time to switch to a
“real” programming language.

For example, Perl or Python could do this much more easily.

Janis Papanagnou

2024-06-16 03:11:29 UTC

Post by Lawrence D'Oliveiro

Post by Janis Papanagnou
I'm using ksh here...

At some point, you have to accept that trying to do everything in a shell
language is not the best way to go, and that it is time to switch to a
“real” programming language.

What argument are you trying to make up? What makes you think
I'm doing "everything in a shell"?

(My approach is to take the appropriate tools and language from
the set of (a dozen, or so) "real" languages I know, plus from
the set of a handful of scripting languages that I know.)

Post by Lawrence D'Oliveiro
For example, Perl or Python could do this much more easily.

I don't know of this feature in Perl or Python; please provide
some hint if there is a feature like the one I need. Some code
samples for demonstration of your point are also welcome.

(Only in case you missed it; I'm not [primarily] looking for a
program; for the described task I'm fine with what I have done.
Rather I'm looking for an inherent feature that supports what I
described elsethread. And I like to have this feature in shell
since a shell is my standard interface to my Unix system. HTH.)

Janis

Lawrence D'Oliveiro

2024-06-16 04:56:39 UTC

I don't know of this feature in Perl or Python; please provide some hint
if there is a feature like the one I need. Some code samples for
demonstration of your point are also welcome.

Python solution:

import re

items = \
    [
        "P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P3.HTM",
        "P4.HTM", "P5.HTM", "P6.HTM", "P7.HTM", "P8.HTM", "P9.HTM",
    ]

print(items)

print \
  (
    sorted
      (
        items,
        key = lambda f :
            tuple
              (
                (lambda : p, lambda : int(p))[i % 2 != 0]()
                for i, p in enumerate(re.split("([0-9]+)", f))
              )
      )
  )

output:

['P1.HTM', 'P10.HTM', 'P11.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM']
['P1.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM', 'P10.HTM', 'P11.HTM']

Janis Papanagnou

2024-06-16 09:48:25 UTC

Post by Lawrence D'Oliveiro

I don't know of this feature in Perl or Python; please provide some hint
if there is a feature like the one I need. Some code samples for
demonstration of your point are also welcome.

import re
items = \
    [
        "P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P3.HTM",
        "P4.HTM", "P5.HTM", "P6.HTM", "P7.HTM", "P8.HTM", "P9.HTM",
    ]
print(items)
print \
  (
    sorted
      (
        items,
        key = lambda f :
            tuple
              (
                (lambda : p, lambda : int(p))[i % 2 != 0]()
                for i, p in enumerate(re.split("([0-9]+)", f))
              )
      )
  )
['P1.HTM', 'P10.HTM', 'P11.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM']
['P1.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM', 'P10.HTM', 'P11.HTM']

Thanks. Though I'm not familiar with Python to understand that code;
it's too far from any language I've been using.

The (for me) interesting question, though, is; how does it solve the
task I had been addressing? - For convenience I reiterate one main
application...

I want from my shell command line interface call a viewer (or any
other application) with a list of files. If in shell I do, e.g.,

viewer P*.HTM

the list gets sorted lexicographically. What would the main function
look like that I could embed in my call to make a numerically sorted
list? Say, something like, for example,

viewer $( p_sort P*.HTM )

where p_sort would be the Python code. - Note: this is no appropriate
solution since it would anyway not work correctly for file names with
embedded blanks and newlines. I just want to get a closer understanding
how you think this would be usable in shell (or from shell). Thanks.

Janis

Helmut Waitzmann

2024-06-16 19:52:58 UTC

How would the main function look like that I could embed in my
call to make a numerically sorted list. Say, something like, for
example,
viewer $( p_sort P*.HTM )
where p_sort would be the Python code. - Note: this is no
appropriate solution since it would anyway not work correctly
for file names with embedded blanks and newlines. I just want to
get a closer understanding how you think this would be usable in
shell (or from shell). Thanks.

If ‘p_sort’ is designed to output the sorted file names separated
by an ASCII NUL character rather than a newline then, using the
GNU version of ‘xargs’, one can feed that output into ‘xargs’:

{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0

This will avoid the problems with funny characters (including
blanks and linefeeds) in filenames processed by the shell.

Janis Papanagnou

2024-06-17 06:30:54 UTC

How would the main function look like that I could embed in my call to
make a numerically sorted list. Say, something like, for example,
viewer $( p_sort P*.HTM )
where p_sort would be the Python code. - Note: this is no appropriate
solution since it would anyway not work correctly for file names with
embedded blanks and newlines. I just want to get a closer
understanding how you think this would be usable in shell (or from
shell). Thanks.

If ‘p_sort’ is designed to output the sorted file names separated by an
ASCII NUL character rather than a newline then, using the GNU version of
‘xargs’, one can feed that output into ‘xargs’:
{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0
This will avoid the problems with funny characters (including blanks and
linefeeds) in filenames processed by the shell.

I'm sure it does. You've actually shown a way to circumvent all the
issues with $( ... ) . So I'd probably write a wrapper to make that
code usable in a simpler way. Thanks.

And for the 'p_sort' function I'll resort to some standard tools...

Janis

Geoff Clare

2024-06-18 12:32:06 UTC

Post by Helmut Waitzmann
If ‘p_sort’ is designed to output the sorted file names separated
by an ASCII NUL character rather than a newline then, using the
GNU version of ‘xargs’, one can feed that output into ‘xargs’:
{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0

NUL as a record separator is also supported by several other versions
of xargs, and it is in the recently released POSIX.1-2024 standard.
In all of those it is specified with -0, so using -0 is more
portable than the GNU-specific --null.

POSIX.1-2024 also has -r although I think that's not as widely
supported in current xargs implementations as -0. It should become
better supported over time, though, so again I would suggest using -r
rather than --no-run-if-empty for better future portability.
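
Putting the short options to use, the earlier pipeline could be wrapped roughly as follows. This is a sketch under stated assumptions, not a tested tool: `vsort_run` is a hypothetical name, and the numeric ordering is delegated to `sort -z -V` (NUL-terminated records, version sort), which are GNU sort options and stand in for the unspecified p_sort. The descriptor juggling with fd 3 is the one shown earlier in the thread, so that the command keeps the caller's standard input.

```shell
# vsort_run COMMAND FILE...: run COMMAND with FILEs in natural (version) order.
# Hypothetical wrapper; assumes GNU sort (-z, -V) and an xargs with -0 and -r.
vsort_run() {
    cmd=$1; shift
    [ "$#" -gt 0 ] || return 0          # nothing to sort, nothing to run
    {
        printf '%s\0' "$@" 3<&- |       # one NUL-terminated record per name
        sort -zV 3<&- |                 # natural ordering of the records
        xargs -0 -r -- sh -c \
            'exec 0<&3 3<&- "$@"' sh \
            "$cmd"
    } 3<&0                              # fd 3 carries the original stdin
}
```

A call like `vsort_run viewer P*.HTM` would then cope with blanks and newlines in the names. Note that xargs may split a very long list over several invocations of the command.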

--
Geoff Clare <***@gclare.org.uk>

Helmut Waitzmann

2024-06-19 00:22:59 UTC

Post by Geoff Clare

Post by Helmut Waitzmann
If ‘p_sort’ is designed to output the sorted file names
separated by an ASCII NUL character rather than a newline
then, using the GNU version of ‘xargs’, one can feed that
output into ‘xargs’:
{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0

NUL as a record separator is also supported by several other
versions of xargs, and it is in the recently released
POSIX.1-2024 standard.

I'm glad to read that. I didn't know either.

Post by Geoff Clare
In all of those it is specified with -0, so using -0 is more
portable than the GNU-specific --null.

Yes, of course: If ‘-0’ is in the POSIX standard, it is
preferable over ‘--null’.

Post by Geoff Clare
POSIX.1-2024 also has -r although I think that's not as widely
supported in current xargs implementations as -0. It should
become better supported over time, though, so again I would
suggest using -r rather than --no-run-if-empty for better future
portability.

I didn't know that ‘-0’ and ‘-r’ are available (with the same
semantics) in more than just the GNU version. To minimize the risk
of hitting an ‘xargs’ version that happens to use ‘-0’ or ‘-r’
with different semantics than GNU ‘xargs’ does, I preferred the
long options (in particular ‘--no-run-if-empty’) over the short
ones.

Lawrence D'Oliveiro

2024-06-17 05:42:57 UTC

How would the main function look like that I could embed in my call to
make a numerically sorted list.

That’s what the code I posted does.

Janis Papanagnou

2024-06-17 06:32:21 UTC

Post by Lawrence D'Oliveiro

How would the main function look like that I could embed in my call to
make a numerically sorted list.

That’s what the code I posted does.

Erm, really? - I've got the impression that it rather sorts only
the _hard-coded data_, and not to work with arbitrary arguments.
Anyway. Don't bother.

Janis

Lawrence D'Oliveiro

2024-06-17 07:16:40 UTC

... I've got the impression that it rather sorts only the
_hard-coded data_ ...

So get the data from the usual sources, e.g. os.listdir().

Anyway. Don't bother.

I can only lead the horse to water, I cannot make you drink.

Janis Papanagnou

2024-06-17 07:44:55 UTC

Post by Lawrence D'Oliveiro

... I've got the impression that it rather sorts only the
_hard-coded data_ ...

So get the data from the usual sources, e.g. os.listdir().

Anyway. Don't bother.

I can only lead the horse to water, I cannot make you drink.

I wouldn't drink from a poisoned spring. IOW; I'd have to learn
Python completely to understand your code and get the details
properly. If it's as simple as you suggested, and since we're
not in a Python NG, I thought you'd have been able to address
the original question with your code (as a black box). In the
present form it's just useless and off-topic here. But as said,
don't bother.

Janis

Lawrence D'Oliveiro

2024-06-18 08:26:45 UTC

IOW; I'd have to learn Python completely to understand your code and get
the details properly.

I give you a fish, you eat for a day. You learn to fish, you eat for a
lifetime.

In the present form it's just useless and off-topic here. But as said,
don't bother.

Have you received a better offer yet?

Kenny McCormack

2024-06-18 11:14:19 UTC

Post by Lawrence D'Oliveiro

IOW; I'd have to learn Python completely to understand your code and get
the details properly.

I give you a fish, you eat for a day. You learn to fish, you eat for a
lifetime.

In the present form it's just useless and off-topic here. But as said,
don't bother.

Have you received a better offer yet?

Sounds like somebody's fee-fees got hurt here.

--
"Only a genius could lose a billion dollars running a casino."
"You know what they say: the house always loses."
"When life gives you lemons, don't pay taxes."
"Grab 'em by the p***y!"

Janis Papanagnou

2024-06-18 14:27:33 UTC

Post by Lawrence D'Oliveiro

In the present form it's just useless and off-topic here. But as said,
don't bother.

Have you received a better offer yet?

Better than /dev/random ? - No, not yet. But thanks for trying.

(You gave the impression that using python would be "better".
Obviously it isn't. - Why buy a non-standard tool/solution if
I can solve the task with standard Unix tools myself. So don't
bother.)

Janis

Kaz Kylheku

2024-06-19 01:05:20 UTC

Post by Lawrence D'Oliveiro

IOW; I'd have to learn Python completely to understand your code and get
the details properly.

I give you a fish, you eat for a day. You learn to fish, you eat for a
lifetime.

You fall into the Python trap, and eat shit for a lifetime.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Eric Pozharski

2024-06-16 18:00:11 UTC

Post by Lawrence D'Oliveiro

Post by Janis Papanagnou
I'm using ksh here...

*SKIP* [ 8 lines 2 levels deep] # borderly ad hominem

(My approach is to take the appropriate tools and language from the
set of (a dozen, or so) "real" languages I know, plus from the set of
a handful of scripting languages that I know.)

(Disclaimer: I'm ksh-ignorant) Speaking of features.

{14439:44} [0:0]% print -cC6 *
bar-20.baz bar-3.baz foo-10.bar foo-23.bar foo-5.bar
bar-21.baz bar-6.baz foo-13.bar foo-29.bar foo-6.bar
bar-24.baz bar-8.baz foo-1.bar foo-3.bar foo-7.bar
bar-26.baz foo-0.bar foo-22.bar foo-4.bar foo-8.bar
{14445:45} [0:0]% print -cC6 *(n)
bar-3.baz bar-21.baz foo-1.bar foo-6.bar foo-13.bar
bar-6.baz bar-24.baz foo-3.bar foo-7.bar foo-22.bar
bar-8.baz bar-26.baz foo-4.bar foo-8.bar foo-23.bar
bar-20.baz foo-0.bar foo-5.bar foo-10.bar foo-29.bar

That nymph between weapon and tool is 'glob qualifier' (acts at
'filename generation' phase). But! It's zsh. That being said, as a
result of cross-pollination, something similar might be in ksh too. I
can't say where to dig through ksh-documentation.

Post by Lawrence D'Oliveiro
For example, Perl or Python could do this much more easily.

I don't know of this feature in Perl or Python; please provide some
hint if there is a feature like the one I need. Some code samples for
demonstration of your point are also welcome.

Well, here's a hint:

Rule#34: If you can imagine sex, there's a module for this.

*CUT* [ 10 lines 1 level deep]

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Janis Papanagnou

2024-06-17 06:21:55 UTC

Post by Eric Pozharski

[sorting file names numerically]

(Disclaimer: I'm ksh-ignorant) Speaking of features.
{14439:44} [0:0]% print -cC6 *
bar-20.baz bar-3.baz foo-10.bar foo-23.bar foo-5.bar
bar-21.baz bar-6.baz foo-13.bar foo-29.bar foo-6.bar
bar-24.baz bar-8.baz foo-1.bar foo-3.bar foo-7.bar
bar-26.baz foo-0.bar foo-22.bar foo-4.bar foo-8.bar
{14445:45} [0:0]% print -cC6 *(n)
bar-3.baz bar-21.baz foo-1.bar foo-6.bar foo-13.bar
bar-6.baz bar-24.baz foo-3.bar foo-7.bar foo-22.bar
bar-8.baz bar-26.baz foo-4.bar foo-8.bar foo-23.bar
bar-20.baz foo-0.bar foo-5.bar foo-10.bar foo-29.bar
That nymph between weapon and tool is 'glob qualifier' (acts at
'filename generation' phase). But! It's zsh.

Yeah, I was positive that Zsh supports such a qualifier.

Post by Eric Pozharski
That being said, as a
result of cross-pollination, something similar might be in ksh too. I
can't say where to dig through ksh-documentation.

Well, I don't know of any in Ksh. (That's my problem.)

Janis

Eric Pozharski

2024-06-18 14:54:58 UTC

*SKIP* [ 20 lines 3 levels deep]

Post by Janis Papanagnou

That being said, as a result of cross-pollination, something similar
might be in ksh too. I can't say where to dig through
ksh-documentation.

Well, I don't know of any in Ksh. (That's my problem.)

Is that because of oh-my-bad documentation, or because ksh seeks a
minimal feature set?

p.s. Lack of features is a feature by itself, there's that.


Janis Papanagnou

2024-06-19 00:18:50 UTC

[ zsh's glob qualifier for numerically sorted glob expansion ]

Post by Eric Pozharski
*SKIP* [ 20 lines 3 levels deep]

Post by Janis Papanagnou

That being said, as a result of cross-pollination, something similar
might be in ksh too. I can't say where to dig through
ksh-documentation.

Well, I don't know of any in Ksh. (That's my problem.)

Is it because oh-my-bad documentation or ksh seeks minimal feature-set?

Ksh really has a lot of features. But not this ["basic" (sort of)] one.

(Well, I might as well have just missed it in the docs, but there's
also the Bolsky/Korn book where I didn't see it. And I've been using
that shell for so long. And I've also got no hints yet.)

Post by Eric Pozharski
p.s. Lack of features is a feature by itself, there's that.

Lacking features is certainly no feature of Ksh. ;-)

Janis

Chris Elvidge

2024-06-18 13:32:15 UTC

Post by Janis Papanagnou
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.

Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
         P7.HTM P8.HTM P9.HTM; do
    j=${i//[![:digit:]]}    # keep only the digits of the name
    files[j]="$i"           # the number becomes the array index
done
printf '%s\n' "${files[@]}"
P1.HTM
P2.HTM
P3.HTM
P4.HTM
P5.HTM
P6.HTM
P7.HTM
P8.HTM
P9.HTM
P10.HTM
P11.HTM

I'll have to work on names with two (or more?) numbers.

--
Chris Elvidge, England
BART BUCKS, ARE NOT LEGAL TENDER

Janis Papanagnou

2024-06-18 14:38:30 UTC

Post by Chris Elvidge


Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

Yes, Ksh supports both, indexed and associative arrays.

Post by Chris Elvidge
for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
         P7.HTM P8.HTM P9.HTM; do
    j=${i//[![:digit:]]}
    files[j]="$i"
done
printf '%s\n' "${files[@]}"
P1.HTM
P2.HTM
P3.HTM
P4.HTM
P5.HTM
P6.HTM
P7.HTM
P8.HTM
P9.HTM
P10.HTM
P11.HTM
I'll have to work on names with two (or more?) numbers.

One thing that concerns me with arrays is that I seem to recall that
there was a limit in the number of array elements (which might be an
issue on lengthy lists of files). But some ad hoc tests seem to show
that if there's a limit it's not any more in the 1k/4k elements range
as it had been. (Bolsky/Korn says their arrays support at least 4k.)

Janis

Chris Elvidge

2024-06-18 15:19:45 UTC

Post by Janis Papanagnou

Post by Chris Elvidge


Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

Yes, Ksh supports both, indexed and associative arrays.

Post by Chris Elvidge
for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
         P7.HTM P8.HTM P9.HTM; do
    j=${i//[![:digit:]]}
    files[j]="$i"
done
printf '%s\n' "${files[@]}"
P1.HTM
P2.HTM
P3.HTM
P4.HTM
P5.HTM
P6.HTM
P7.HTM
P8.HTM
P9.HTM
P10.HTM
P11.HTM
I'll have to work on names with two (or more?) numbers.

One thing that concerns me with arrays is that I seem to recall that
there was a limit in the number of array elements (which might be an
issue on lengthy lists of files). But some ad hoc tests seem to show
that if there's a limit it's not any more in the 1k/4k elements range
as it had been. (Bolsky/Korn says their arrays support at least 4k.)
Janis

I tested in ksh - works as written.

From here:
https://unix.stackexchange.com/questions/195191/ksh-bash-maximum-size-of-an-array

<quote>

This simple script shows on my systems (Gnu/Linux and Solaris):

ksh88 limits the size to 2^12-1 (4095). (subscript out of range ).
Some older releases like the one on HP-UX limit the size to 1023.

ksh93 limits the size of a array to 2^22-1 (4194303), your mileage
may vary.

bash doesn't look to impose any hard-coded limit outside the one
dictated by the underlying memory resources available. For example bash
uses 1.3 GB of virtual memory for an array size of 18074340.

</quote>


Janis Papanagnou

2024-06-18 17:12:20 UTC

[...]

Post by Chris Elvidge

Post by Janis Papanagnou
One thing that concerns me with arrays is that I seem to recall that
there was a limit in the number of array elements (which might be an
issue on lengthy lists of files). But some ad hoc tests seem to show
that if there's a limit it's not any more in the 1k/4k elements range
as it had been. (Bolsky/Korn says their arrays support at least 4k.)

I tested in ksh - works as written.
https://unix.stackexchange.com/questions/195191/ksh-bash-maximum-size-of-an-array
<quote>
ksh88 limits the size to 2^12-1 (4095). (subscript out of range ).
Some older releases like the one on HP-UX limit the size to 1023.

Yeah, I recall these limits from ksh88 on AIX and HP-UX.

Post by Chris Elvidge
ksh93 limits the size of a array to 2^22-1 (4194303), your mileage
may vary.

I've tried on my system just with a million names (yet more
filenames than we want to have in our directories in one place).

Post by Chris Elvidge
bash doesn't look to impose any hard-coded limit outside the one
dictated by the underlying memory resources available. For example bash
uses 1.3 GB of virtual memory for an array size of 18074340.

I didn't notice a limit for my old Bash or Ksh on my system.

So your array approach looks promising for one numeric key, and
it's a nice and terse solution.

Janis


Janis Papanagnou

2024-06-18 17:23:33 UTC

Post by Janis Papanagnou
So your array approach looks promising for one numeric key, and
it's a nice and terse solution.

Forgot to praise its nice property to also handle newlines in the
filenames.


Janis Papanagnou

2024-06-18 17:04:41 UTC

I've just tried a Unix tools based solution (with sed, sort, cut).

Up to and including the line containing 'shuf' is data generation,
the rest (starting with 'sed') extracts and sorts the data. I've
written it for TWO numeric sort keys (see printf format specifier)

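for (( i=1; i<=50; i++ ))
do
    for (( j=2; j<=120; j+=3 ))
    do
        printf "a%db%dc.txt\n" "$i" "$j"
    done
done |
shuf |

# decorate each name with its two numbers (GNU sed: \+ and \t),
# sort on those two fields, then cut the decoration off again
sed 's/[^0-9]*\([0-9]\+\)[^0-9]*\([0-9]\+\)[^0-9]*/\1\t\2\t&/' |
sort -t$'\t' -k1,1n -k2,2n |
cut -f3-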

For just one numeric argument this can be simplified (shorter sed
pattern, simpler sort -n command), and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys
in the file name.

Note: this program doesn't handle pathological filenames (newlines).
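
The generalization to an arbitrary number of numeric keys might look like the sketch below. It is an assumption-laden illustration rather than the tool described above: `nsort_keys` is a hypothetical name, the decoration step uses awk instead of a dynamically built sed pattern, and names with fewer numbers than requested get 0 for the missing keys. As with the pipeline above, names containing newlines are not handled.

```shell
# nsort_keys N: read names on stdin (one per line) and print them sorted
# by their first N embedded numbers. Hypothetical helper.
nsort_keys() {
    nkeys=$1
    sortopts=
    i=1
    while [ "$i" -le "$nkeys" ]; do
        sortopts="$sortopts -k$i,${i}n"     # one numeric sort key per field
        i=$((i + 1))
    done
    tab=$(printf '\t')
    # Decorate: prefix each name with its first N numbers, tab-separated.
    awk -v n="$nkeys" '{
        rest = $0; keys = ""
        for (i = 1; i <= n; i++) {
            num = 0
            if (match(rest, /[0-9]+/)) {
                num = substr(rest, RSTART, RLENGTH)
                rest = substr(rest, RSTART + RLENGTH)
            }
            keys = keys num "\t"
        }
        print keys $0
    }' |
    sort -t "$tab" $sortopts |              # $sortopts is split on purpose
    cut -f "$((nkeys + 1))-"                # undecorate: drop the key columns
}
```

For instance, `printf '%s\n' a*b*c.txt | nsort_keys 2` would order the generated sample names by the first number and then by the second.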

Janis


Chris Elvidge

2024-06-19 12:40:28 UTC

Post by Janis Papanagnou
I've just tried a Unix tools based solution (with sed, sort, cut).
Up to and including the line containing 'shuf' is data generation,
the rest (starting with 'sed') extracts and sorts the data. I've
written it for TWO numeric sort keys (see printf format specifier)
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do
printf "a%db%dc.txt\n" i j; done; done | shuf |
sed 's/[^0-9]*\([0-9]\+\)[^0-9]*\([0-9]\+\)[^0-9]*/\1\t\2\t&/' |
sort -t$'\t' -k1n -k2n | cut -f3-
For just one numeric argument this can be simplified (shorter sed
pattern, simpler sort -n command), and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys
in the file name.
Note: this program doesn't handle pathological filenames (newlines).
Janis

If you're happy not handling pathological filenames:

for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.

exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
The function replaces each non-digit sequence with a space, then prints
the digit sequence(s) followed by the original input.

for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k options you use; fields are separated by spaces.
awk prints the last field of the input.

This "seems" to work with all manner of filenames from PNN.htm (as your
original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
Seems to work in ksh, too.

--
Chris Elvidge, England
TAR IS NOT A PLAYTHING

Janis Papanagnou

2024-06-19 13:11:27 UTC

Post by Chris Elvidge

Post by Janis Papanagnou
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys in
the file name.
Note: this program doesn't handle pathological filenames (newlines).

Well, typically I can indeed ignore them. But it's better of course
to avoid situations where processing is compromised by such names.

Post by Chris Elvidge
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as your
original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
Seems to work in ksh, too.

I tried the approach I outlined above... (here just echo'ing the
created parts)...

N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"

echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"

Janis

Chris Elvidge

2024-06-19 15:06:37 UTC

Post by Janis Papanagnou

Post by Chris Elvidge

Post by Janis Papanagnou
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys in
the file name.
Note: this program doesn't handle pathological filenames (newlines).

Well, typically I can indeed ignore them. But it's better of course
to avoid situations where processing is compromised by such names.

Post by Chris Elvidge
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as your
original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
Seems to work in ksh, too.

I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Janis

Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.

But it takes a long time to do it my way, a call to sed for each
filename, so I tried to cut down the time taken to do this and came up with:

bash: exnums() { shopt -s extglob; j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }

ksh: exnums() { j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }

ksh seems to do the extglob needed for bash natively.

Removing the sed calls from exnums cuts the time taken from 37 secs to
under 1 sec with 2000+ files.
ksh is faster than bash: ksh takes 50% of the bash time.

Substituting a tab for the replacement space in j=, and adding -t$'\t' to
sort, would seem to allow spaces in filenames, too, as you originally had it.
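
A minimal sketch of such a tab-separated variant, not the code that was timed above. It assumes names without tabs or newlines; the helper name numsort_names and the fixed set of four numeric keys are made up for illustration. It puts the name first and the extracted numbers after it, so that cut -f1 recovers the name even when it contains spaces, and it calls sed once for the whole list rather than once per file.

```shell
# Hypothetical helper: print the arguments one per line, ordered by the
# numbers embedded in each name (up to four numeric keys).
# Assumes no tabs or newlines in the names.
numsort_names() (
    tab=$(printf '\t')
    printf '%s\n' "$@" |
    # Per line: save the name, turn each non-digit run into a tab, trim
    # the outer tabs, then emit "name<TAB>num1<TAB>num2...".
    sed "h;s/[^0-9][^0-9]*/${tab}/g;s/^${tab}//;s/${tab}\$//;H;x;s/\n/${tab}/" |
    # Field 1 is the name, so the numeric keys start at field 2.
    sort -t "$tab" -k2,2n -k3,3n -k4,4n -k5,5n |
    cut -f1
)
```

For example, numsort_names P10.HTM P2.HTM "my file 3.txt" P1.HTM prints P1.HTM, P2.HTM, my file 3.txt and P10.HTM, one per line. A missing key is an empty field, which sort -n treats as 0.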

--
Chris Elvidge, England
SUBSTITUTE TEACHERS ARE NOT SCABS

vallor

2024-06-19 23:45:15 UTC

Post by Chris Elvidge

Post by Chris Elvidge

Post by Janis Papanagnou
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric fields it can be modified to
dynamically construct the sed pattern, the sort option list, and the
cut parameter, once at the beginning; that way we could have a tool
for arbitrary amounts of numeric keys in the file name.
Note: this program doesn't handle pathological filenames (newlines).

Well, typically I can indeed ignore them. But it's better of course to
avoid situations where processing is compromised by such names.

Post by Chris Elvidge
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as
your original sequence) to p323dc45g12.htm, p324dc45g12.htm,
p333dc45g12.htm
Seems to work in ksh, too.

I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Janis

Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.
But it takes a long time to do it my way, a call to sed for each
ksh seems to do the extglob needed for bash natively.
removing the sed calls from exnum changes the time taken from 37 secs to
under 1 sec with 2000+ files
ksh is faster than bash, ksh 50% of the bash time taken.
Substituting a tab for the replacement space in j= and -t$'\t' in sort
would seem to allow spaces in filenames, too, as you originally had it.

I finally remembered which tool has "versionsort(3)" -- it's ls:

$ ls -1
test10.txt
test1.txt
test2.txt

$ ls -v -1
test1.txt
test2.txt
test10.txt

Does that help?

--
-v

Janis Papanagnou

2024-06-20 04:43:13 UTC

Post by vallor

[...]

$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?

Sure, thanks. - Just remember that it's a non-standard option. But
for my GNU environment it's certainly a usable part of the solution.
It seems to also handle multiple numeric components as desired (as
versions usually have)

$ ls | shuf | xargs ls -v
1 1.2 1.11 2.1 2.10 10.1 10.10 11.1
1.1 1.10 2 2.2 2.11 10.2 10.11 11.2

Janis

Janis Papanagnou

2024-06-20 04:58:11 UTC

I finally remembered which tool has "versionsort(3)" -- [...]

It's a pity that this function is a GNU extension, otherwise
it could be used to implement the desired function in shells
(ksh, bash) as an additional globbing option (like the zsh
glob qualifier) or a new 'set' option to control the sorting.

Janis

vallor

2024-06-20 22:16:42 UTC

On Thu, 20 Jun 2024 06:58:11 +0200, Janis Papanagnou

I finally remembered which tool has "versionsort(3)" -- [...]

It's a pity that this function is a GNU extension, otherwise it could be
used to implement the desired function in shells (ksh, bash) as an
additional globbing option (like the zsh glob qualifier) or a new 'set'
option to control the sorting.
Janis

I just posted a python program to comp.lang.python that sorts parameters
using strverscmp(3). It also shell-escapes the (common) IFS characters.

--
-v

Lawrence D'Oliveiro

2024-06-21 03:37:51 UTC

Post by vallor
I just posted a python program to comp.lang.python that sorts parameters
using strverscmp(3).

I already posted a snippet here which sorts strings containing any number
of decimal-numerical segments.

vallor

2024-06-21 04:20:47 UTC

On Fri, 21 Jun 2024 03:37:51 -0000 (UTC), Lawrence D'Oliveiro

Post by Lawrence D'Oliveiro

Post by vallor
I just posted a python program to comp.lang.python that sorts
parameters using strverscmp(3).

I already posted a snippet here which sorts strings containing any
number of decimal-numerical segments.

While I can't speak for others, something about the way you
went about that rubbed me the wrong way.

More on-topic, I did post the code in comp.lang.python with a request
for comments. Sometimes I feel like Manfred Mann's "The Demolition Man",
where "I kill conversation as I walk into the room." It is no exception
there, even though it would seem to be a good example of using a C binding
with python's sort function. (Having very little experience with python
programming, I couldn't really say with confidence -- but nobody has
complained so far.)

I wrote it to learn something about python, and to be a (q&d) shell
utility that someone might find useful, possibly Janis. Since
this is the shell newsgroup, and not the python programming
newsgroup, I think discussing the finer points of python
programming really doesn't belong here. YMMV.

I'm also put-off by "snip-and-snark" Usenet. I know a few people who
do that to tick-off the person they're conversing with. I'm guilty
of doing that from time to time, but I hope it's only when
the situation warrants it. If my opinion matters: it's a bad habit to
do so by default.

--
-v

Lawrence D'Oliveiro

2024-06-21 05:41:01 UTC

While I can't speak for others, something about the way you went about
that rubbed me the wrong way.

I solved the specific problem that seems to be the stumbling block, and
left the rest as an exercise for the reader.

That wasn’t up to your particular high standards? You know what you can
do.

Kenny McCormack

2024-06-21 07:31:53 UTC

Post by Lawrence D'Oliveiro

While I can't speak for others, something about the way you went about
that rubbed me the wrong way.

I solved the specific problem that seems to be the stumbling block, and
left the rest as an exercise for the reader.
That wasn't up to your particular high standards? You know what you can
do.

Somebody's fee-fees are getting more than a little butt-hurt...

Sad.

--
Those on the right constantly remind us that America is not a
democracy; now they claim that Obama is a threat to democracy.

Chris Elvidge

2024-06-21 13:21:43 UTC

Post by vallor

Post by Chris Elvidge

Post by Chris Elvidge

Post by Janis Papanagnou
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric fields it can be modified to
dynamically construct the sed pattern, the sort option list, and the
cut parameter, once at the beginning; that way we could have a tool
for arbitrary amounts of numeric keys in the file name.
Note: this program doesn't handle pathological filenames (newlines).

Well, typically I can indeed ignore them. But it's better of course to
avoid situations where processing is compromised by such names.

Post by Chris Elvidge
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as
your original sequence) to p323dc45g12.htm, p324dc45g12.htm,
p333dc45g12.htm
Seems to work in ksh, too.

I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Janis

Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.
But it takes a long time to do it my way, a call to sed for each
ksh seems to do the extglob needed for bash natively.
removing the sed calls from exnum changes the time taken from 37 secs to
under 1 sec with 2000+ files
ksh is faster than bash, ksh 50% of the bash time taken.
Substituting a tab for the replacement space in j= and -t$'\t' in sort
would seem to allow spaces in filenames, too, as you originally had it.

$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?

I didn't realise it could work like that. Thanks.

--
Chris Elvidge
England

Kenny McCormack

2024-06-24 13:22:28 UTC

In article <***@slack15-a.local.uk>,
...

Post by Chris Elvidge

Post by vallor
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?

I didn't realise it could work like that. Thanks.

To OP: Does "ls -v" meet your criteria?

--
Faced with the choice between changing one's mind and proving that there is
no need to do so, almost everyone gets busy on the proof.

- John Kenneth Galbraith -

Janis Papanagnou

2024-06-24 14:01:03 UTC

Post by Kenny McCormack
...

Post by Chris Elvidge

Post by vallor
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?

I didn't realise it could work like that. Thanks.

To OP: Does "ls -v" meet your criteria?

Yes, at least partly. (Why are you asking?)

It meets them in that it's readily available and usable
as an external command. It does not provide the shell-internal
globbing feature (as in Zsh) that would be nice and
preferable.

In the quoted form, 'ls -vQ', some pathological filenames
are (seemingly) handled, but I'd expect some hassle with the
quotes in the subsequent processing steps. That's why an
integrated form supported by the shell would IMO be an
advantage; we could then simply write

set -o numsortglob # <-- hypothetical shell feature
for f in version*.gz
do ...
done

At least the output from code like

for f in $(ls -vQ) ; do printf "'%s'\n" "$f" ; done

or

ls -vQ | while IFS= read -r f ; do printf "'%s'\n" "$f" ; done

indicates that there's still something to do, and without
the 'ls -Q' quotes the (pathological) newlines are an issue
(at least).

I think it's a typical problem that would best be solved
by a shell built-in feature. (External tools may take you
part of the way, probably with additional effort, for
common subsets of the task, but they're probably not bullet-proof.)

Janis

Janis Papanagnou

2024-06-20 04:34:02 UTC

Post by Chris Elvidge

Post by Janis Papanagnou

[...]

I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"

Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.

Yes. Above is just a codified version of the method I described
(thus also the echo's). Whether it's provided as parameter N or
obtained, say, from one of the files is left unanswered. Myself
I'd prefer some solution where even file sets with mixed amounts
of numerical parts may be used; thus being able to handle lists
that are named like chapters, like 1, 1.1, 1.2, ..., 5.3.3
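
One way to obtain N instead of passing it is to count the digit runs in each name and take the maximum; names with fewer numbers then just have empty trailing keys. A minimal sketch, assuming newline-free names. The helper names max_digit_runs and build_sort_keys are hypothetical, and limiting each key to a single field (-k1,1n rather than -k1n) is a choice made here, not part of the code above.

```shell
# Hypothetical helper: print the largest number of digit runs found in
# any of the given names (assumes no newlines in the names).
max_digit_runs() {
    printf '%s\n' "$@" |
    awk '{ n = gsub(/[0-9]+/, "&"); if (n > max) max = n } END { print max + 0 }'
}

# Hypothetical helper: print N numeric sort keys starting at field 1,
# e.g. "-k1,1n -k2,2n" for N=2.
build_sort_keys() (
    n=1
    keys=
    while [ "$n" -le "$1" ]; do
        keys="$keys -k$n,${n}n"
        n=$((n + 1))
    done
    printf '%s\n' "${keys# }"
)
```

The key list would then be passed unquoted to sort, so that it splits into separate options.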

Slowly and continuously approaching the goal... :-)

Janis

Post by Chris Elvidge
[...]

Chris Elvidge

2024-06-24 14:22:24 UTC

Post by Janis Papanagnou

Post by Chris Elvidge

Post by Janis Papanagnou

[...]

I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"

Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.

Yes. Above is just a codified version of the method I described
(thus also the echo's). Whether it's provided as parameter N or
obtained, say, from one of the files is left unanswered. Myself
I'd prefer some solution where even file sets with mixed amounts
of numerical parts may be used; thus being able to handle lists
that are named like chapters, like 1, 1.1, 1.2, ..., 5.3.3
Slowly and continuously approaching the goal... :-)
Janis

Post by Chris Elvidge
[...]

Originally you said:
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)

Does 'sort -V' help?
Seems to work with both spaces and newlines.

--
Chris Elvidge, England
NO ONE CARES WHAT MY DEFINITION OF "IS" IS

Janis Papanagnou

2024-06-24 15:22:00 UTC

Post by Janis Papanagnou

[...]

(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
Does 'sort -V' help?
Seems to work with both spaces and newlines.

Yes, that would help like 'ls -v' does (presuming it behaves similarly;
I haven't extensively tried 'sort -V' yet). (But 'ls | sort -V' is
not as terse as 'ls -v'.) The problem with both external tools
is (as outlined upthread) the post-processing of the tool-generated
list of data in shell context. So both tools take some burden off
me (the sorting aspect is simply covered by an option), but they don't
help me safely post-process the identified items. (A shell
built-in could address that natively, i.e. more simply and more
consistently.)
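
One way to sidestep the quoting issue with external tools is to keep the names NUL-delimited all the way from the glob to the consumer. A minimal sketch, assuming GNU sort (for -z and -V) and an xargs that supports -0, neither of which is standard; the function name each_in_version_order and the sh -c loop body are placeholders.

```shell
# Hypothetical helper: process the given names in version-sort order.
# Names are passed NUL-terminated, so blanks and newlines survive.
# Assumes GNU sort (-z, -V) and xargs -0; both are non-standard.
each_in_version_order() {
    printf '%s\0' "$@" |
    sort -z -V |
    xargs -0 sh -c 'for f do printf "<%s>\n" "$f"; done' sh
}
```

It would be called as each_in_version_order version*.gz. The loop runs in a child sh, so it cannot set variables in the calling shell, which is one reason a built-in would still be simpler.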

Janis
