Discussion:
IFS=$'\n'
Lawrence D'Oliveiro
2024-08-13 08:26:41 UTC
I like having spaces in file/directory names, but I avoid putting newlines
in them.

Spaces, however, play havoc with the shell’s word-splitting rules, because the
default value for IFS is

IFS=$' \t\n'

which means names with spaces in them get split into separate items,
triggering lots of errors about items not found (or the wrong items
found/created).
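A quick bash illustration of the splitting described above (the file name is made up): the same unquoted expansion yields three words under the default IFS and a single word under IFS=$'\n'.

```bash
#!/usr/bin/env bash
# Count how many words one unquoted (hypothetical) file name turns into.
name='my fred file.txt'

set -- $name                  # default IFS=$' \t\n': split at every space
default_count=$#

# Command substitution runs in a subshell, so this IFS change stays local.
newline_count=$(IFS=$'\n'; set -- $name; echo "$#")

echo "default IFS: $default_count word(s)"
echo "IFS=\$'\\n': $newline_count word(s)"
```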

However, if you change this to

IFS=$'\n'

then this can make things much more convenient (provided you can be sure
there are no newlines in your file names). For example, I can do

ls -lt $(find . -type f -iname \*fred\*)

to search for all filenames containing “fred” in the hierarchy rooted at
the current directory, and display them in reverse chronological order.
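A self-contained bash sketch of this idiom, run in a temporary directory with hypothetical file names. It uses `ls -1t` instead of `-lt` so only the newest-first order is shown, and it adds `set -f` (not part of the post) because an unquoted `$(...)` is also subject to pathname expansion, so a name containing `*`, `?` or `[` could otherwise be expanded again:

```bash
#!/usr/bin/env bash
# Sandbox with two matching names (one in a directory with a space) and one non-match.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir 'sub dir'
touch -t 202401010000 'old fred.txt'
touch -t 202406010000 'sub dir/New Fred.txt'
touch 'unrelated.txt'

listing=$(
    IFS=$'\n'   # split find's output on newlines only
    set -f      # addition: no globbing of the substituted names
    ls -1t $(find . -type f -iname \*fred\*)
)
printf '%s\n' "$listing"
cd / && rm -rf -- "$dir"
```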
Helmut Waitzmann
2024-08-13 11:14:21 UTC
Post by Lawrence D'Oliveiro
However, if you change this to
IFS=$'\n'
then this can make things much more convenient (provided you can
be sure there are no newlines in your file names). For example,
I can do
ls -lt $(find . -type f -iname \*fred\*)
to search for all filenames containing “fred” in the hierarchy
rooted at the current directory, and display them in reverse
chronological order.
Even if one is sure there are no linefeeds in the file names, a
cautious design could do


(
  IFS=$'\n' &&
  ls -lt -- $(
    find . -name \*$'\n'\* ! -prune -o \
      ! -name \*$'\n'\* -iname \*fred\* -type f -print
  )
)


That will work if there are really no linefeeds in file names.
And if there inadvertently were any linefeeds in file names, it
would at least not list file names that don't match the
“*fred*” file name pattern.
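To see what the guard buys, this bash sketch (hypothetical names, temporary directory) splits the output of a plain `find` and of the guarded expression on newlines and counts the items. The plain version breaks the linefeed name into the bogus items "./bad" and "fred.txt"; the guarded version prunes it and yields only the real match:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'my fred.txt' $'bad\nfred.txt'

# Plain find: 3 items, because the linefeed name is split in two.
naive_count=$(
    IFS=$'\n'; set -f
    set -- $(find . -iname \*fred\* -type f -print)
    echo "$#"
)

# Guarded find: names containing a linefeed are pruned, never printed.
safe_count=$(
    IFS=$'\n'; set -f
    set -- $(find . -name \*$'\n'\* ! -prune -o \
                ! -name \*$'\n'\* -iname \*fred\* -type f -print)
    echo "$#"
)

echo "plain find: $naive_count item(s); guarded find: $safe_count item(s)"
cd / && rm -rf -- "$dir"
```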


But – according to the new POSIX standard
(<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/find.html#top>
and
<https://pubs.opengroup.org/onlinepubs/9799919799/utilities/xargs.html#top>) –
one can use “xargs” to avoid linefeed trouble altogether:


find . -iname \*fred\* -type f -print0 |
xargs -0 -r -x -- ls -lt --


That will work with any file names, even those containing a
linefeed character.
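A bash sketch confirming this on a temporary directory with hypothetical names. An inline `sh` script stands in for `ls -lt` and only reports how many arguments it received, and `-x` is left out because BSD xargs accepts it only together with `-n`:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'my fred.txt' $'bad\nfred.txt' 'other.txt'

# NUL-separated names survive intact, including the one with a linefeed.
count=$(
    find . -iname \*fred\* -type f -print0 |
    xargs -0 -r -- sh -c 'echo "$#"' sh
)
echo "xargs passed $count file name(s)"
cd / && rm -rf -- "$dir"
```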
Ralf Damaschke
2024-08-13 21:56:56 UTC
Post by Helmut Waitzmann
But – according to the new POSIX standard
([links to 2024 opengroup specs for find and xargs]) –
find . -iname \*fred\* -type f -print0 |
xargs -0 -r -x -- ls -lt --
That will work with any file names, even those containing a linefeed
character.
OK, print0 is going to become standard, but nowadays I already prefer
(when I use iname for my comfort)

find . -iname \*fred\* -type f -exec ls -lt -- {} +

I don't see any advantage in using print0 and xargs -0.
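With `-exec ... {} +`, find builds the argument list itself, so no delimiter (newline or NUL) is involved at all. A bash sketch with hypothetical names, where an inline `sh` script stands in for `ls -lt` and reports its argument count:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'my fred.txt' $'bad\nfred.txt' 'other.txt'

# Each matching name arrives as one argument, linefeed or not.
count=$(find . -iname \*fred\* -type f -exec sh -c 'echo "$#"' sh {} +)
echo "find -exec passed $count file name(s)"
cd / && rm -rf -- "$dir"
```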
Lawrence D'Oliveiro
2024-08-13 22:13:10 UTC
OK, print0 is going to become standard ...
Sure, but there needs to be a more convenient way to use NUL as a word
delimiter.
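For reference, bash 4.4 and later already accept NUL as a delimiter in `mapfile -d ''` and `read -d ''`. A sketch on a temporary directory with hypothetical names, one of which contains a linefeed:

```bash
#!/usr/bin/env bash
# Assumes bash >= 4.4 (mapfile -d).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'my fred.txt' $'bad\nfred.txt' 'other.txt'

# mapfile -d '' splits on NUL bytes; -t strips the trailing NUL from each element.
mapfile -d '' -t files < <(find . -iname \*fred\* -type f -print0)
echo "mapfile read ${#files[@]} name(s)"

# Loop form: read -d '' stops at each NUL; IFS= and -r keep names verbatim.
n=0
while IFS= read -r -d '' f; do
    n=$((n + 1))
done < <(find . -iname \*fred\* -type f -print0)
echo "read loop saw $n name(s)"

cd / && rm -rf -- "$dir"
```

The array can then be passed on safely, e.g. `ls -lt -- "${files[@]}"`.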
Christian Weisgerber
2024-08-14 13:59:39 UTC
Post by Ralf Damaschke
Post by Helmut Waitzmann
find . -iname \*fred\* -type f -print0 |
xargs -0 -r -x -- ls -lt --
OK, print0 is going to become standard, but nowadays I already prefer
(when I use iname for my comfort)
find . -iname \*fred\* -type f -exec ls -lt -- {} +
If sufficiently many files accrue, find(1) will invoke ls(1) several
times, which will not produce the expected result. That may be
unlikely in this specific example, but it can happen in the general
case.

Wait, you say, xargs(1) will also split its input across multiple
invocations. I mean, that's very much the point of xargs. Which
is why Helmut added the -x flag, which is supposed to prevent this
behavior.
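Forcing tiny batches with `-n 2` makes this splitting visible. Each run of the inline `sh` script prints the arguments it received, so one output line is one command invocation; a real `ls -lt` run this way would sort each batch on its own instead of producing one overall ordering. The letters are placeholder arguments:

```bash
#!/usr/bin/env bash
batches=$(printf '%s\0' a b c d e | xargs -0 -n 2 sh -c 'echo "run: $*"' sh)
printf '%s\n' "$batches"
```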

On BSD, that will be a syntax error because -x is only available
in combination with -n.
--
Christian "naddy" Weisgerber ***@mips.inka.de
Ralf Damaschke
2024-08-14 22:55:40 UTC
Post by Christian Weisgerber
If sufficiently many files accrue, find(1) will invoke ls(1) several
times, which will not produce the expected result. That may be unlikely
in this specific example, but it can happen in the general case.
Wait, you say, xargs(1) will also split its input across multiple
invocations. I mean, that's very much the point of xargs. Which is why
Helmut added the -x flag, which is supposed to prevent this behavior.
I see the point, but I hope I never meet a use case that says
"do something with the files found, but throw the list away if it can't
be done all at once". I would rather first assemble the list, try to execute
the command with it and if needed switch to some different approach of
handling the files.
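One way to follow that "assemble first, then decide" approach in bash is sketched below (bash 4.4 or later for `mapfile -d ''`; file names are hypothetical). It relies on bash returning exit status 126 when it cannot execute a command at all, which is what an over-long argument list produces; the fallback itself is only a placeholder message:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'a fred.txt' 'b fred.txt' 'other.txt'

# Assemble the complete, NUL-safe list before running anything.
mapfile -d '' -t files < <(find . -iname \*fred\* -type f -print0)

if [ "${#files[@]}" -eq 0 ]; then
    status=0                      # nothing matched: skip ls entirely
else
    ls -lt -- "${files[@]}"       # one attempt with every name at once
    status=$?
fi

if [ "$status" -eq 126 ]; then
    # 126: bash could not exec ls, e.g. "Argument list too long".
    # A different (hypothetical) way of handling the files would go here.
    echo "cannot run one ls for all files; switching approach" >&2
fi
cd / && rm -rf -- "$dir"
```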
Ed Morton
2024-08-15 11:30:18 UTC
Post by Ralf Damaschke
Post by Christian Weisgerber
If sufficiently many files accrue, find(1) will invoke ls(1) several
times, which will not produce the expected result. That may be unlikely
in this specific example, but it can happen in the general case.
Wait, you say, xargs(1) will also split its input across multiple
invocations. I mean, that's very much the point of xargs. Which is why
Helmut added the -x flag, which is supposed to prevent this behavior.
I see the point, but I hope I never meet a use case that says
"do something with the files found, but throw the list away if it can't
be done all at once". I would rather first assemble the list, try to execute
the command with it and if needed switch to some different approach of
handling the files.
Needing to process all of the files at once happens more often than you
might think. For example, to merge CSVs we need to retain the header line
from just the first file read, so the naive approach would be:

find . -type f -name '*.csv' -exec awk 'NR==1; FNR>1' {} +

but that would fail if `find` had to call awk for multiple batches of
files, because `NR==1` would then be true once per batch during the
execution of `find`, so the header lines from multiple files would be
printed. The solution is something like (untested):

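# "-" makes awk read find's NUL-separated list from stdin first; the
# RS='\n' assignment after it then applies to the CSV files added to ARGV.
awk -v RS='\0' '
    NR == FNR { ARGV[ARGC++] = $0; next }
    (FNR == 1) && !doneHdr++
    FNR > 1
' - RS='\n' < <(find . -type f -name '*.csv' -print0)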

We have to read the output of `find` in `awk` to populate `ARGV[]`,
instead of calling `awk` with the output of `find` as an argument list,
because if that output is so long that `find` has to split it up in the
first script above, then it's also too long to pass to `awk` as an
argument list. Having `-print0` is obviously useful in that situation.
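The failure mode of the naive script can be reproduced without creating thousands of files by forcing one awk run per file with `xargs -n 1`, which stands in for `find` splitting the list. A bash sketch with made-up CSV contents:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'id,name\n1,ann\n' > a.csv
printf 'id,name\n2,bob\n' > b.csv

# One awk run over both files: NR==1 holds only once, so one header line.
single=$(find . -type f -name '*.csv' -exec awk 'NR==1; FNR>1' {} +)

# One awk run per file: NR restarts in every run, so each header is printed.
batched=$(find . -type f -name '*.csv' -print0 |
          xargs -0 -n 1 awk 'NR==1; FNR>1')

single_headers=$(grep -c '^id,name$' <<< "$single")
batched_headers=$(grep -c '^id,name$' <<< "$batched")
echo "single run: $single_headers header(s); batched: $batched_headers header(s)"
cd / && rm -rf -- "$dir"
```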

Regards,

Ed.
Geoff Clare
2024-08-15 12:53:04 UTC
Post by Christian Weisgerber
Post by Ralf Damaschke
find . -iname \*fred\* -type f -exec ls -lt -- {} +
If sufficiently many files accrue, find(1) will invoke ls(1) several
times, which will not produce the expected result. That may be
unlikely in this specific example, but it can happen in the general
case.
Wait, you say, xargs(1) will also split its input across multiple
invocations. I mean, that's very much the point of xargs. Which
is why Helmut added the -x flag, which is supposed to prevent this
behavior.
It isn't supposed to do that, and it doesn't.

$ echo 1234567890 1234567890 | xargs -s 50 echo
1234567890 1234567890
$ echo 1234567890 1234567890 | xargs -s 20 echo
1234567890
1234567890
$ echo 1234567890 1234567890 | xargs -x -s 20 echo
1234567890
1234567890

Tested with GNU, macOS (with -n 10 added), and Solaris versions of xargs.
--
Geoff Clare <***@gclare.org.uk>